This guide walks you through setting up simulations to test your AI agents in realistic, goal-oriented conversations.

1. Prerequisites

Before setting up simulations, ensure you have:
  • Access to the Netra dashboard (the Evaluation section)
  • An agent instrumented with Netra tracing (see Run Your First Simulation below)

2. Create an Agent

Agents define what your AI system can do (abilities) and what it should avoid (constraints).

Step 1: Navigate to Agents

Go to Evaluation → Agents and click Create Agent.

Step 2: Name Your Agent

Provide a descriptive name that identifies what this agent does. Example: “Customer Support Agent” or “Technical Documentation Assistant”

Step 3: Define Abilities

Describe what your agent can do:
You are a customer support agent for an e-commerce platform.

You can:
- Answer questions about orders, shipping, and returns
- Process refund requests for orders within 30 days
- Access customer order history and tracking information
- Escalate complex issues to human supervisors

Step 4: Define Constraints (Optional)

Specify what your agent should NOT do:
You must NOT:
- Process refunds exceeding $200 without manager approval
- Share customer credit card numbers or payment data
- Make shipping promises you cannot verify
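
If you also keep these abilities and constraints in version control next to your agent code, a minimal sketch of composing them into a single system prompt could look like the following (plain Python, no platform API assumed):

# Illustrative only: combine the abilities and constraints you pasted
# into the dashboard into one system prompt for your own agent.
ABILITIES = """You are a customer support agent for an e-commerce platform.

You can:
- Answer questions about orders, shipping, and returns
- Process refund requests for orders within 30 days
- Access customer order history and tracking information
- Escalate complex issues to human supervisors"""

CONSTRAINTS = """You must NOT:
- Process refunds exceeding $200 without manager approval
- Share customer credit card numbers or payment data
- Make shipping promises you cannot verify"""

def build_system_prompt(abilities: str, constraints: str) -> str:
    """Join the ability and constraint blocks into one system prompt."""
    return f"{abilities}\n\n{constraints}"

SYSTEM_PROMPT = build_system_prompt(ABILITIES, CONSTRAINTS)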

Step 5: Create Agent

Click Create Agent to save your configuration.

3. Create a Multi-Turn Dataset

Datasets define the scenarios you want to test—multi-turn conversations with specific goals.

Step 1: Navigate to Datasets

Go to Evaluation → Datasets and click Create Dataset.

Step 2: Configure Basics

  • Name: “Customer Refund Scenarios”
  • Type: Select Multi-turn
  • Data Source: Add manually

Click Next

Step 3: Configure Scenario

Define your simulation scenario:

  • Agent: Select the agent you created in Section 2 (Create an Agent)
  • Scenario Goal: “The customer wants to get a refund for a product that arrived damaged 15 days ago”
  • Max Turns: 5 (recommended for support scenarios)
  • User Persona: Frustrated 😤 (tests patience and de-escalation)
  • Provider: OpenAI
  • Model: GPT-4.1 (for realistic user simulation)

Click Next
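
For reference, the same scenario can also be captured as a plain data structure in your test assets. The key names below are illustrative only, not Netra's actual schema or export format:

# Illustrative only: the scenario fields above as a Python dict.
# Key names are assumptions, not the platform's schema.
refund_scenario = {
    "agent": "Customer Support Agent",  # the agent from Section 2
    "scenario_goal": (
        "The customer wants to get a refund for a product "
        "that arrived damaged 15 days ago"
    ),
    "max_turns": 5,              # recommended for support scenarios
    "user_persona": "Frustrated",
    "provider": "OpenAI",
    "model": "GPT-4.1",
}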

Step 4: Add User Data & Facts

Provide context and success criteria.

Simulated User Data (JSON format):
{
  "order_number": "ORD-123456",
  "purchase_date": "2024-01-15",
  "product_name": "Wireless Headphones",
  "order_total": "$129.99"
}
Fact Checker (what the agent MUST communicate):
{
  "refund_processing_time": "5-7 business days",
  "refund_method": "Original payment method",
  "return_label_delivery": "Within 24 hours via email"
}
Click Next
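
Conceptually, the fact check verifies that each required value shows up somewhere in the agent's side of the conversation. A rough sketch of that idea in Python (a naive substring check meant only to illustrate the concept, not the platform's implementation) is:

# Illustrative only: naive fact checking over the agent's replies.
REQUIRED_FACTS = {
    "refund_processing_time": "5-7 business days",
    "refund_method": "Original payment method",
    "return_label_delivery": "Within 24 hours via email",
}

def check_facts(agent_messages: list[str], facts: dict[str, str]) -> dict[str, bool]:
    """Return, per fact, whether its value was mentioned by the agent."""
    transcript = " ".join(agent_messages).lower()
    return {name: value.lower() in transcript for name, value in facts.items()}

# Example: the third fact is never mentioned, so it comes back False.
print(check_facts(
    [
        "Your refund goes back to the original payment method.",
        "It usually takes 5-7 business days to appear on your statement.",
    ],
    REQUIRED_FACTS,
))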

Step 5: Select Evaluators

Choose evaluators to score the conversation.

Session-Level (entire conversation):
  • Goal Achievement: Did the agent help the user?
  • Fact Accuracy: Were all facts communicated correctly?
Click Next, then Create Dataset

4. Configure Evaluators

Evaluators assess your simulations at two levels: turn-level (individual responses) and session-level (the whole conversation).

Session-Level Evaluators

Evaluate the entire conversation:
  • Goal Achievement: Did the scenario objective get met?
  • Fact Accuracy: Were critical facts communicated correctly?
  • Conversation Quality: How good was the overall interaction?
Start with Goal Achievement (session-level) as your core evaluator.
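
To make the session-level idea concrete, Goal Achievement boils down to asking a judge model whether the scenario goal was met by the end of the conversation. The sketch below is illustrative only; `judge` is a placeholder for whatever LLM call you already have, and Netra's built-in evaluator performs this for you inside the platform:

# Illustrative only: what a session-level Goal Achievement check
# amounts to. `judge` stands in for any LLM call of your choosing.
from typing import Callable

def goal_achieved(transcript: str, goal: str, judge: Callable[[str], str]) -> bool:
    """Ask a judge model whether the conversation met the scenario goal."""
    prompt = (
        "You are grading a customer support conversation.\n"
        f"Scenario goal: {goal}\n\n"
        f"Conversation:\n{transcript}\n\n"
        "Answer with exactly YES or NO: was the goal achieved?"
    )
    return judge(prompt).strip().upper().startswith("YES")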

5. Run Your First Simulation

Once your dataset is configured, trigger simulations through your agent code:

Step 1: Get Dataset ID

Open your dataset in the Netra dashboard and copy the Dataset ID from the top of the page.

Step 2: Integrate with Your Agent

The simulation runs automatically when your agent code executes. Ensure your agent is instrumented with Netra tracing.
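
As a rough sketch of the wiring, the snippet below uses placeholder names throughout (the `netra_sdk` module, `init`, and `run_simulation` are hypothetical, not the real SDK surface); substitute the actual calls from your Netra tracing setup and the Dataset ID copied in the previous step:

# Illustrative only: every netra_sdk name here is a placeholder for
# whatever the real Netra SDK exposes; follow the Netra tracing docs
# for the actual instrumentation calls.
import netra_sdk  # hypothetical module name

DATASET_ID = "YOUR_DATASET_ID"  # copied from the dataset page

def my_agent(message: str, history: list[dict]) -> str:
    """Your existing agent entry point, already traced by Netra."""
    ...

if __name__ == "__main__":
    netra_sdk.init(api_key="YOUR_API_KEY")   # hypothetical call
    netra_sdk.run_simulation(                 # hypothetical call
        dataset_id=DATASET_ID,
        agent=my_agent,
    )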

Step 3: Monitor Progress

Navigate to Evaluation → Test Runs and filter by the Multi-turn type to see your simulation in progress.

6. Review Results

Step 1: View Test Runs

Go to Evaluation → Test Runs and click on your completed simulation.

Step 2: Check Summary Metrics

Review high-level performance:
  • Total scenarios run
  • Pass/fail rate
  • Average cost and latency

Step 3: Examine Conversations

Click on any scenario to view:
  • Conversation tab: Full turn-by-turn dialogue
  • Evaluation Results tab: Turn-level and session-level scores
  • Scenario Details tab: Goal, user data, and facts

Step 4: Debug with Traces

Click View Trace on any turn to see detailed execution traces for debugging.


Common Patterns

Testing Customer Support

  • Personas: Test with Frustrated, Confused, and Neutral personas
  • Evaluators: Goal Achievement, Fact Accuracy, Guideline Adherence
  • Max Turns: 4-6 for typical support scenarios

Testing Technical Assistants

  • Personas: Confused (needs extra clarification)
  • Evaluators: Goal Achievement, Response Quality, Token Efficiency
  • Max Turns: 6-8 for complex troubleshooting

Constraint Compliance Testing

  • Scenarios: Create edge cases that challenge agent boundaries
  • Evaluators: Guideline Adherence to catch violations
  • Personas: Frustrated (more likely to push boundaries)
Start with 3-5 scenarios covering your most critical use cases, then expand coverage as you gain confidence in the system.