This guide walks you through setting up simulations to test your AI agents in realistic, goal-oriented conversations.

1. Prerequisites

Before setting up simulations, ensure you have:

2. Configure Evaluators

Evaluators assess the entire simulated conversation after it completes. Netra provides 8 library evaluators in two categories:
  • Quality (6 evaluators): Guideline Adherence, Conversation Completeness, Profile Utilization, Conversational Flow, Conversation Memory, Factual Accuracy
  • Agentic (2 evaluators): Goal Fulfillment, Information Elicitation
All evaluators use LLM-as-Judge with a default pass threshold of >= 0.6.
Start with Goal Fulfillment and Factual Accuracy as your core evaluators, then add more as needed. You can also create custom evaluators before setting up your dataset.
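To make the threshold concrete, here is a minimal sketch of how session-level LLM-as-Judge scores map to pass/fail under the default >= 0.6 threshold. The evaluator names match the library evaluators above, but the `summarize` function and its return shape are illustrative, not part of the Netra SDK:

```python
# Illustrative only: shows how a >= 0.6 pass threshold gates
# LLM-as-Judge scores. This helper is NOT the Netra SDK.
PASS_THRESHOLD = 0.6

def summarize(scores: dict[str, float], threshold: float = PASS_THRESHOLD) -> dict:
    """Return per-evaluator verdicts and an overall pass flag."""
    verdicts = {name: score >= threshold for name, score in scores.items()}
    return {"verdicts": verdicts, "passed": all(verdicts.values())}

result = summarize({
    "Goal Fulfillment": 0.82,
    "Factual Accuracy": 0.55,  # below threshold -> session fails
})
print(result["passed"])  # False
```

A score of exactly 0.6 passes, since the threshold is inclusive (>= 0.6).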

3. Create a Multi-Turn Dataset

Datasets define the scenarios you want to test—multi-turn conversations with specific goals.
Step 1: Navigate to Datasets

Go to Evaluation → Datasets and click Create Dataset.
Step 2: Configure Basics

  • Name: “Customer Refund Scenarios”
  • Type: Select Multi-turn
  • Data Source: Add manually
Click Next
Step 3: Configure Scenario

Define your simulation scenario:
  • Scenario Goal: “The customer wants to get a refund for a product that arrived damaged 15 days ago”
  • Max Turns: 5 (recommended for support scenarios)
  • User Persona: Frustrated 😤 (tests patience and de-escalation)
  • Provider: OpenAI
  • Model: GPT-4.1 (for realistic user simulation)
Click Next
Step 4: Add User Data & Facts

Provide context and success criteria.

Simulated User Data (JSON format):
{
  "order_number": "ORD-123456",
  "purchase_date": "2024-01-15",
  "product_name": "Wireless Headphones",
  "order_total": "$129.99"
}
Fact Checker (what the agent MUST communicate):
{
  "refund_processing_time": "5-7 business days",
  "refund_method": "Original payment method",
  "return_label_delivery": "Within 24 hours via email"
}
Click Next
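To see what the Fact Checker is looking for, here is a deliberately simplified stand-in: a literal substring check over the agent's replies. The real evaluator uses LLM-as-Judge, so it tolerates paraphrase; this sketch does not, and the transcript here is made up:

```python
# Simplified stand-in for the Fact Checker evaluator: verifies each
# required fact literally appears in the agent's replies. The real
# evaluator is LLM-as-Judge and accepts paraphrases.
required_facts = {
    "refund_processing_time": "5-7 business days",
    "refund_method": "Original payment method",
    "return_label_delivery": "Within 24 hours via email",
}

# Hypothetical agent replies from one simulated conversation.
agent_replies = [
    "I've issued the refund to your original payment method.",
    "It should arrive in 5-7 business days.",
]

transcript = " ".join(agent_replies).lower()
missing = [key for key, fact in required_facts.items()
           if fact.lower() not in transcript]
print(missing)  # ['return_label_delivery']
```

In this example the agent never mentioned the return label, so that fact would count against Factual Accuracy.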
Step 5: Select Evaluators

Choose evaluators to score the conversation. Select from the library evaluators you reviewed in Step 2, or any custom evaluators you created:
  • Agentic: Goal Fulfillment (did the agent achieve the objective?)
  • Quality: Factual Accuracy (were facts communicated correctly?), Conversation Completeness
Click Next, then Create Dataset.

4. Run Your First Simulation

Once your dataset is configured, trigger simulations through your agent code:
Step 1: Get Dataset ID

Open your dataset in the Netra dashboard and copy the Dataset ID from the top of the page.
Step 2: Integrate with Your Agent

The simulation runs automatically when your agent code executes. Ensure your agent is instrumented with Netra tracing.
Step 3: Monitor Progress

Navigate to Evaluation → Test Runs and filter by Multi turn type to see your simulation in progress.

5. Review Results

Step 1: View Test Runs

Go to Evaluation → Test Runs and click on your completed simulation.
Step 2: Check Summary Metrics

Review high-level performance:
  • Total scenarios run
  • Pass/fail rate
  • Average cost and latency
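These summary metrics are straightforward aggregates over per-scenario results. A sketch with made-up records (the field names are ours, not the Netra export format):

```python
# Illustrative computation of the summary metrics from per-scenario
# results. Record fields are made up, not the Netra export format.
scenarios = [
    {"passed": True,  "cost_usd": 0.042, "latency_s": 3.1},
    {"passed": False, "cost_usd": 0.051, "latency_s": 4.7},
    {"passed": True,  "cost_usd": 0.038, "latency_s": 2.9},
]

total = len(scenarios)
pass_rate = sum(s["passed"] for s in scenarios) / total
avg_cost = sum(s["cost_usd"] for s in scenarios) / total
avg_latency = sum(s["latency_s"] for s in scenarios) / total

print(f"{total} scenarios, {pass_rate:.0%} pass rate, "
      f"${avg_cost:.3f} avg cost, {avg_latency:.1f}s avg latency")
```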
Step 3: Examine Conversations

Click on any scenario to view:
  • Conversation tab: Full turn-by-turn dialogue
  • Evaluation Results tab: Turn-level and session-level scores
  • Scenario Details tab: Goal, user data, and facts
Step 4: Debug with Traces

Click View Trace on any turn to see detailed execution traces for debugging.

What’s Next?

Simulation Overview

Learn more about the simulation framework and use cases

Create Advanced Scenarios

Build complex multi-turn scenarios with custom personas

Custom Evaluators

Create custom evaluators for your specific requirements

Common Patterns

Testing Customer Support

  • Personas: Test with Frustrated, Confused, and Neutral personas
  • Evaluators: Conversation Completeness, Factual Accuracy, Guideline Adherence
  • Max Turns: 4-6 for typical support scenarios

Testing Technical Assistants

  • Personas: Confused (needs extra clarification)
  • Evaluators: Conversational Flow, Conversation Completeness, Goal Fulfillment
  • Max Turns: 6-8 for complex troubleshooting

Guideline Compliance Testing

  • Scenarios: Create edge cases that challenge agent boundaries
  • Evaluators: Guideline Adherence, Goal Fulfillment
  • Personas: Frustrated (more likely to push boundaries)
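If you reuse these patterns across datasets, it can help to keep them as a small config table in code. A sketch of the three patterns above (the dict shape is ours, not a Netra dataset format):

```python
# Illustrative presets for the three common patterns above.
# The dict shape is ours, not a Netra dataset format.
PATTERNS = {
    "customer_support": {
        "personas": ["Frustrated", "Confused", "Neutral"],
        "evaluators": ["Conversation Completeness", "Factual Accuracy",
                       "Guideline Adherence"],
        "max_turns": (4, 6),  # typical support scenarios
    },
    "technical_assistant": {
        "personas": ["Confused"],  # needs extra clarification
        "evaluators": ["Conversational Flow", "Conversation Completeness",
                       "Goal Fulfillment"],
        "max_turns": (6, 8),  # complex troubleshooting
    },
    "guideline_compliance": {
        "personas": ["Frustrated"],  # more likely to push boundaries
        "evaluators": ["Guideline Adherence", "Goal Fulfillment"],
    },
}
```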
Start with 3-5 scenarios covering your most critical use cases, then expand coverage as you gain confidence in the system.
Last modified on March 17, 2026