This guide walks you through setting up simulations to test your AI agents in realistic, goal-oriented conversations.

1. Prerequisites

Before setting up simulations, ensure you have:

2. Configure Evaluators

Evaluators assess the entire simulated conversation after it completes. Netra provides 8 library evaluators in two categories:
  • Quality (6 evaluators): Guideline Adherence, Conversation Completeness, Profile Utilization, Conversational Flow, Conversation Memory, Factual Accuracy
  • Agentic (2 evaluators): Goal Fulfillment, Information Elicitation
All evaluators use LLM-as-Judge scoring with a default pass threshold of >= 0.6.
Start with Goal Fulfillment and Factual Accuracy as your core evaluators, then add more as needed. You can also create custom evaluators before setting up your dataset.
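To make the pass threshold concrete, here is a minimal sketch of how an LLM-as-Judge score maps to a pass/fail verdict. The `verdict` function and its names are hypothetical illustrations, not part of the Netra SDK:

```python
# Hypothetical illustration of LLM-as-Judge pass/fail scoring;
# not part of the Netra SDK.
PASS_THRESHOLD = 0.6  # Netra's default pass threshold

def verdict(scores: dict[str, float]) -> dict[str, bool]:
    """Map each evaluator's 0-1 judge score to a pass/fail verdict."""
    return {name: score >= PASS_THRESHOLD for name, score in scores.items()}

results = verdict({"Goal Fulfillment": 0.85, "Factual Accuracy": 0.55})
# Goal Fulfillment passes (0.85 >= 0.6); Factual Accuracy fails (0.55 < 0.6)
```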

3. Create a Multi-Turn Dataset

Datasets define the scenarios you want to test—multi-turn conversations with specific goals.
1. Navigate to Datasets

Go to Evaluation → Datasets and click Create Dataset.
2. Configure Basics

  • Name: “Customer Refund Scenarios”
  • Type: Select Multi-turn
  • Data Source: Add manually
Click Next
3. Configure Scenario

Define your simulation scenario:
  • Scenario Goal: “The customer wants to get a refund for a product that arrived damaged 15 days ago”
  • Max Turns: 5 (recommended for support scenarios)
  • User Persona: Frustrated 😤 (tests patience and de-escalation)
  • Provider: OpenAI
  • Model: GPT-4.1 (for realistic user simulation)
Click Next
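The scenario fields above can be thought of as one configuration record. The dict below is a hypothetical sketch of that structure for reference; the Netra dashboard, not this code, is the actual configuration surface:

```python
# Hypothetical mirror of the dashboard scenario fields; illustrative only.
scenario = {
    "scenario_goal": ("The customer wants to get a refund for a "
                      "product that arrived damaged 15 days ago"),
    "max_turns": 5,             # recommended for support scenarios
    "user_persona": "Frustrated",
    "provider": "OpenAI",
    "model": "GPT-4.1",         # model that plays the simulated user
}

# Sanity-check the turn budget before creating many scenarios.
assert scenario["max_turns"] >= 1
```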
4. Add User Data & Facts

Provide context and success criteria.
Simulated User Data (JSON format):
{
  "order_number": "ORD-123456",
  "purchase_date": "2024-01-15",
  "product_name": "Wireless Headphones",
  "order_total": "$129.99"
}
Fact Checker (what the agent MUST communicate):
{
  "refund_processing_time": "5-7 business days",
  "refund_method": "Original payment method",
  "return_label_delivery": "Within 24 hours via email"
}
Click Next
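To illustrate what the Fact Checker is testing for, here is a naive sketch that scans a transcript for each required fact. This is illustrative only: Netra's actual evaluator uses LLM-as-Judge, not the verbatim substring match shown here, and `missing_facts` is a hypothetical helper:

```python
# Illustrative only: Netra's Fact Checker uses LLM-as-Judge rather than
# the naive substring matching shown here.
facts = {
    "refund_processing_time": "5-7 business days",
    "refund_method": "Original payment method",
    "return_label_delivery": "Within 24 hours via email",
}

def missing_facts(transcript: str, facts: dict[str, str]) -> list[str]:
    """Return the keys of facts the agent never stated verbatim."""
    return [k for k, v in facts.items() if v.lower() not in transcript.lower()]

transcript = ("Your refund will go back to your original payment method "
              "within 5-7 business days.")
print(missing_facts(transcript, facts))  # ['return_label_delivery']
```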
5. Select Evaluators

Choose evaluators to score the conversation. Select from the library evaluators you reviewed in Step 2, or any custom evaluators you created:
  • Agentic: Goal Fulfillment (did the agent achieve the objective?)
  • Quality: Factual Accuracy (were facts communicated correctly?), Conversation Completeness
Click Next, then Create Dataset.

4. Run Your First Simulation

Once your dataset is configured, trigger simulations through your agent code:
1. Get Dataset ID

Open your dataset in the Netra dashboard and copy the Dataset ID from the top of the page.
2. Integrate with Your Agent

The simulation runs automatically when your agent code executes. Ensure your agent is instrumented with Netra tracing.
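Conceptually, the simulation alternates between the simulated user and your agent until the goal is met or Max Turns is reached. The sketch below illustrates that loop with stand-in stub functions; none of these names are Netra SDK calls, and Netra orchestrates the real loop for you:

```python
# Conceptual sketch of the simulation loop; simulated_user and agent
# are stand-in stubs, not Netra SDK calls.
MAX_TURNS = 5

def simulated_user(history: list[tuple[str, str]]) -> str:
    # Stub: a real simulation prompts an LLM with the goal and persona.
    return "I want a refund for my damaged headphones."

def agent(history: list[tuple[str, str]], user_msg: str) -> str:
    # Stub: your Netra-instrumented agent would be invoked here.
    return "I can help with that refund."

history: list[tuple[str, str]] = []
for turn in range(MAX_TURNS):
    user_msg = simulated_user(history)
    reply = agent(history, user_msg)
    history.append((user_msg, reply))
    # A real run can stop early once the scenario goal is fulfilled.

print(len(history))  # 5 turns recorded
```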
3. Monitor Progress

Navigate to Evaluation → Test Runs and filter by Multi turn type to see your simulation in progress.

5. Review Results

1. View Test Runs

Go to Evaluation → Test Runs and click on your completed simulation.
2. Check Summary Metrics

Review high-level performance:
  • Total scenarios run
  • Pass/fail rate
  • Average cost and latency
3. Examine Conversations

Click on any scenario to view:
  • Conversation tab: Full turn-by-turn dialogue
  • Evaluation Results tab: Turn-level and session-level scores
  • Scenario Details tab: Goal, user data, and facts
4. Debug with Traces

Click View Trace on any turn to see detailed execution traces for debugging.


Common Patterns

Testing Customer Support

  • Personas: Test with Frustrated, Confused, and Neutral personas
  • Evaluators: Conversation Completeness, Factual Accuracy, Guideline Adherence
  • Max Turns: 4-6 for typical support scenarios

Testing Technical Assistants

  • Personas: Confused (needs extra clarification)
  • Evaluators: Conversational Flow, Conversation Completeness, Goal Fulfillment
  • Max Turns: 6-8 for complex troubleshooting

Guideline Compliance Testing

  • Scenarios: Create edge cases that challenge agent boundaries
  • Evaluators: Guideline Adherence, Goal Fulfillment
  • Personas: Frustrated (more likely to push boundaries)
Start with 3-5 scenarios covering your most critical use cases, then expand coverage as you gain confidence in the system.
Last modified on February 16, 2026