1. Prerequisites
Before setting up simulations, ensure you have:
- Netra SDK installed and initialized
- Your API key configured
2. Configure Evaluators
Evaluators assess the entire simulated conversation after it completes. Netra provides 8 library evaluators in two categories:
- Quality (6 evaluators): Guideline Adherence, Conversation Completeness, Profile Utilization, Conversational Flow, Conversation Memory, Factual Accuracy
- Agentic (2 evaluators): Goal Fulfillment, Information Elicitation

All evaluators use LLM-as-Judge with a default pass threshold of >= 0.6.
3. Create a Multi-Turn Dataset
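To make the pass threshold concrete, any evaluator score of 0.6 or higher counts as a pass by default. A minimal sketch of that check (the scoring values and helper function here are illustrative, not part of the Netra SDK):

```python
PASS_THRESHOLD = 0.6  # Netra's default pass threshold for all evaluators

def passed(score: float, threshold: float = PASS_THRESHOLD) -> bool:
    """LLM-as-Judge scores at or above the threshold count as a pass."""
    return score >= threshold

# Illustrative scores only -- real values come from the evaluator run
scores = {"Goal Fulfillment": 0.82, "Factual Accuracy": 0.55}
results = {name: passed(s) for name, s in scores.items()}
print(results)  # Goal Fulfillment passes, Factual Accuracy does not
```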
Datasets define the scenarios you want to test: multi-turn conversations with specific goals.
Configure Basics
- Name: “Customer Refund Scenarios”
- Type: Select Multi-turn
- Data Source: Add manually
- Click Next
Configure Scenario
Define your simulation scenario:
| Field | Value |
|---|---|
| Scenario Goal | “The customer wants to get a refund for a product that arrived damaged 15 days ago” |
| Max Turns | 5 (recommended for support scenarios) |
| User Persona | Frustrated 😤 (tests patience and de-escalation) |
| Provider | OpenAI |
| Model | GPT-4.1 (for realistic user simulation) |

Click Next
Add User Data & Facts
Provide context and success criteria:
Simulated User Data (JSON format):
Fact Checker (what the agent MUST communicate):
Click Next
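As an illustration of what these two inputs might look like for the refund scenario above (all field names and values here are hypothetical examples, not the official Netra schema or your actual policy):

```python
import json

# Hypothetical user data -- backstory and context for the simulated customer.
simulated_user_data = {
    "order_id": "ORD-12345",          # illustrative order reference
    "product": "wireless headphones",  # illustrative product
    "purchase_date": "15 days ago",
    "issue": "arrived damaged",
}

# Hypothetical fact-checker entries -- statements the agent MUST communicate;
# the Fact Checker verifies each appears in the conversation.
facts = [
    "Damaged items qualify for a full refund",
    "A prepaid return label will be emailed to the customer",
]

print(json.dumps(simulated_user_data, indent=2))
```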
Select Evaluators
Choose evaluators to score the conversation. Select from the library evaluators you reviewed in Step 2, or any custom evaluators you created:
- Agentic: Goal Fulfillment (did the agent achieve the objective?)
- Quality: Factual Accuracy (were facts communicated correctly?), Conversation Completeness
4. Run Your First Simulation
Once your dataset is configured, trigger simulations through your agent code.
Get Dataset ID
Open your dataset in the Netra dashboard and copy the Dataset ID from the top of the page.
Integrate with Your Agent
The simulation runs automatically when your agent code executes. Ensure your
agent is instrumented with Netra tracing.
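The shape of the integration depends on your SDK version; the sketch below is a hypothetical illustration of the idea (the function name, the entry-point structure, and the placeholder Dataset ID are all assumptions, not the actual Netra API), showing where the copied Dataset ID and the traced agent entry point fit:

```python
# Hypothetical sketch -- names below are illustrative, not the Netra SDK API.
# Consult the SDK documentation for the real instrumentation calls.

DATASET_ID = "YOUR_DATASET_ID"  # paste the ID copied from the dataset page


def handle_customer_message(message: str) -> str:
    """Your agent's entry point; Netra tracing wraps calls like this one."""
    # ... your agent logic here ...
    return f"Agent reply to: {message}"


# In a typical setup, the simulated user drives this entry point turn by
# turn for each scenario in the dataset, and the trace is scored afterward.
reply = handle_customer_message("My order arrived damaged. I want a refund.")
print(reply)
```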
5. Review Results
Check Summary Metrics
Review high-level performance:
- Total scenarios run
- Pass/fail rate
- Average cost and latency
Examine Conversations
Click on any scenario to view:
- Conversation tab: Full turn-by-turn dialogue
- Evaluation Results tab: Turn-level and session-level scores
- Scenario Details tab: Goal, user data, and facts
What’s Next?
Simulation Overview
Learn more about the simulation framework and use cases
Create Advanced Scenarios
Build complex multi-turn scenarios with custom personas
Custom Evaluators
Create custom evaluators for your specific requirements
Common Patterns
Testing Customer Support
- Personas: Test with Frustrated, Confused, and Neutral personas
- Evaluators: Conversation Completeness, Factual Accuracy, Guideline Adherence
- Max Turns: 4-6 for typical support scenarios
Testing Technical Assistants
- Personas: Confused (needs extra clarification)
- Evaluators: Conversational Flow, Conversation Completeness, Goal Fulfillment
- Max Turns: 6-8 for complex troubleshooting
Guideline Compliance Testing
- Scenarios: Create edge cases that challenge agent boundaries
- Evaluators: Guideline Adherence, Goal Fulfillment
- Personas: Frustrated (more likely to push boundaries)
