Simulation evaluators assess entire multi-turn conversations at the session level. After a simulated conversation completes, evaluators determine whether your agent achieved its goal, communicated facts correctly, and maintained quality throughout the interaction.
## Why Simulation Evaluators Matter
Multi-turn conversations require different evaluation approaches than single-turn responses:
| Challenge | How Simulation Evaluators Help |
|---|---|
| Goal achievement | Verify whether the conversation reached its objective |
| Fact accuracy | Track whether critical information was communicated correctly across multiple turns |
| Conversation quality | Ensure the conversation maintains logical flow, completeness, and consistency |
| Information gathering | Assess whether the agent effectively collected required information from the user |
## Evaluators Dashboard
Navigate to Evaluation → Evaluators from the left navigation panel. Switch to the Library tab and filter by Multi turn to see the simulation evaluators.
Netra organizes simulation evaluators into two categories: Quality and Agentic.
## Library Evaluators
Netra provides 8 preconfigured library evaluators across two categories. All evaluators run at the session level, assessing the entire conversation after it completes.
### Quality Evaluators
Quality evaluators assess how well your agent maintains conversation standards.
| Evaluator | What It Measures |
|---|---|
| Guideline Adherence | Whether the assistant followed its given instructions throughout the conversation |
| Conversation Completeness | Whether all human intents were addressed during the conversation |
| Profile Utilization | Whether the assistant correctly used provided human profile information when relevant |
| Conversational Flow | Whether the conversation flowed logically and the assistant maintained consistency |
| Conversation Memory | Whether the assistant remembered and correctly used information shared earlier |
| Factual Accuracy | Whether the assistant’s claims were consistent with provided reference facts |
### Agentic Evaluators
Agentic evaluators assess goal-directed and information-gathering behavior.
| Evaluator | What It Measures |
|---|---|
| Goal Fulfillment | Goal achievement and progress toward the stated conversation objective |
| Information Elicitation | How effectively the agent gathered required information from the user |
## Evaluator Configuration
All 8 library evaluators share the same configuration:
| Setting | Value |
|---|---|
| Type | LLM as Judge |
| Eval Scope | Session (entire conversation) |
| Output | Numerical (0-1, normalized from a 1-5 scale) |
| Default Pass Criteria | >= 0.6 |
You can adjust the pass criteria threshold for any evaluator based on your requirements. A higher threshold enforces stricter quality standards.
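As a sketch of the scoring arithmetic: assuming the 1-5 judge rating is mapped linearly onto the 0-1 scale (the exact mapping is an assumption, not stated in the product docs), a rating of 3/5 normalizes to 0.5 and fails the default 0.6 threshold, while 4/5 normalizes to 0.75 and passes.

```python
def normalize(raw_score: float) -> float:
    """Map a 1-5 judge rating onto the 0-1 scale (assumed linear mapping)."""
    return (raw_score - 1) / 4

def passes(raw_score: float, threshold: float = 0.6) -> bool:
    """Check a normalized score against the pass criteria threshold."""
    return normalize(raw_score) >= threshold

print(passes(3))  # False: normalizes to 0.5, below the 0.6 default
print(passes(4))  # True: normalizes to 0.75
```

Raising the threshold to, say, 0.75 would require at least a 4/5 rating from the judge for a session to pass.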
## Using Evaluators in Simulations
When configuring a multi-turn dataset, you select and configure evaluators in Step 4 of the dataset creation flow. You can choose any combination of Quality and Agentic evaluators based on what you want to measure.
## Best Practices
### Choosing Evaluators by Scenario Type
| Scenario Type | Recommended Evaluators |
|---|---|
| Customer Support | Conversation Completeness, Factual Accuracy, Guideline Adherence |
| Technical Assistance | Conversation Completeness, Conversational Flow, Goal Fulfillment |
| Sales Conversations | Profile Utilization, Factual Accuracy, Information Elicitation |
| Troubleshooting | Conversation Completeness, Conversation Memory, Conversational Flow |
### Getting Started with Evaluators
- Start with Goal Fulfillment and Factual Accuracy — these cover the most critical aspects of any simulation
- Add Quality evaluators based on your use case — Conversation Completeness and Guideline Adherence are strong defaults
- Adjust pass criteria if the default threshold of 0.6 is too lenient or strict for your needs
- Monitor results across the first few test runs to ensure evaluators align with your expectations
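To monitor results across test runs, one simple check is the per-evaluator pass rate. The sketch below assumes hypothetical exported session scores (already normalized to 0-1); the data shape and values are illustrative, not the product's actual export format.

```python
from collections import defaultdict

# Hypothetical session-level results from a few simulation test runs.
results = [
    {"evaluator": "Goal Fulfillment", "score": 0.75},
    {"evaluator": "Goal Fulfillment", "score": 0.50},
    {"evaluator": "Factual Accuracy", "score": 1.00},
    {"evaluator": "Factual Accuracy", "score": 0.75},
]

PASS_THRESHOLD = 0.6  # the default pass criteria

def pass_rates(rows, threshold=PASS_THRESHOLD):
    """Fraction of sessions that met the pass criteria, per evaluator."""
    totals, passed = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["evaluator"]] += 1
        if row["score"] >= threshold:
            passed[row["evaluator"]] += 1
    return {name: passed[name] / totals[name] for name in totals}

print(pass_rates(results))
# {'Goal Fulfillment': 0.5, 'Factual Accuracy': 1.0}
```

A consistently low pass rate on one evaluator can signal either a real agent weakness or a threshold that is miscalibrated for your use case; comparing rates across evaluators helps tell the two apart.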