Evaluators for simulation assess multi-turn conversations at the session level. They determine whether your agent achieved the goal, communicated facts correctly, maintained quality, and respected constraints throughout the entire conversation.
Why Simulation Evaluators Matter
Multi-turn conversations require different evaluation approaches than single-turn responses:
| Challenge | How Simulation Evaluators Help |
|---|---|
| Goal achievement | Verify whether the conversation reached its objective |
| Fact accuracy | Track whether critical information was communicated correctly across multiple turns |
| Conversation quality | Check that the conversation maintains quality standards |
| Constraint adherence | Confirm the agent respected boundaries throughout the entire conversation |
Evaluators Dashboard
Navigate to Evaluation → Evaluators from the left navigation panel to access Netra’s library of pre-configured evaluators, organized by category.
Using Evaluators in Simulations
When configuring multi-turn datasets, you’ll select and configure evaluators in Step 4.
Library Evaluators
Netra provides 8 pre-configured library evaluators, each of which scores the entire conversation after it completes.
Available Library Evaluators:
- Guideline Adherence: Evaluates whether the assistant followed its given instructions and respected its constraints throughout the conversation
- Conversation Completeness: Assesses whether all human intents were addressed during the conversation
- Profile Utilization: Evaluates whether the assistant correctly used provided human profile information when relevant
- Conversational Flow: Measures whether the conversation flowed logically and the assistant maintained consistency
- Conversation Memory: Evaluates whether the assistant remembered and correctly used information shared during the conversation
- Factual Accuracy: Assesses whether the assistant’s claims were consistent with provided reference facts
- Goal Fulfillment: Evaluates goal achievement and progress toward the stated conversation goal
- Information Elicitation: Assesses how effectively the assistant gathered required information from the human
Evaluator Details
Example: Guideline Adherence (Library Evaluator)
This evaluator assesses whether the assistant followed its instructions and respected constraints.
Pre-configured Settings:
- Type: LLM as Judge
- Eval Type: Session
- Output: Numerical (0-1 normalized from 1-5 scale)
- Pass Criteria: >= 0.6 (score normalization is sketched below)
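Netra’s internal scoring pipeline is not documented here, but the settings above fully determine the arithmetic. The minimal sketch below shows how a raw 1-5 judge rating maps onto the 0-1 range and how the default 0.6 pass criteria applies; the function and variable names are illustrative, not part of any Netra API.

```python
# Minimal sketch of post-processing a session-level LLM-as-judge score.
# The 1-5 scale, 0-1 normalization, and 0.6 threshold come from the
# pre-configured settings above; everything else is assumed.

def normalize_score(raw: int, low: int = 1, high: int = 5) -> float:
    """Map a raw 1-5 judge rating onto the 0-1 range."""
    if not low <= raw <= high:
        raise ValueError(f"raw score {raw} outside [{low}, {high}]")
    return (raw - low) / (high - low)

def passes(raw: int, threshold: float = 0.6) -> bool:
    """Apply the default pass criteria (normalized score >= 0.6)."""
    return normalize_score(raw) >= threshold

# A raw rating of 4 normalizes to 0.75 and passes;
# a 3 normalizes to 0.5 and fails.
assert passes(4) and not passes(3)
```

Note one consequence of this mapping: on an integer 1-5 scale, a 0.6 threshold means the judge must award at least a 4 for the session to pass.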
Example: Factual Accuracy (Library Evaluator)
This evaluator verifies whether the assistant’s claims were consistent with the provided reference facts; a sketch of such a judge rubric follows the settings below.
Pre-configured Settings:
- Type: LLM as Judge
- Eval Type: Session
- Output: Numerical (0-1 normalized from 1-5 scale)
- Pass Criteria: >= 0.6
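Netra does not publish its judge prompts, so the following is a hypothetical illustration only: one way an LLM-as-judge rubric for factual accuracy could be framed, asking for a 1-5 rating that is then normalized as shown earlier. The template text, function name, and data shapes are all assumptions.

```python
# Hypothetical judge rubric for factual accuracy; not Netra's actual prompt.
JUDGE_TEMPLATE = """You are grading a completed conversation.

Reference facts:
{facts}

Conversation transcript:
{transcript}

Rate from 1 (claims contradict the facts) to 5 (all claims are consistent
with the facts). Reply with the integer rating only."""

def build_judge_prompt(facts: list[str], transcript: list[tuple[str, str]]) -> str:
    """Fill the rubric with reference facts and the full session transcript."""
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    turn_block = "\n".join(f"{role}: {text}" for role, text in transcript)
    return JUDGE_TEMPLATE.format(facts=fact_block, transcript=turn_block)
```

Grading at the session level, as here, lets the judge catch contradictions that span turns, which per-response evaluation would miss.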
To view all available library evaluators, navigate to Evaluation → Evaluators and switch to the Library tab.
Best Practices
Choosing Evaluators for Simulations
| Scenario Type | Recommended Evaluators |
|---|---|
| Customer Support | Conversation Completeness, Factual Accuracy, Guideline Adherence |
| Technical Assistance | Conversation Completeness, Conversational Flow |
| Sales Conversations | Profile Utilization, Factual Accuracy, Guideline Adherence |
| Troubleshooting | Conversation Completeness, Conversation Memory, Conversational Flow |
Testing Evaluators
Before deploying evaluators to simulation datasets:
- Review the evaluators: Understand what each one checks for
- Adjust pass criteria: Configure thresholds based on your requirements (the default is 0.6); see the configuration sketch after this list
- Test with sample conversations: Run a small test to see how the evaluators score your conversations
- Monitor results: Check the first few test runs to ensure the evaluators align with your expectations
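Evaluator selection and thresholds are configured through the UI in Step 4 of dataset setup, and no public config schema is documented here. The dict below is purely a hypothetical shape that illustrates the knobs discussed above: which library evaluators a dataset uses and per-evaluator pass criteria overrides.

```python
# Hypothetical configuration shape; the dataset name, keys, and structure
# are assumptions for illustration, not a Netra API.
evaluator_config = {
    "dataset": "support-simulations-v1",  # hypothetical dataset name
    "evaluators": [
        {"name": "Guideline Adherence", "pass_criteria": 0.6},        # default
        {"name": "Factual Accuracy", "pass_criteria": 0.8},           # stricter
        {"name": "Conversation Completeness", "pass_criteria": 0.6},  # default
    ],
}
```

Raising a threshold above 0.6, as in the Factual Accuracy entry, effectively requires a perfect 5 from the judge on an integer 1-5 scale, so tighten criteria only where errors are genuinely unacceptable.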
Quick Reference
The 8 Library Evaluators:
- Guideline Adherence - Instruction following
- Conversation Completeness - All intents addressed
- Profile Utilization - Correct use of human info
- Conversational Flow - Logical coherence
- Conversation Memory - Knowledge retention
- Factual Accuracy - Fact consistency
- Goal Fulfillment - Goal achievement and progress
- Information Elicitation - Information gathering from human
All evaluators use:
- Type: LLM as Judge
- Output: Numerical (0-1 normalized from 1-5 scale)
- Default Pass Criteria: >= 0.6