Simulation evaluators assess multi-turn conversations at the session level. They determine whether your agent achieved the goal, communicated facts correctly, maintained quality, and respected constraints throughout the entire conversation.

Why Simulation Evaluators Matter

Multi-turn conversations require different evaluation approaches than single-turn responses:
Challenge | How Simulation Evaluators Help
Goal achievement | Verify whether the conversation reached its objective
Fact accuracy | Track whether critical information was communicated correctly across multiple turns
Conversation quality | Ensure quality standards were maintained from start to finish
Constraint adherence | Verify the agent respected boundaries throughout the entire conversation
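
To make the session-level approach concrete, below is a minimal, self-contained sketch of an "LLM as Judge" session evaluator. The transcript structure, prompt format, and judge callable are illustrative assumptions, not Netra's implementation:

```python
# Illustrative sketch of a session-level "LLM as Judge" evaluator.
# The prompt format and judge callable are assumptions, not Netra's API.
from typing import Callable, Dict, List

def evaluate_session(
    transcript: List[Dict[str, str]],   # e.g. [{"role": "user", "content": "..."}, ...]
    rubric: str,                        # what this evaluator checks, e.g. guideline adherence
    judge: Callable[[str], str],        # any function that sends a prompt to an LLM
    pass_threshold: float = 0.6,        # default pass criteria used by the library evaluators
) -> Dict[str, object]:
    """Score an entire conversation once it has completed."""
    conversation = "\n".join(f'{t["role"]}: {t["content"]}' for t in transcript)
    prompt = (
        f"Rubric: {rubric}\n\n"
        f"Conversation:\n{conversation}\n\n"
        "Rate the assistant from 1 (worst) to 5 (best). Reply with the number only."
    )
    raw = int(judge(prompt))            # 1-5 rating from the judge model
    score = (raw - 1) / 4               # normalize the 1-5 rating onto 0-1
    return {"raw": raw, "score": score, "passed": score >= pass_threshold}
```

Each library evaluator described below follows this shape: a single judgment over the completed transcript, normalized to 0-1 and compared against a pass threshold.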

Evaluators Dashboard

Navigate to Evaluation → Evaluators from the left navigation panel to access Netra's library of pre-configured evaluators, organized by category.

[Screenshot: Evaluators Dashboard]

Using Evaluators in Simulations

When configuring multi-turn datasets, you’ll select and configure evaluators in Step 4.

Library Evaluators

Netra provides 8 pre-configured library evaluators that evaluate the entire conversation after it completes; a hypothetical configuration sketch follows the list below.

Available Library Evaluators:
  • Guideline Adherence: Evaluates whether the assistant followed its given instructions and respected its constraints throughout the conversation
  • Conversation Completeness: Assesses whether all human intents were addressed during the conversation
  • Profile Utilization: Evaluates whether the assistant correctly used provided human profile information when relevant
  • Conversational Flow: Measures whether the conversation flowed logically and the assistant maintained consistency
  • Conversation Memory: Evaluates whether the assistant remembered and correctly used information shared during the conversation
  • Factual Accuracy: Assesses whether the assistant’s claims were consistent with provided reference facts
  • Goal Fulfillment: Evaluates achievement of, and progress toward, the stated conversation goal
  • Information Elicitation: Evaluates how effectively the assistant gathered information from the human
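
As a rough picture of what the Step 4 selection amounts to, the sketch below expresses a set of chosen library evaluators as plain data. The field names and structure are hypothetical, purely for illustration; in practice you select and configure these in the Netra UI:

```python
# Hypothetical data sketch of evaluators selected for a multi-turn dataset.
# Field names are illustrative only; configuration happens in the Netra UI.
simulation_evaluators = [
    {"name": "Guideline Adherence",       "eval_type": "Session", "pass_criteria": 0.6},
    {"name": "Conversation Completeness", "eval_type": "Session", "pass_criteria": 0.6},
    {"name": "Factual Accuracy",          "eval_type": "Session", "pass_criteria": 0.6},
]
```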

Evaluator Details

Example: Guideline Adherence (Library Evaluator)

This evaluator assesses whether the assistant followed its instructions and respected its constraints.

Pre-configured Settings:
  • Type: LLM as Judge
  • Eval Type: Session
  • Output: Numerical (0-1 normalized from 1-5 scale)
  • Pass Criteria: >= 0.6
Example: Factual Accuracy (Library Evaluator)

This evaluator verifies whether the assistant's claims were consistent with the provided reference facts.

Pre-configured Settings:
  • Type: LLM as Judge
  • Eval Type: Session
  • Output: Numerical (0-1 normalized from 1-5 scale)
  • Pass Criteria: >= 0.6
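
Both examples share the same output convention. Assuming the 1-5 judge rating is normalized linearly onto 0-1, i.e. score = (raw - 1) / 4, the default pass criteria of >= 0.6 means only raw ratings of 4 or 5 pass:

```python
# Worked example of the 1-5 -> 0-1 normalization, assuming a linear mapping.
def normalize(raw: int) -> float:
    return (raw - 1) / 4

for raw in (1, 2, 3, 4, 5):
    score = normalize(raw)
    print(f"raw={raw} -> score={score:.2f} -> {'pass' if score >= 0.6 else 'fail'}")
# raw=1 -> 0.00 fail, raw=2 -> 0.25 fail, raw=3 -> 0.50 fail,
# raw=4 -> 0.75 pass, raw=5 -> 1.00 pass
```

Under this assumption, raising the threshold to 0.8 would demand a perfect 5, while lowering it to 0.5 would let a middling 3 pass; this is what "Adjust pass criteria" in the Best Practices below amounts to.
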
To view all available library evaluators, navigate to Evaluation → Evaluators and switch to the Library tab.

Best Practices

Choosing Evaluators for Simulations

Scenario Type | Recommended Evaluators
Customer Support | Conversation Completeness, Factual Accuracy, Guideline Adherence
Technical Assistance | Conversation Completeness, Conversational Flow
Sales Conversations | Profile Utilization, Factual Accuracy, Guideline Adherence
Troubleshooting | Conversation Completeness, Conversation Memory, Conversational Flow

Testing Evaluators

Before deploying evaluators to simulation datasets:
  1. Review the evaluator: Understand what each evaluator checks for
  2. Adjust pass criteria: Configure thresholds based on your requirements (default is 0.6)
  3. Test with sample conversations: Run a small test to see how the evaluators score your conversations (a sketch follows this list)
  4. Monitor results: Check the first few test runs to ensure the evaluators align with your expectations
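
A minimal sketch of steps 2 and 3, reusing the evaluate_session function from the sketch near the top of this page; the sample conversations and the stub judge are made up and stand in for real data and a real LLM call:

```python
# Made-up sample data and a stub judge; swap in real conversations and a
# real LLM call before trusting the scores.
def my_judge(prompt: str) -> str:
    return "4"  # placeholder rating; a real judge would call an LLM here

sample_conversations = [
    [{"role": "user", "content": "I need to reset my password."},
     {"role": "assistant", "content": "Sure - I can send a reset link to your email on file."}],
    [{"role": "user", "content": "Cancel my subscription, please."},
     {"role": "assistant", "content": "Done. You will not be billed again."}],
]

rubric = "Did the assistant follow its instructions and respect its constraints?"

for i, transcript in enumerate(sample_conversations, start=1):
    result = evaluate_session(transcript, rubric, judge=my_judge, pass_threshold=0.6)
    print(f"conversation {i}: score={result['score']:.2f}, passed={result['passed']}")
```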

Quick Reference

The 8 Library Evaluators:
  1. Guideline Adherence - Instruction following
  2. Conversation Completeness - All intents addressed
  3. Profile Utilization - Correct use of human info
  4. Conversational Flow - Logical coherence
  5. Conversation Memory - Knowledge retention
  6. Factual Accuracy - Fact consistency
  7. Goal Fulfillment - Goal achievement and progress
  8. Information Elicitation - Information gathering from human
All evaluators use:
  • Type: LLM as Judge
  • Output: Numerical (0-1 normalized from 1-5 scale)
  • Default Pass Criteria: >= 0.6