Simulation evaluators assess entire multi-turn conversations at the session level. After a simulated conversation completes, evaluators determine whether your agent achieved its goal, communicated facts correctly, and maintained quality throughout the interaction.
## Why Simulation Evaluators Matter
Multi-turn conversations require different evaluation approaches than single-turn responses:
| Challenge | How Simulation Evaluators Help |
|---|---|
| Goal achievement | Verify whether the conversation reached its objective |
| Fact accuracy | Track whether critical information was communicated correctly across multiple turns |
| Conversation quality | Ensure the conversation maintains logical flow, completeness, and consistency |
| Information gathering | Assess whether the agent effectively collected required information from the user |
## Evaluators Dashboard
Navigate to Evaluation → Evaluators from the left navigation panel. Switch to the Library tab and filter by Multi turn to see the simulation evaluators.
Netra organizes simulation evaluators into two categories: Quality and Agentic.
## Library Evaluators
Netra provides 8 preconfigured library evaluators across two categories. All evaluators run at the session level, assessing the entire conversation after it completes.
### Quality Evaluators
Quality evaluators assess how well your agent maintains conversation standards.
| Evaluator | What It Measures |
|---|---|
| Guideline Adherence | Whether the assistant followed its given instructions throughout the conversation |
| Conversation Completeness | Whether all human intents were addressed during the conversation |
| Profile Utilization | Whether the assistant correctly used provided human profile information when relevant |
| Conversational Flow | Whether the conversation flowed logically and the assistant maintained consistency |
| Conversation Memory | Whether the assistant remembered and correctly used information shared earlier |
| Factual Accuracy | Whether the assistant’s claims were consistent with provided reference facts |
### Agentic Evaluators
Agentic evaluators assess goal-directed and information-gathering behavior.
| Evaluator | What It Measures |
|---|---|
| Goal Fulfillment | Goal achievement and progress toward the stated conversation objective |
| Information Elicitation | How effectively the agent gathered required information from the user |
## Evaluator Configuration
All 8 library evaluators share the same configuration:
| Setting | Value |
|---|---|
| Type | LLM as Judge |
| Eval Scope | Session (entire conversation) |
| Output | Numerical (0-1, normalized from a 1-5 scale) |
| Default Pass Criteria | >= 0.6 |
You can adjust the pass criteria threshold for any evaluator based on your requirements. A higher threshold enforces stricter quality standards.
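As a sketch of the scoring arithmetic: assuming the 1-5 judge rating is mapped linearly onto the 0-1 scale (the exact mapping is an assumption, not stated in the product docs), a rating of 3/5 normalizes to 0.5 and fails the default 0.6 threshold, while 4/5 normalizes to 0.75 and passes.

```python
def normalize(raw_score: float) -> float:
    """Map a 1-5 judge rating onto the 0-1 scale (assumed linear mapping)."""
    return (raw_score - 1) / 4

def passes(raw_score: float, threshold: float = 0.6) -> bool:
    """Check a normalized score against the pass criteria threshold."""
    return normalize(raw_score) >= threshold

print(passes(3))  # False: normalizes to 0.5, below the 0.6 default
print(passes(4))  # True: normalizes to 0.75
```

Raising the threshold to, say, 0.75 would require at least a 4/5 rating from the judge for a session to pass.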
## Using Evaluators in Simulations
When configuring a multi-turn dataset, you select and configure evaluators in Step 4 of the dataset creation flow. You can choose any combination of Quality and Agentic evaluators based on what you want to measure.
## Best Practices
### Choosing Evaluators by Scenario Type
| Scenario Type | Recommended Evaluators |
|---|---|
| Customer Support | Conversation Completeness, Factual Accuracy, Guideline Adherence |
| Technical Assistance | Conversation Completeness, Conversational Flow, Goal Fulfillment |
| Sales Conversations | Profile Utilization, Factual Accuracy, Information Elicitation |
| Troubleshooting | Conversation Completeness, Conversation Memory, Conversational Flow |
### Getting Started with Evaluators
- Start with Goal Fulfillment and Factual Accuracy — these cover the most critical aspects of any simulation
- Add Quality evaluators based on your use case — Conversation Completeness and Guideline Adherence are strong defaults
- Adjust pass criteria if the default threshold of 0.6 is too lenient or strict for your needs
- Monitor results across the first few test runs to ensure evaluators align with your expectations
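To monitor results across test runs, one simple check is the per-evaluator pass rate. The sketch below assumes hypothetical exported session scores (already normalized to 0-1); the data shape and values are illustrative, not the product's actual export format.

```python
from collections import defaultdict

# Hypothetical session-level results from a few simulation test runs.
results = [
    {"evaluator": "Goal Fulfillment", "score": 0.75},
    {"evaluator": "Goal Fulfillment", "score": 0.50},
    {"evaluator": "Factual Accuracy", "score": 1.00},
    {"evaluator": "Factual Accuracy", "score": 0.75},
]

PASS_THRESHOLD = 0.6  # the default pass criteria

def pass_rates(rows, threshold=PASS_THRESHOLD):
    """Fraction of sessions that met the pass criteria, per evaluator."""
    totals, passed = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["evaluator"]] += 1
        if row["score"] >= threshold:
            passed[row["evaluator"]] += 1
    return {name: passed[name] / totals[name] for name in totals}

print(pass_rates(results))
# {'Goal Fulfillment': 0.5, 'Factual Accuracy': 1.0}
```

A consistently low pass rate on one evaluator can signal either a real agent weakness or a threshold that is miscalibrated for your use case; comparing rates across evaluators helps tell the two apart.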