Simulation evaluators assess entire multi-turn conversations at the session level. After a simulated conversation completes, evaluators determine whether your agent achieved its goal, communicated facts correctly, and maintained quality throughout the interaction.

Why Simulation Evaluators Matter

Multi-turn conversations require different evaluation approaches than single-turn responses:
| Challenge | How Simulation Evaluators Help |
| --- | --- |
| Goal achievement | Verify whether the conversation reached its objective |
| Fact accuracy | Track whether critical information was communicated correctly across multiple turns |
| Conversation quality | Ensure the conversation maintains logical flow, completeness, and consistency |
| Information gathering | Assess whether the agent effectively collected required information from the user |

Evaluators Dashboard

Navigate to Evaluation → Evaluators from the left navigation panel, then switch to the Library tab and filter by Multi turn to see the simulation evaluators. Netra organizes simulation evaluators into two categories: Quality and Agentic.

Library Evaluators

Netra provides 8 preconfigured library evaluators across two categories. All evaluators run at the session level, assessing the entire conversation after it completes.

Quality Evaluators

Quality evaluators assess how well your agent maintains conversation standards.
| Evaluator | What It Measures |
| --- | --- |
| Guideline Adherence | Whether the assistant followed its given instructions throughout the conversation |
| Conversation Completeness | Whether all human intents were addressed during the conversation |
| Profile Utilization | Whether the assistant correctly used provided human profile information when relevant |
| Conversational Flow | Whether the conversation flowed logically and the assistant maintained consistency |
| Conversation Memory | Whether the assistant remembered and correctly used information shared earlier |
| Factual Accuracy | Whether the assistant’s claims were consistent with provided reference facts |

Agentic Evaluators

Agentic evaluators assess goal-directed and information-gathering behavior.
| Evaluator | What It Measures |
| --- | --- |
| Goal Fulfillment | Goal achievement and progress toward the stated conversation objective |
| Information Elicitation | How effectively the agent gathered required information from the user |

Evaluator Configuration

All 8 library evaluators share the same configuration:
| Setting | Value |
| --- | --- |
| Type | LLM as Judge |
| Eval Scope | Session (entire conversation) |
| Output | Numerical (0-1, normalized from a 1-5 scale) |
| Default Pass Criteria | >= 0.6 |

You can adjust the pass criteria threshold for any evaluator based on your requirements. A higher threshold enforces stricter quality standards.
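The normalization and pass check described above can be sketched in a few lines. This is an illustrative reconstruction of the arithmetic (a raw 1-5 judge rating mapped onto 0-1, then compared against the threshold), not Netra's actual implementation; the function names are hypothetical.

```python
def normalize_score(raw: int) -> float:
    """Map a judge's raw 1-5 rating onto the 0-1 scale reported in results."""
    if not 1 <= raw <= 5:
        raise ValueError(f"raw score must be in 1..5, got {raw}")
    return (raw - 1) / 4

def passes(raw: int, threshold: float = 0.6) -> bool:
    """Apply the pass criteria (default >= 0.6) to the normalized score."""
    return normalize_score(raw) >= threshold

# A raw rating of 4 normalizes to 0.75 and passes the default threshold;
# a rating of 3 normalizes to 0.5 and fails it.
print(passes(4), passes(3))  # → True False
```

Under this mapping, raising the threshold to 0.75 would require a raw rating of at least 4 to pass.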

Using Evaluators in Simulations

When configuring a multi-turn dataset, you select and configure evaluators in Step 4 of the dataset creation flow. You can choose any combination of Quality and Agentic evaluators based on what you want to measure.

Best Practices

Choosing Evaluators by Scenario Type

| Scenario Type | Recommended Evaluators |
| --- | --- |
| Customer Support | Conversation Completeness, Factual Accuracy, Guideline Adherence |
| Technical Assistance | Conversation Completeness, Conversational Flow, Goal Fulfillment |
| Sales Conversations | Profile Utilization, Factual Accuracy, Information Elicitation |
| Troubleshooting | Conversation Completeness, Conversation Memory, Conversational Flow |
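If you configure simulations programmatically, the recommendations above can be encoded as a simple lookup. The mapping and function below are illustrative only (the keys and fallback are assumptions, not part of Netra's API); the evaluator names mirror the table.

```python
# Illustrative scenario-to-evaluator mapping; keys are hypothetical labels.
RECOMMENDED_EVALUATORS = {
    "customer_support": ["Conversation Completeness", "Factual Accuracy", "Guideline Adherence"],
    "technical_assistance": ["Conversation Completeness", "Conversational Flow", "Goal Fulfillment"],
    "sales": ["Profile Utilization", "Factual Accuracy", "Information Elicitation"],
    "troubleshooting": ["Conversation Completeness", "Conversation Memory", "Conversational Flow"],
}

def evaluators_for(scenario: str) -> list[str]:
    """Return the recommended evaluators for a scenario, falling back to
    the two core evaluators suggested under Getting Started."""
    return RECOMMENDED_EVALUATORS.get(scenario, ["Goal Fulfillment", "Factual Accuracy"])
```

For example, `evaluators_for("sales")` returns the three sales-focused evaluators, while an unrecognized scenario falls back to Goal Fulfillment and Factual Accuracy.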

Getting Started with Evaluators

  1. Start with Goal Fulfillment and Factual Accuracy — these cover the most critical aspects of any simulation
  2. Add Quality evaluators based on your use case — Conversation Completeness and Guideline Adherence are strong defaults
  3. Adjust pass criteria if the default threshold of 0.6 is too lenient or strict for your needs
  4. Monitor results across the first few test runs to ensure evaluators align with your expectations
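To support step 4, you can track per-evaluator pass rates across your first test runs. The sketch below assumes results exported as a flat list of `{"evaluator": ..., "score": ...}` records with scores already on the 0-1 scale; this record shape is an assumption for illustration, not Netra's export format.

```python
from collections import defaultdict

def pass_rates(results: list[dict], threshold: float = 0.6) -> dict[str, float]:
    """Compute the fraction of sessions each evaluator passed.

    Assumes each record looks like {"evaluator": str, "score": float}
    with scores on the normalized 0-1 scale.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, seen]
    for r in results:
        totals[r["evaluator"]][0] += r["score"] >= threshold
        totals[r["evaluator"]][1] += 1
    return {name: passed / seen for name, (passed, seen) in totals.items()}

runs = [
    {"evaluator": "Goal Fulfillment", "score": 0.75},
    {"evaluator": "Goal Fulfillment", "score": 0.5},
    {"evaluator": "Factual Accuracy", "score": 1.0},
]
print(pass_rates(runs))  # → {'Goal Fulfillment': 0.5, 'Factual Accuracy': 1.0}
```

A consistently low pass rate for one evaluator can mean either a genuine agent weakness or a threshold that is too strict for that scenario type; inspect a few failing sessions before adjusting the pass criteria.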
Last modified on March 17, 2026