Datasets for simulation define the scenarios you want to test—multi-turn conversations with specific goals, user personas, and success criteria. Unlike single-turn evaluation datasets, simulation datasets create dynamic, goal-oriented conversations that reflect real-world user interactions.

Why Simulation Datasets Matter

Simulation datasets transform simple Q&A testing into realistic conversation testing:
| Benefit | Description |
| --- | --- |
| Goal-Oriented Testing | Test whether your agent achieves specific objectives, not just individual responses |
| Persona-Based Scenarios | Simulate different user types—frustrated, confused, friendly, or neutral |
| Multi-Turn Conversations | Test how your agent handles back-and-forth dialogue (1-10 turns) |
| Fact Verification | Ensure your agent communicates critical information correctly |
| Context Simulation | Provide user data and context for realistic scenario execution |

Dataset Dashboard

Navigate to Evaluation → Datasets from the left navigation panel. Filter by the Multi turn type to see simulation datasets.

[Screenshot: Simulation Datasets Dashboard]

Each dataset card displays the following fields, giving you key information about your datasets at a glance:

| Field | Description |
| --- | --- |
| Dataset Name | Unique identifier for the simulation suite |
| Turn Type | MULTI for simulation datasets |
| Tags | Metadata labels for filtering and organization |
| Created At | Timestamp for version tracking |
| Actions | Quick access to edit or delete datasets |

Creating a Multi-Turn Dataset

Click the Create Dataset button in the top right corner of the Datasets page.

Step 1: Basics

[Screenshot: Dataset Basics Configuration]

1. Configure Dataset Details

| Field | Description |
| --- | --- |
| Name | A descriptive identifier for your simulation suite (e.g., “Customer Refund Scenarios”) |
| Tags | Labels for filtering (e.g., “customer-support”, “refunds”, “production”) |
| Type | Select Multi-turn for simulation scenarios |
| Data Source | Select Add manually to create scenarios one by one |
2. Click Next

Proceed to scenario configuration.
Import from traces and CSV import for multi-turn datasets are coming soon.

Step 2: Scenario Configuration

This is where you define the simulation scenario.

[Screenshot: Scenario Configuration]

1. Select Agent

Choose the agent you want to test. The agent’s abilities and constraints will guide its behavior during the simulation.
2. Define Scenario Goal

Describe what the simulated user is trying to achieve.

Question: “What scenario are you testing?”

Example:

The customer wants to get a refund for a product they purchased 15 days ago because it arrived damaged.
This becomes the goal that drives the simulated conversation.
3. Add Behavior Instructions (Optional)

Provide guidance on how the simulated user should behave during the conversation.

Example:

Start politely, but become slightly impatient if the agent asks for information already provided.
4. Set Max Turns

Choose the maximum number of conversation turns (1-10).
  • Lower (1-3): Quick interactions like single-question support
  • Medium (4-6): Standard support conversations
  • Higher (7-10): Complex, multi-step problem resolution
The simulation stops when any of the following occurs, as sketched below:
  • The goal is achieved
  • The max turns limit is reached
  • The scenario is abandoned or failed
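
Conceptually, each simulated turn ends with these checks. The sketch below is illustrative pseudologic, not the platform's implementation; every name in it is a placeholder.

```python
# Illustrative sketch of the stop conditions. This is NOT the platform's
# implementation; every name here is a placeholder.
def run_simulation(agent, user_sim, max_turns):
    for turn in range(1, max_turns + 1):
        user_msg = user_sim.next_message()       # simulated user speaks
        agent_reply = agent.respond(user_msg)    # agent answers

        if user_sim.goal_achieved(agent_reply):  # goal achieved
            return "goal_achieved", turn
        if user_sim.abandoned(agent_reply):      # scenario abandoned or failed
            return "abandoned", turn

    return "max_turns_reached", max_turns        # turn limit reached
```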
5. Select User Persona

Choose how the simulated user behaves emotionally:

| Persona | Icon | Description |
| --- | --- | --- |
| Neutral | 😐 | Straightforward and factual, sticks to the point |
| Friendly | 😊 | Polite and cooperative, patient with the agent |
| Frustrated | 😤 | Impatient, wants quick resolution, may be curt |
| Confused | 😕 | Needs extra clarification, asks follow-up questions |
| Custom | ✏️ | Define your own persona behavior |
The persona affects how the simulated user phrases questions and responds to the agent.
6. Select Provider & Model

Choose the LLM provider and model that will generate simulated user responses:
  • Provider: OpenAI, Anthropic, Google, etc.
  • Model: GPT-4.1, Claude, Gemini, etc.
Use consistent, capable models (GPT-4, Claude Sonnet) for realistic user simulation.

Step 3: User Data & Facts

This step defines the context and success criteria for the simulation.

[Screenshot: User Data and Facts Configuration]

1. Define Simulated User Data

Provide context data that the simulated user has access to. This information can be referenced during the conversation.

Format Options: Table, JSON, or Plain Text

Example (Table):

| Key | Value |
| --- | --- |
| order_number | ORD-123456 |
| purchase_date | 2024-01-15 |
| product_name | Wireless Headphones |
| order_total | $129.99 |
| shipping_address | 123 Main St, New York, NY |
Example (JSON):
```json
{
  "order_number": "ORD-123456",
  "purchase_date": "2024-01-15",
  "product_name": "Wireless Headphones",
  "order_total": "$129.99",
  "shipping_address": "123 Main St, New York, NY"
}
```
Example (Plain Text):
Order Number: ORD-123456
Purchase Date: January 15, 2024
Product: Wireless Headphones
Total: $129.99
Shipping: 123 Main St, New York, NY
The simulated user can naturally reference this data during conversation (e.g., “My order number is ORD-123456”).
2. Define Fact Checker

Specify facts that the agent MUST communicate correctly during the conversation.

Format Options: Table, JSON, or Plain Text

Example (Table):

| Fact | Expected Value |
| --- | --- |
| refund_processing_time | 5-7 business days |
| refund_method | Original payment method |
| return_label_delivery | Within 24 hours via email |
Example (JSON):
```json
{
  "refund_processing_time": "5-7 business days",
  "refund_method": "Original payment method",
  "return_label_delivery": "Within 24 hours via email"
}
```
These facts are used by evaluators to verify the agent provided correct information.
JSON Validation: When using JSON format, ensure there are no duplicate keys. The system validates JSON structure before allowing you to proceed.
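
If you assemble fact-checker JSON programmatically, you can run the same duplicate-key check yourself before pasting it in. This is an optional pre-check in plain Python, separate from the platform's built-in validation:

```python
import json

def reject_duplicate_keys(pairs):
    """Fail fast if the same key appears twice in a JSON object."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"Duplicate key in fact-checker JSON: {key!r}")
        obj[key] = value
    return obj

raw = '{"refund_method": "Original payment method", "refund_method": "Store credit"}'
facts = json.loads(raw, object_pairs_hook=reject_duplicate_keys)  # raises ValueError
```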

Step 4: Evaluator Selection

[Screenshot: Evaluator Selection]

1. Select Evaluators

Choose evaluators from the library or your saved configurations.

For simulations, you can use:
  • Turn-level evaluators: Assess individual conversation turns
  • Session-level evaluators: Assess the entire conversation
Recommended Evaluators:
  • Goal Achievement (session-level)
  • Fact Accuracy (session-level)
  • Response Quality (turn-level)
  • Constraint Adherence (turn-level)
2. Configure Variable Mappings

Map evaluator variables to:
  • Scenario fields: Goal, persona, user data
  • Agent response: What the agent said in each turn
  • Conversation metadata: Turn index, conversation history
  • Execution data: Latency, tokens, model
Each evaluator may require different variable mappings.
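
For illustration, a mapping might pair each evaluator input with one of these sources. The keys and field paths below are hypothetical, not the platform's actual schema:

```python
# Hypothetical variable mapping. Every key and field path here is
# illustrative, not the platform's actual schema.
variable_mappings = {
    "goal": "scenario.goal",                  # scenario field
    "persona": "scenario.persona",            # scenario field
    "agent_response": "turn.agent_response",  # agent response for the turn
    "turn_index": "conversation.turn_index",  # conversation metadata
    "latency_ms": "execution.latency_ms",     # execution data
}
```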

Step 5: Advanced Configuration (Optional)

Additional evaluator setup and fine-tuning options.
1. Review Configuration

Review all evaluator configurations and mappings.
2. Create Dataset

Click Create Dataset to finalize. Your simulation dataset is now ready to run.

Running a Simulation

Once your dataset is configured, you can run simulations:
1. Get Dataset ID

Open your dataset and copy the Dataset ID displayed at the top of the page.

[Screenshot: Dataset ID]
2. Trigger Simulation

Use the Dataset ID in your simulation code. The simulation runs automatically through the Netra SDK.
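
A minimal sketch of what this might look like in Python follows. The import path, client class, and method names are assumptions for illustration; consult the Netra SDK documentation for the actual API:

```python
# Hypothetical sketch. The import path, client class, and method names
# below are assumptions, not the confirmed Netra SDK API.
from netra import NetraClient  # assumed import

client = NetraClient(api_key="YOUR_API_KEY")

# Point the run at the Dataset ID copied from the dashboard.
run = client.simulations.run(dataset_id="YOUR_DATASET_ID")  # assumed method
print(run.status)
```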
3. View Results

Monitor progress and results in Test Runs.
Simulations execute automatically when the associated code is triggered. You don’t need to manually start each run—just ensure your agent code is integrated with Netra.

Best Practices

Crafting Effective Scenarios

  • Be specific: “Get a refund for a damaged product” is better than “Ask about returns”
  • Include context: Provide enough detail for realistic simulation (order details, timeline, issue description)
  • Include edge cases: Create scenarios that challenge your agent’s boundaries
  • Vary complexity: Mix simple (2-3 turns) and complex (7-10 turns) scenarios

Choosing User Personas

  • Neutral: Best for baseline performance testing
  • Friendly: Tests whether your agent maintains professionalism even when not challenged
  • Frustrated: Critical for customer support agents—tests patience and de-escalation
  • Confused: Tests clarity and explanation quality
  • Custom: Use for industry-specific personas (technical users, non-native speakers, etc.)

Defining User Data

  • Provide realistic data: Use representative order numbers, dates, and values
  • Include edge cases: Test with missing fields, unusual values, or conflicting data
  • Keep it relevant: Only include data that matters for the scenario
  • Use consistent formats: Standardize date formats, currency, and naming

Setting Fact Checkers

  • Focus on critical facts: What MUST the agent communicate correctly?
  • Be precise: “5-7 business days” is better than “about a week”
  • Test compliance: Include regulatory or policy-critical information
  • Verify, don’t duplicate: Don’t repeat information already in user data

Related Pages

  • Simulation Overview - Understand the full simulation framework
  • Agents - Define agents to test in simulations
  • Evaluators - Configure scoring logic for simulations
  • Test Runs - View simulation results and conversation transcripts
  • Traces - Debug simulation turns with execution traces