Datasets

Datasets for simulation define the scenarios you want to test—multi-turn conversations with specific goals, user personas, and success criteria. Unlike single-turn evaluation datasets, simulation datasets create dynamic, goal-oriented conversations that reflect real-world user interactions.

Why Simulation Datasets Matter

Simulation datasets transform simple Q&A testing into realistic conversation testing:

Benefit	Description
Goal-Oriented Testing	Test whether your agent achieves specific objectives, not just individual responses
Persona-Based Scenarios	Simulate different user types—frustrated, confused, friendly, or neutral
Multi-Turn Conversations	Test how your agent handles back-and-forth dialogue (1-10 turns)
Fact Verification	Ensure your agent communicates critical information correctly
Context Simulation	Provide user data and context for realistic scenario execution

Dataset Dashboard

Navigate to Evaluation → Datasets from the left navigation panel. Filter by the Multi-turn type to see simulation datasets.

Each dataset appears as a card showing the following fields at a glance:

Field	Description
Dataset Name	Unique identifier for the simulation suite
Turn Type	MULTI for simulation datasets
Tags	Metadata labels for filtering and organization
Created At	Timestamp for version tracking
Actions	Quick access to edit or delete datasets

Creating a Multi-Turn Dataset

Click the Create Dataset button in the top right corner of the Datasets page.

Step 1: Basics

Configure Dataset Details

Field	Description
Name	A descriptive identifier for your simulation suite (e.g., “Customer Refund Scenarios”)
Tags	Labels for filtering (e.g., “customer-support”, “refunds”, “production”)
Type	Select Multi-turn for simulation scenarios
Data Source	Select Add manually to create scenarios one by one

Click Next

Proceed to scenario configuration.

Import from traces and CSV import for multi-turn datasets are coming soon.

Step 2: Scenario Configuration

This is where you define the simulation scenario.

Select Agent

Choose the agent you want to test. The agent’s abilities and constraints will guide its behavior during the simulation.

Define Scenario Goal

Describe what the simulated user is trying to achieve.

Question: “What scenario are you testing?”

Example:

The customer wants to get a refund for a product they purchased
15 days ago because it arrived damaged.

This becomes the goal that drives the simulated conversation.

Add Behavior Instructions (Optional)

Provide guidance on how the simulated user should behave during the conversation.

Example:

Start politely, but become slightly impatient if the agent
asks for information already provided.

Set Max Turns

Choose the maximum number of conversation turns (1-10).

Lower (1-3): Quick interactions like single-question support
Medium (4-6): Standard support conversations
Higher (7-10): Complex, multi-step problem resolution

The simulation stops when any of the following occurs:

The goal is achieved
The max turns limit is reached
The scenario is abandoned or fails
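
The three stop conditions can be pictured as a simple loop. The following Python sketch is illustrative only: `run_simulation`, `Outcome`, and the `step` callback are hypothetical names, not part of the Netra SDK, and the platform's actual control flow may differ.

```python
from enum import Enum
from typing import Callable, Tuple


class Outcome(Enum):
    GOAL_ACHIEVED = "goal_achieved"
    MAX_TURNS_REACHED = "max_turns_reached"
    ABANDONED_OR_FAILED = "abandoned_or_failed"


# Hypothetical per-turn status values returned by the `step` callback.
CONTINUE, DONE, FAILED = "continue", "done", "failed"


def run_simulation(step: Callable[[int], str], max_turns: int) -> Tuple[Outcome, int]:
    """Run turns until one of the three stop conditions is met.

    `step(turn)` stands in for one user/agent exchange and returns
    CONTINUE, DONE, or FAILED. Returns the outcome and the turns used.
    """
    if not 1 <= max_turns <= 10:
        raise ValueError("max_turns must be between 1 and 10")
    for turn in range(1, max_turns + 1):
        status = step(turn)
        if status == DONE:
            return Outcome.GOAL_ACHIEVED, turn
        if status == FAILED:
            return Outcome.ABANDONED_OR_FAILED, turn
    # The loop ended without the goal being met or the scenario failing.
    return Outcome.MAX_TURNS_REACHED, max_turns
```

For example, `run_simulation(lambda turn: DONE if turn == 3 else CONTINUE, max_turns=6)` ends on the third turn with `Outcome.GOAL_ACHIEVED`, while a callback that always returns `CONTINUE` runs until the turn limit.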

Select User Persona

Choose how the simulated user behaves emotionally:

Persona	Icon	Description
Neutral	😐	Straightforward and factual, sticks to the point
Friendly	😊	Polite and cooperative, patient with the agent
Frustrated	😤	Impatient, wants quick resolution, may be curt
Confused	😕	Needs extra clarification, asks follow-up questions
Custom	✏️	Define your own persona behavior

The persona affects how the simulated user phrases questions and responds to the agent.

Select Provider & Model

Choose the LLM provider and model that will generate simulated user responses:

Provider: OpenAI, Anthropic, Google, etc.
Model: GPT-4.1, Claude, Gemini, etc.

For realistic user simulation, pick a capable model (e.g., GPT-4 or Claude Sonnet) and use it consistently.
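
The Step 2 fields can be thought of as one record per scenario. The Python sketch below is a hypothetical local model of that record, useful for drafting scenarios before entering them in the form. The `Scenario` class, its field names, and the `complexity_band` helper are assumptions for illustration and do not reflect a Netra schema.

```python
from dataclasses import dataclass

# Persona names taken from the persona table; lowercase spelling is an assumption.
PERSONAS = {"neutral", "friendly", "frustrated", "confused", "custom"}


@dataclass
class Scenario:
    """Hypothetical record mirroring the Step 2 form fields."""
    agent: str
    goal: str
    max_turns: int
    persona: str = "neutral"
    behavior_instructions: str = ""  # optional in the form
    provider: str = ""
    model: str = ""

    def __post_init__(self) -> None:
        if not self.goal.strip():
            raise ValueError("goal must not be empty")
        if not 1 <= self.max_turns <= 10:
            raise ValueError("max_turns must be between 1 and 10")
        if self.persona not in PERSONAS:
            raise ValueError(f"unknown persona: {self.persona!r}")

    def complexity_band(self) -> str:
        """Map max_turns to the Lower/Medium/Higher ranges under Set Max Turns."""
        if self.max_turns <= 3:
            return "lower"
        if self.max_turns <= 6:
            return "medium"
        return "higher"


refund = Scenario(
    agent="support-agent",  # hypothetical agent name
    goal="The customer wants to get a refund for a product they purchased "
         "15 days ago because it arrived damaged.",
    max_turns=6,
    persona="frustrated",
    behavior_instructions="Start politely, but become slightly impatient if "
                          "the agent asks for information already provided.",
)
print(refund.complexity_band())
```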

Step 3: User Data & Facts

This step defines the context and success criteria for the simulation.

Define Simulated User Data

Provide context data that the simulated user has access to. This information can be referenced during the conversation.

Format Options: Table, JSON, or Plain Text

Example (Table):

Key	Value
order_number	ORD-123456
purchase_date	2024-01-15
product_name	Wireless Headphones
order_total	$129.99
shipping_address	123 Main St, New York, NY

Example (JSON):

{
  "order_number": "ORD-123456",
  "purchase_date": "2024-01-15",
  "product_name": "Wireless Headphones",
  "order_total": "$129.99",
  "shipping_address": "123 Main St, New York, NY"
}

Example (Plain Text):

Order Number: ORD-123456
Purchase Date: January 15, 2024
Product: Wireless Headphones
Total: $129.99
Shipping: 123 Main St, New York, NY

The simulated user can naturally reference this data during conversation (e.g., “My order number is ORD-123456”).
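
All three formats carry the same key/value information. As a rough illustration, the Python sketch below normalizes the plain-text form into a dictionary that can be dumped as JSON. `parse_plain_text` is a hypothetical helper, and the snake_case key normalization is an assumption; how the platform itself parses each format is not described here.

```python
import json


def parse_plain_text(block: str) -> dict:
    """Turn 'Key: Value' lines into a dict with snake_case keys."""
    data = {}
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'Key: Value', got {line!r}")
        data[key.strip().lower().replace(" ", "_")] = value.strip()
    return data


plain_text = """\
Order Number: ORD-123456
Purchase Date: January 15, 2024
Product: Wireless Headphones
Total: $129.99
Shipping: 123 Main St, New York, NY
"""

user_data = parse_plain_text(plain_text)
print(json.dumps(user_data, indent=2))
```

Note that the keys come from the plain-text labels (`product`, `total`, `shipping`), so they will only match the JSON example's keys if the labels are written the same way.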

Define Fact Checker

Specify facts that the agent MUST communicate correctly during the conversation.

Format Options: Table, JSON, or Plain Text

Example (Table):

Fact	Expected Value
refund_processing_time	5-7 business days
refund_method	Original payment method
return_label_delivery	Within 24 hours via email

Example (JSON):

{
  "refund_processing_time": "5-7 business days",
  "refund_method": "Original payment method",
  "return_label_delivery": "Within 24 hours via email"
}

These facts are used by evaluators to verify the agent provided correct information.
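
To make the idea concrete, here is a deliberately naive Python sketch of a fact check: it marks a fact as covered when its expected value appears verbatim (ignoring case) in the agent's messages. `check_facts` is a hypothetical function, and this is not how the platform's evaluators are implemented. A real evaluator would need to handle paraphrases, which exact substring matching cannot do.

```python
def check_facts(expected_facts: dict, agent_messages: list) -> dict:
    """Return {fact_name: bool} using case-insensitive substring matching."""
    transcript = " ".join(agent_messages).lower()
    return {
        name: expected.lower() in transcript
        for name, expected in expected_facts.items()
    }


facts = {
    "refund_processing_time": "5-7 business days",
    "refund_method": "Original payment method",
    "return_label_delivery": "Within 24 hours via email",
}

# Made-up agent replies for illustration.
agent_messages = [
    "I'm sorry the headphones arrived damaged.",
    "Your refund will go back to your original payment method in 5-7 business days.",
]

results = check_facts(facts, agent_messages)
for name, covered in results.items():
    print(f"{name}: {'covered' if covered else 'missing'}")
```

In this toy transcript the agent never mentions the return label, so `return_label_delivery` is reported as missing while the other two facts are covered.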

JSON Validation: When using JSON format, ensure there are no duplicate keys. The system validates JSON structure before allowing you to proceed.
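
Duplicate keys are easy to miss because many parsers accept them silently. Python's `json` module, for instance, keeps only the last value for a repeated key. The sketch below is a local pre-check you could run before pasting JSON into the form; `reject_duplicate_keys` and `load_strict` are hypothetical helper names, and the platform's own validation may behave differently.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def load_strict(text: str) -> dict:
    """Parse JSON, failing on malformed input or duplicate keys."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


valid = '{"refund_method": "Original payment method"}'
duplicated = '{"refund_method": "Store credit", "refund_method": "Original payment method"}'

print(load_strict(valid))
try:
    load_strict(duplicated)
except ValueError as err:
    print(f"rejected: {err}")
```

Because the hook is called for every object the decoder encounters, nested objects are checked as well.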

Step 4: Evaluator Selection

Select Evaluators

Choose evaluators from the library or your saved configurations.

For simulations, you can use:

Turn-level evaluators: Assess individual conversation turns

Session-level evaluators: Assess the entire conversation

Recommended Evaluators:

Goal Achievement (session-level)
Fact Accuracy (session-level)
Response Quality (turn-level)
Constraint Adherence (turn-level)
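
The difference between the two evaluator levels is how often they run and what they see. The Python sketch below illustrates this with toy scoring functions. The `Turn` shape, the runner functions, and both example evaluators are assumptions for illustration, not the platform's evaluator interface.

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]  # hypothetical shape: {"user": ..., "agent": ...}


def run_turn_level(evaluator: Callable[[Turn], float], turns: List[Turn]) -> List[float]:
    """A turn-level evaluator yields one score per turn."""
    return [evaluator(turn) for turn in turns]


def run_session_level(evaluator: Callable[[List[Turn]], float], turns: List[Turn]) -> float:
    """A session-level evaluator yields one score for the whole conversation."""
    return evaluator(turns)


# Toy evaluators for illustration only.
def non_empty_response(turn: Turn) -> float:
    return 1.0 if turn["agent"].strip() else 0.0


def mentions_refund(turns: List[Turn]) -> float:
    return 1.0 if any("refund" in t["agent"].lower() for t in turns) else 0.0


conversation = [
    {"user": "My headphones arrived damaged.",
     "agent": "Sorry to hear that. What is your order number?"},
    {"user": "ORD-123456",
     "agent": "Thanks. I have started your refund."},
]

turn_scores = run_turn_level(non_empty_response, conversation)    # one score per turn
session_score = run_session_level(mentions_refund, conversation)  # one score overall
print(turn_scores, session_score)
```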

Configure Variable Mappings

Map evaluator variables to:

Scenario fields: Goal, persona, user data
Agent response: What the agent said in each turn
Conversation metadata: Turn index, conversation history
Execution data: Latency, tokens, model

Each evaluator may require different variable mappings.
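
One way to picture a variable mapping is as a lookup from each evaluator variable to a path in the data available for a turn. The Python sketch below assumes a nested dictionary and dotted paths. The `context` layout, the path syntax, and the helper names are all hypothetical, and the values are made up.

```python
from typing import Any, Dict


def resolve(path: str, context: Dict[str, Any]) -> Any:
    """Follow a dotted path such as 'scenario.goal' through nested dicts."""
    value: Any = context
    for part in path.split("."):
        value = value[part]  # raises KeyError if the path does not exist
    return value


def build_inputs(mapping: Dict[str, str], context: Dict[str, Any]) -> Dict[str, Any]:
    """Produce an evaluator's input variables from a mapping."""
    return {variable: resolve(path, context) for variable, path in mapping.items()}


# Made-up data grouped by the four source categories listed above.
context = {
    "scenario": {"goal": "Get a refund for a damaged product", "persona": "frustrated"},
    "agent_response": "Your refund will arrive in 5-7 business days.",
    "conversation": {"turn_index": 2},
    "execution": {"latency_ms": 840},
}

# Evaluator variable -> where its value comes from.
mapping = {
    "goal": "scenario.goal",
    "response": "agent_response",
    "turn": "conversation.turn_index",
}

inputs = build_inputs(mapping, context)
print(inputs)
```

A different evaluator would supply its own `mapping`, for example one that reads `execution.latency_ms` instead of the scenario goal.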

Step 5: Advanced Configuration (Optional)

This optional step provides additional evaluator setup and fine-tuning options.

Review Configuration

Review all evaluator configurations and mappings.

Create Dataset

Click Create Dataset to finalize. Your simulation dataset is now ready to run.

Running a Simulation

Once your dataset is configured, you can run simulations:

Get Dataset ID

Open your dataset and copy the Dataset ID displayed at the top of the page.

Trigger Simulation

Use the Dataset ID in your simulation code. The simulation runs automatically through the Netra SDK.

View Results

Monitor progress and results in Test Runs.

Simulations execute automatically when the associated code is triggered. You don’t need to manually start each run—just ensure your agent code is integrated with Netra.

Best Practices

Crafting Effective Scenarios

Be specific: “Get a refund for a damaged product” is better than “Ask about returns”
Include context: Provide enough detail for realistic simulation (order details, timeline, issue description)
Include edge cases: Create scenarios that challenge your agent’s boundaries
Vary complexity: Mix simple (2-3 turns) and complex (7-10 turns) scenarios

Choosing User Personas

Neutral: Best for baseline performance testing
Friendly: Tests whether your agent maintains professionalism even when not challenged
Frustrated: Critical for customer support agents—tests patience and de-escalation
Confused: Tests clarity and explanation quality
Custom: Use for industry-specific personas (technical users, non-native speakers, etc.)

Defining User Data

Provide realistic data: Use representative order numbers, dates, and values
Include edge cases: Test with missing fields, unusual values, or conflicting data
Keep it relevant: Only include data that matters for the scenario
Use consistent formats: Standardize date formats, currency, and naming

Setting Fact Checkers

Focus on critical facts: What MUST the agent communicate correctly?
Be precise: “5-7 business days” is better than “about a week”
Test compliance: Include regulatory or policy-critical information
Verify, don’t duplicate: Don’t repeat information already in user data

Related

Simulation Overview - Understand the full simulation framework
Agents - Define agents to test in simulations
Evaluators - Configure scoring logic for simulations
Test Runs - View simulation results and conversation transcripts
Traces - Debug simulation turns with execution traces
