This guide walks you through setting up evaluations to measure your AI system’s accuracy, quality, and reliability.

1. Prerequisites

Before setting up evaluations, ensure your AI system is instrumented and sending traces to Netra, since evaluations run against the traces your system creates.

2. Create a Dataset

Datasets are collections of test cases that define inputs and expected outputs for your AI system. You can create them from real-world traces (Option A) or build them manually (Option B).

Option A: Create from Traces

Convert real-world interactions into test cases:

Step 1: Navigate to Traces

Go to Observability → Traces and find a trace you want to use as a test case.
Step 2: Add to Dataset

Click the Add to Dataset button on the trace.
Step 3: Configure the Test Case

  • Enter a dataset name (e.g., “Customer Support QA”)
  • Add optional tags for organization
  • Review the input prompt
  • Provide the expected output
  • Click Next
Step 4: Select Evaluators

Choose evaluators to score your AI’s performance (see 3. Configure Evaluators below).

Option B: Create Manually

Step 1: Open Dataset Dashboard

Navigate to Evaluation → Datasets and click Create Dataset.
Step 2: Configure Dataset

  • Enter a dataset name
  • Select Single Turn for request/response pairs
  • Choose Add manually
Step 3: Add Test Cases

For each test case, provide the following (a minimal sketch follows this list):
  • Input: The prompt or question
  • Expected Output: The ideal response
  • Metadata (optional): Additional context
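A test case is conceptually just an input/expected-output pair with optional metadata. A minimal sketch of that shape, assuming you stage test cases in Python before entering them; the field names mirror the list above and are illustrative, not a required import schema:

```python
# Illustrative shape of single-turn test cases; field names mirror the list
# above and are not a required Netra import format.
test_cases = [
    {
        "input": "How do I reset my password?",                                     # Input: the prompt or question
        "expected_output": "Go to Settings > Security and choose Reset Password.",  # Expected Output: the ideal response
        "metadata": {"category": "account"},                                        # Metadata (optional): extra context
    },
    {
        "input": "Which plans do you offer?",
        "expected_output": "We offer Free, Pro, and Enterprise plans.",
        "metadata": {"category": "billing"},
    },
]
```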

3. Configure Evaluators

Evaluators score your AI’s outputs against defined criteria. Netra offers two types:

LLM as Judge

Best for subjective quality assessment (a sketch of this pattern follows the list):
  • Answer Correctness: Does the response match the expected answer?
  • Relevance: Is the response relevant to the question?
  • Hallucination Detection: Does the response contain fabricated information?
  • Toxicity: Is the content safe and appropriate?
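Conceptually, an LLM-as-judge evaluator wraps the expected output and the actual response in a grading prompt and asks a model for a score. A minimal sketch of that pattern, assuming a generic `call_llm(prompt) -> str` function that you supply; the prompt wording and 0–1 scale are illustrative, not Netra’s built-in rubric:

```python
from typing import Callable

JUDGE_PROMPT = """You are grading an AI assistant's answer.

Question: {question}
Expected answer: {expected}
Actual answer: {actual}

Rate how correct the actual answer is on a scale from 0.0 to 1.0,
where 1.0 means fully correct and 0.0 means completely wrong.
Reply with only the number."""


def judge_correctness(question: str, expected: str, actual: str,
                      call_llm: Callable[[str], str]) -> float:
    """Score answer correctness with an LLM judge; returns a value in [0, 1]."""
    prompt = JUDGE_PROMPT.format(question=question, expected=expected, actual=actual)
    reply = call_llm(prompt)
    try:
        # Clamp to [0, 1] in case the judge drifts slightly out of range.
        return max(0.0, min(1.0, float(reply.strip())))
    except ValueError:
        return 0.0  # An unparseable reply counts as a failed grade.


if __name__ == "__main__":
    fake_llm = lambda prompt: "0.8"  # stand-in for a real LLM call
    print(judge_correctness("What is 2 + 2?", "4", "The answer is 4.", fake_llm))
```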

Code Evaluators

Best for deterministic checks (a sketch follows the list):
  • JSON Validation: Verify JSON structure and schema
  • Regex Matching: Pattern-based validation
  • Custom Logic: Write JavaScript or Python for specific rules
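Code evaluators are plain functions that return a pass/fail result or a score. A minimal Python sketch of the JSON-validation and regex-matching checks listed above; the function signatures are illustrative, so adapt them to however your custom evaluator receives the model output:

```python
import json
import re


def validate_json(output: str, required_keys: list[str]) -> bool:
    """JSON Validation: output must parse as a JSON object containing the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required_keys)


def matches_pattern(output: str, pattern: str) -> bool:
    """Regex Matching: output must contain a match for the given pattern."""
    return re.search(pattern, output) is not None


# Example: require a JSON object with an order ID that looks like ORD-12345.
print(validate_json('{"order_id": "ORD-12345"}', ["order_id"]))    # True
print(matches_pattern('{"order_id": "ORD-12345"}', r"ORD-\d{5}"))  # True
```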
Step 1: Add Evaluators

When creating your dataset, click Next to reach the evaluator selection screen.
Step 2: Select from Library

Browse pre-built evaluators in categories:
  • Quality
  • Performance
  • Agentic
  • Guardrails
Step 3: Map Variables

Configure how evaluator variables map to your data (a sketch follows this list):
  • Dataset field: Use values from your test cases
  • Agent response: Use the actual LLM output
  • Execution data: Use trace metadata
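The mapping simply tells each evaluator where its inputs come from. As a hedged illustration only (a conceptual Python dict, not Netra’s actual configuration format), an answer-correctness evaluator might be wired to the three sources above like this:

```python
# Conceptual illustration of variable mapping, not an actual Netra config format.
# Each evaluator variable is bound to one of the three sources described above.
evaluator_mapping = {
    "question": {"source": "dataset_field", "field": "input"},               # from the test case
    "expected_answer": {"source": "dataset_field", "field": "expected_output"},
    "actual_answer": {"source": "agent_response"},                           # the live LLM output
    "latency_ms": {"source": "execution_data", "field": "duration_ms"},      # from trace metadata
}
```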

4. Run an Evaluation

Once your dataset is configured with evaluators:
Step 1: Get Dataset ID

Open your dataset and copy the Dataset ID displayed at the top.
Step 2: Trigger Evaluation

Run your AI system with the dataset inputs. Evaluations execute automatically when traces are created.
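What this looks like in practice depends on your setup, but the shape is: loop over the dataset inputs, call your AI system with tracing enabled, and attach the Dataset ID so the resulting traces can be matched back to the test cases. A hedged sketch where `start_trace` and `run_my_agent` are stand-ins for your own tracing instrumentation and application code, not Netra API calls:

```python
from contextlib import contextmanager

DATASET_ID = "ds_abc123"  # placeholder; copy the real Dataset ID from the dataset page


@contextmanager
def start_trace(metadata: dict):
    """Stand-in for your tracing instrumentation; replace with your real tracer."""
    print(f"trace started with metadata: {metadata}")
    yield
    print("trace finished")


def run_my_agent(question: str) -> str:
    """Stand-in for your AI system; replace with your real agent call."""
    return f"(answer to: {question})"


dataset_inputs = [
    "How do I reset my password?",
    "Which plans do you offer?",
]

for question in dataset_inputs:
    # Tag each trace with the dataset so the evaluation can match it to a test case.
    with start_trace({"dataset_id": DATASET_ID, "input": question}):
        print(run_my_agent(question))
```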
Step 3: View Results

Navigate to Evaluation → Test Runs to see your evaluation results.

5. Analyze Test Run Results

Click on a test run to view detailed results:

Summary Metrics

  • Total Cost: Aggregate cost of all LLM calls
  • Average Latency: Response time across test cases
  • Pass/Fail Rate: Overall success rate (a sketch of how these aggregate follows this list)
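These summary metrics are simple aggregates of the per-test-case results. A rough sketch of how they relate, using illustrative field names rather than an actual Netra export format:

```python
# Illustrative per-test-case results; field names are not a Netra export format.
results = [
    {"passed": True,  "latency_ms": 850,  "cost_usd": 0.0021},
    {"passed": False, "latency_ms": 1200, "cost_usd": 0.0034},
    {"passed": True,  "latency_ms": 640,  "cost_usd": 0.0018},
]

total_cost = sum(r["cost_usd"] for r in results)
average_latency = sum(r["latency_ms"] for r in results) / len(results)
pass_rate = sum(r["passed"] for r in results) / len(results)

print(f"Total Cost: ${total_cost:.4f}")
print(f"Average Latency: {average_latency:.0f} ms")
print(f"Pass/Fail Rate: {pass_rate:.0%} passed")
```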

Per-Test-Case Results

Each test case shows:
  • Input: The prompt sent to the AI
  • Expected Output: Your defined ideal response
  • Task Output: The actual AI response
  • Status: Pass/Fail indicator
  • Evaluator Scores: Individual scores from each evaluator
  • View Trace: Link to the full execution trace

Troubleshooting

  • No test runs appearing: Ensure your dataset has evaluators configured and traces are being sent
  • Evaluator errors: Test your evaluator in the Playground before adding it to datasets
  • Unexpected failures: Check the variable mappings in the evaluator configuration

Next Steps

Last modified on January 28, 2026