The Netra SDK exposes an evaluation client that lets you:

- **Manage datasets** - Create datasets and add test items
- **Run test suites** - Execute tasks against datasets with automatic tracing
- **Apply evaluators** - Score outputs using built-in or custom evaluators

This page shows how to use the evaluation client (`client.evaluation`) to manage datasets, run test suites, and programmatically evaluate your AI applications.
## Getting Started

The evaluation client is available on the main Netra entry point after initialization.

```typescript
import { Netra } from "netra-sdk-js";

const client = new Netra({
  apiKey: "your-api-key",
});

// Access the evaluation client
await client.evaluation.createDataset(...);
await client.evaluation.addDatasetItem(...);
await client.evaluation.getDataset(...);
await client.evaluation.runTestSuite(...);
```
## createDataset

Create an empty dataset that can hold test items for evaluation runs.

```typescript
import { Netra } from "netra-sdk-js";

const client = new Netra({ apiKey: "..." });

const result = await client.evaluation.createDataset(
  "Customer Support QA",    // name
  ["support", "qa", "v1"]   // tags (optional)
);

if (result) {
  console.log(`Dataset created: ${result.id}`);
  console.log(`Name: ${result.name}`);
  console.log(`Tags: ${result.tags}`);
}
```
### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `name` | `string` | Name of the dataset (required) |
| `tags` | `string[]?` | Optional tags for categorization |
### Response: `CreateDatasetResponse`

| Field | Type | Description |
| --- | --- | --- |
| `id` | `string` | Unique dataset identifier |
| `name` | `string` | Dataset name |
| `tags` | `string[]` | Associated tags |
| `projectId` | `string` | Project identifier |
| `organizationId` | `string` | Organization identifier |
| `createdBy` | `string` | Creator identifier |
| `updatedBy` | `string` | Last updater identifier |
| `createdAt` | `string` | Creation timestamp |
| `updatedAt` | `string` | Last update timestamp |
| `deletedAt` | `string \| null` | Deletion timestamp (if soft-deleted) |
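For quick reference, the same shape written as a TypeScript interface. This is a transcription of the table above, not necessarily the SDK's exported declaration:

```typescript
// Transcribed from the field table above; the SDK's own type may differ.
interface CreateDatasetResponse {
  id: string;
  name: string;
  tags: string[];
  projectId: string;
  organizationId: string;
  createdBy: string;
  updatedBy: string;
  createdAt: string;        // creation timestamp
  updatedAt: string;        // last update timestamp
  deletedAt: string | null; // set if soft-deleted
}
```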
## addDatasetItem

Add a single test item to an existing dataset.

```typescript
import { Netra } from "netra-sdk-js";

const client = new Netra({ apiKey: "..." });

const result = await client.evaluation.addDatasetItem(
  "dataset-123", // datasetId
  {
    // item
    input: "What is the return policy for electronics?",
    expectedOutput: "Electronics can be returned within 30 days with original packaging.",
    tags: ["policy", "returns"],
    metadata: { category: "electronics", priority: "high" },
  }
);

if (result) {
  console.log(`Item added: ${result.id}`);
  console.log(`Input: ${result.input}`);
}
```
### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `datasetId` | `string` | ID of the target dataset |
| `item` | `DatasetEntry` | The test item to add |
### `DatasetEntry`

| Field | Type | Description |
| --- | --- | --- |
| `input` | `any` | The input to pass to your task (required) |
| `expectedOutput` | `any?` | Expected output for comparison |
| `tags` | `string[]?` | Optional tags for the item |
| `metadata` | `Record<string, any>?` | Optional metadata for evaluators |
### Response: `AddDatasetItemResponse`

| Field | Type | Description |
| --- | --- | --- |
| `id` | `string` | Unique item identifier |
| `datasetId` | `string` | Parent dataset ID |
| `projectId` | `string` | Project identifier |
| `organizationId` | `string` | Organization identifier |
| `source` | `string` | Source of the item |
| `sourceId` | `string?` | Source reference ID |
| `input` | `any` | The input value |
| `expectedOutput` | `any` | The expected output |
| `isActive` | `boolean` | Whether the item is active |
| `tags` | `string[]` | Associated tags |
| `metadata` | `Record<string, any>?` | Item metadata |
| `createdBy` | `string` | Creator identifier |
| `updatedBy` | `string` | Last updater identifier |
| `createdAt` | `string` | Creation timestamp |
| `updatedAt` | `string` | Last update timestamp |
| `deletedAt` | `string?` | Deletion timestamp (if soft-deleted) |
## getDataset

Retrieve a dataset and all of its items by ID.

```typescript
import { Netra } from "netra-sdk-js";

const client = new Netra({ apiKey: "..." });

const dataset = await client.evaluation.getDataset("dataset-123");

if (dataset) {
  console.log(`Total items: ${dataset.items.length}`);
  for (const item of dataset.items) {
    console.log(`ID: ${item.id}`);
    console.log(`Input: ${item.input}`);
    console.log(`Expected: ${item.expectedOutput}`);
    console.log("---");
  }
}
```
### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `datasetId` | `string` | ID of the dataset to retrieve |
### Response: `GetDatasetItemsResponse`

| Field | Type | Description |
| --- | --- | --- |
| `items` | `DatasetRecord[]` | List of dataset items |
### `DatasetRecord`

| Field | Type | Description |
| --- | --- | --- |
| `id` | `string` | Item identifier |
| `datasetId` | `string` | Parent dataset ID |
| `input` | `any` | The input value |
| `expectedOutput` | `any` | The expected output |
## runTestSuite

Execute a test suite against a dataset, running your task function on each item and optionally applying evaluators.

```typescript
import { Netra } from "netra-sdk-js";
import OpenAI from "openai";

const client = new Netra({ apiKey: "..." });
const openai = new OpenAI();

// Task function that processes each dataset item
async function myTask(inputData: any): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: inputData },
    ],
  });
  return response.choices[0].message.content || "";
}

// Get the dataset
const dataset = await client.evaluation.getDataset("dataset-123");

if (dataset) {
  // Run the test suite
  const result = await client.evaluation.runTestSuite(
    "GPT-4o Mini Evaluation",     // name
    dataset,                      // data
    myTask,                       // task
    ["correctness", "relevance"], // evaluators (optional)
    10                            // maxConcurrency
  );

  if (result) {
    console.log(`Run ID: ${result.runId}`);
    console.log(`Items processed: ${result.items.length}`);
  }
}
```
### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `name` | `string` | Name for this test run (required) |
| `data` | `Dataset` | Dataset from `getDataset()` |
| `task` | `TaskFunction` | Function that takes an input and returns an output |
| `evaluators` | `any[]?` | Optional evaluator IDs or configs |
| `maxConcurrency` | `number` | Max parallel task executions (default: 50) |
### Response

| Field | Type | Description |
| --- | --- | --- |
| `runId` | `string` | Unique run identifier |
| `items` | `object[]` | Results for each processed item |
### Item Result

| Field | Type | Description |
| --- | --- | --- |
| `index` | `number` | Item index in the dataset |
| `status` | `string` | `"completed"` or `"failed"` |
| `traceId` | `string` | Trace ID for observability |
| `spanId` | `string` | Span ID for the task execution |
| `testRunItemId` | `string` | Backend item identifier |
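For instance, a small helper that tallies outcomes from a run result, typed inline from the fields documented above:

```typescript
// Summarize a test run result using only the documented fields.
function summarizeRun(result: {
  runId: string;
  items: { index: number; status: string; traceId: string }[];
}) {
  const failed = result.items.filter((item) => item.status === "failed");
  console.log(
    `Run ${result.runId}: ${result.items.length - failed.length} completed, ${failed.length} failed`
  );
  for (const item of failed) {
    // Each failed item carries a trace ID you can look up in the dashboard.
    console.log(`  Item ${item.index} failed (trace: ${item.traceId})`);
  }
}
```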
The task function receives the `input` field from each dataset item and should return the output that evaluators compare against `expectedOutput`.
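A minimal sketch of a conforming task function; the `TaskFunction` alias here is assumed from the parameter table above and may differ from the SDK's exported type:

```typescript
// Assumed signature: the item's `input` goes in, the output to be
// scored against `expectedOutput` comes out.
type TaskFunction = (input: any) => Promise<any>;

const echoTask: TaskFunction = async (input) => {
  // Stand-in for a real model call; any async computation works.
  return `Answer to: ${input}`;
};
```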
## When to Use Which API

- **Dataset Management** (`createDataset` / `addDatasetItem` / `getDataset`): Build and manage test datasets programmatically. Use for CI/CD pipelines or when generating test cases from production data (see the sketch after this list).
- **Test Execution** (`runTestSuite`): Execute your AI task against a dataset with automatic tracing and evaluation. Use for regression testing and model comparisons.
- **Advanced Workflows** (`createRun`): Create runs without immediate execution. Use when you need custom orchestration or want to manage the run lifecycle separately.
- **Evaluators** (evaluator IDs or configs): Pass evaluator IDs to `runTestSuite` to automatically score outputs. Configure custom evaluators in the Netra dashboard.
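As one illustration of the dataset-management workflow, here is a sketch that seeds a dataset from a JSON file of captured production queries. The file name and record shape are assumptions for the example; the client calls are the ones documented above:

```typescript
import { readFileSync } from "node:fs";
import { Netra } from "netra-sdk-js";

async function seedDataset() {
  const client = new Netra({ apiKey: "your-api-key" });

  // Hypothetical export of production traffic: [{ input, expectedOutput }, ...]
  const cases: { input: string; expectedOutput: string }[] = JSON.parse(
    readFileSync("production-cases.json", "utf-8")
  );

  const dataset = await client.evaluation.createDataset("Production Regression", ["ci"]);
  if (!dataset) return;

  for (const c of cases) {
    await client.evaluation.addDatasetItem(dataset.id, c);
  }
  console.log(`Seeded ${cases.length} items into ${dataset.id}`);
}

seedDataset();
```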
## Complete Example

```typescript
import { Netra } from "netra-sdk-js";
import OpenAI from "openai";

async function main() {
  // Initialize
  const client = new Netra({
    apiKey: "your-api-key",
  });
  const openai = new OpenAI();

  // 1. Create a dataset
  const datasetResponse = await client.evaluation.createDataset(
    "Product FAQ Evaluation",
    ["faq", "products", "v2"]
  );
  if (!datasetResponse) {
    console.error("Failed to create dataset");
    return;
  }
  const datasetId = datasetResponse.id;
  console.log(`Created dataset: ${datasetId}`);

  // 2. Add test items
  const testCases = [
    {
      input: "What is your return policy?",
      expectedOutput: "Items can be returned within 30 days.",
    },
    {
      input: "How long does shipping take?",
      expectedOutput: "Standard shipping takes 3-5 business days.",
    },
    {
      input: "Do you offer international shipping?",
      expectedOutput: "Yes, we ship to over 50 countries.",
    },
  ];
  for (const testCase of testCases) {
    await client.evaluation.addDatasetItem(datasetId, {
      input: testCase.input,
      expectedOutput: testCase.expectedOutput,
    });
  }
  console.log(`Added ${testCases.length} test items`);

  // 3. Define the task
  async function faqAgent(query: string): Promise<string> {
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content: "You are a customer support agent. Answer concisely.",
        },
        { role: "user", content: query },
      ],
    });
    return response.choices[0].message.content || "";
  }

  // 4. Run the test suite
  const dataset = await client.evaluation.getDataset(datasetId);
  if (dataset) {
    const result = await client.evaluation.runTestSuite(
      "FAQ Agent v2 Evaluation",
      dataset,
      faqAgent,
      ["correctness", "relevance"],
      5
    );

    // 5. Review results
    if (result) {
      console.log(`\nRun completed: ${result.runId}`);
      for (const item of result.items) {
        console.log(`Item ${item.index}: ${item.status} (trace: ${item.traceId})`);
      }
      console.log(
        "\nView detailed results in Netra dashboard → Evaluation → Test Runs"
      );
    }
  }
}

main();
```
## Next Steps