The Netra SDK exposes an evaluation client that lets you:
  • Manage datasets - Create datasets and add test items
  • Run test suites - Execute tasks against datasets with automatic tracing
  • Apply evaluators - Score outputs using built-in or custom evaluators
This page shows how to use client.evaluation to manage datasets, run test suites, and programmatically evaluate your AI applications.

Getting Started

The evaluation client is available on the main Netra entry point after initialization.
import { Netra } from "netra-sdk-js";

const client = new Netra({
  apiKey: "your-api-key",
});

// Access the evaluation client
await client.evaluation.createDataset(...);
await client.evaluation.addDatasetItem(...);
await client.evaluation.getDataset(...);
await client.evaluation.runTestSuite(...);
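
In production code you will typically read the API key from the environment rather than hard-coding it. A minimal sketch (NETRA_API_KEY is a placeholder variable name chosen for this example, not an SDK convention):
import { Netra } from "netra-sdk-js";

// NETRA_API_KEY is a placeholder environment variable name for this
// example; use whatever your deployment defines.
const client = new Netra({
  apiKey: process.env.NETRA_API_KEY ?? "",
});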

createDataset

Create an empty dataset that can hold test items for evaluation runs.
import { Netra } from "netra-sdk-js";

const client = new Netra({ apiKey: "..." });

const result = await client.evaluation.createDataset(
  "Customer Support QA",           // name
  ["support", "qa", "v1"]          // tags (optional)
);

if (result) {
  console.log(`Dataset created: ${result.id}`);
  console.log(`Name: ${result.name}`);
  console.log(`Tags: ${result.tags}`);
}

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name | string | Name of the dataset (required) |
| tags | string[]? | Optional tags for categorization |
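
Combined with the if (result) check in the example above, the method's shape is roughly the following (inferred from this page, not copied from the SDK's type declarations; the failure value could be null or undefined):
createDataset(name: string, tags?: string[]): Promise<CreateDatasetResponse | null>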

Response: CreateDatasetResponse

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique dataset identifier |
| name | string | Dataset name |
| tags | string[] | Associated tags |
| projectId | string | Project identifier |
| organizationId | string | Organization identifier |
| createdBy | string | Creator identifier |
| updatedBy | string | Last updater identifier |
| createdAt | string | Creation timestamp |
| updatedAt | string | Last update timestamp |
| deletedAt | string \| null | Deletion timestamp (if soft-deleted) |
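
Expressed as a TypeScript interface, the response corresponds to roughly the following (a sketch derived from the field table above; the SDK's exported types may differ):
interface CreateDatasetResponse {
  id: string;
  name: string;
  tags: string[];
  projectId: string;
  organizationId: string;
  createdBy: string;
  updatedBy: string;
  createdAt: string;        // timestamp string
  updatedAt: string;        // timestamp string
  deletedAt: string | null; // set when soft-deleted
}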

addDatasetItem

Add a single test item to an existing dataset.
import { Netra } from "netra-sdk-js";

const client = new Netra({ apiKey: "..." });

const result = await client.evaluation.addDatasetItem(
  "dataset-123",  // datasetId
  {               // item
    input: "What is the return policy for electronics?",
    expectedOutput: "Electronics can be returned within 30 days with original packaging.",
    tags: ["policy", "returns"],
    metadata: { category: "electronics", priority: "high" },
  }
);

if (result) {
  console.log(`Item added: ${result.id}`);
  console.log(`Input: ${result.input}`);
}

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| datasetId | string | ID of the target dataset |
| item | DatasetEntry | The test item to add |
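
Following the null-check pattern used throughout this page, the method's shape is roughly (a sketch inferred from this page, not the SDK's published declaration):
addDatasetItem(datasetId: string, item: DatasetEntry): Promise<AddDatasetItemResponse | null>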

DatasetEntry

| Field | Type | Description |
| --- | --- | --- |
| input | any | The input to pass to your task (required) |
| expectedOutput | any? | Expected output for comparison |
| tags | string[]? | Optional tags for the item |
| metadata | Record<string, any>? | Optional metadata for evaluators |
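
As a TypeScript shape (a sketch derived from the table above; the actual exported type may differ):
interface DatasetEntry {
  input: any;                     // required
  expectedOutput?: any;
  tags?: string[];
  metadata?: Record<string, any>;
}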

Response: AddDatasetItemResponse

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique item identifier |
| datasetId | string | Parent dataset ID |
| projectId | string | Project identifier |
| organizationId | string | Organization identifier |
| source | string | Source of the item |
| sourceId | string? | Source reference ID |
| input | any | The input value |
| expectedOutput | any | The expected output |
| isActive | boolean | Whether the item is active |
| tags | string[] | Associated tags |
| metadata | Record<string, any>? | Item metadata |
| createdBy | string | Creator identifier |
| updatedBy | string | Last updater identifier |
| createdAt | string | Creation timestamp |
| updatedAt | string | Last update timestamp |
| deletedAt | string? | Deletion timestamp (if soft-deleted) |
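
Each call can fail independently, so a bulk import is usually a loop that tracks failures. A minimal sketch, assuming addDatasetItem resolves to a falsy value on failure (matching the null-checks on this page):
import { Netra } from "netra-sdk-js";

// "items" uses the DatasetEntry shape from the table above.
async function addItems(
  client: Netra,
  datasetId: string,
  items: Array<{ input: any; expectedOutput?: any }>
) {
  for (const [index, item] of items.entries()) {
    const result = await client.evaluation.addDatasetItem(datasetId, item);
    if (!result) {
      // Log and continue; retry logic is left to the caller.
      console.warn(`Item ${index} was not added`);
    }
  }
}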

getDataset

Retrieve a dataset and all its items by ID.
import { Netra } from "netra-sdk-js";

const client = new Netra({ apiKey: "..." });

const dataset = await client.evaluation.getDataset("dataset-123");

if (dataset) {
  console.log(`Total items: ${dataset.items.length}`);

  for (const item of dataset.items) {
    console.log(`ID: ${item.id}`);
    console.log(`Input: ${item.input}`);
    console.log(`Expected: ${item.expectedOutput}`);
    console.log("---");
  }
}

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| datasetId | string | ID of the dataset to retrieve |
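
The method's shape is roughly (a sketch inferred from this page, not the SDK's published declaration):
getDataset(datasetId: string): Promise<GetDatasetItemsResponse | null>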

Response: GetDatasetItemsResponse

| Field | Type | Description |
| --- | --- | --- |
| items | DatasetRecord[] | List of dataset items |

DatasetRecord

| Field | Type | Description |
| --- | --- | --- |
| id | string | Item identifier |
| datasetId | string | Parent dataset ID |
| input | any | The input value |
| expectedOutput | any | The expected output |
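
As TypeScript shapes (sketches derived from the two tables above; the SDK's exported types may differ):
interface GetDatasetItemsResponse {
  items: DatasetRecord[];
}

interface DatasetRecord {
  id: string;
  datasetId: string;
  input: any;
  expectedOutput: any;
}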

runTestSuite

Execute a test suite against a dataset, running your task function on each item and optionally applying evaluators.
import { Netra } from "netra-sdk-js";
import OpenAI from "openai";

const client = new Netra({ apiKey: "..." });
const openai = new OpenAI();

// Task function that processes each dataset item
async function myTask(inputData: any): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: inputData },
    ],
  });
  return response.choices[0].message.content || "";
}

// Get dataset
const dataset = await client.evaluation.getDataset("dataset-123");

if (dataset) {
  // Run test suite
  const result = await client.evaluation.runTestSuite(
    "GPT-4o Mini Evaluation",  // name
    dataset,                    // data
    myTask,                     // task
    ["correctness", "relevance"], // evaluators (optional)
    10                          // maxConcurrency
  );

  if (result) {
    console.log(`Run ID: ${result.runId}`);
    console.log(`Items processed: ${result.items.length}`);
  }
}

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name | string | Name for this test run (required) |
| data | Dataset | Dataset from getDataset() |
| task | TaskFunction | Function that takes input and returns output |
| evaluators | any[]? | Optional evaluator IDs or configs |
| maxConcurrency | number | Max parallel task executions (default: 50) |
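
TaskFunction is not spelled out on this page; judging from the examples, it is roughly a function from a dataset item's input to a promised output (a sketch, not the SDK's published type):
type TaskFunction = (input: any) => Promise<any>;
Because the input is typed any, structured items (objects rather than plain strings) work as well; your task just needs to handle whatever shape you stored.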

Response

| Field | Type | Description |
| --- | --- | --- |
| runId | string | Unique run identifier |
| items | object[] | Results for each processed item |

Item Result

| Field | Type | Description |
| --- | --- | --- |
| index | number | Item index in dataset |
| status | string | "completed" or "failed" |
| traceId | string | Trace ID for observability |
| spanId | string | Span ID for the task execution |
| testRunItemId | string | Backend item identifier |
The task function receives the input field from each dataset item. Return the output that should be compared against expectedOutput by evaluators.
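
Because each item carries a status, surfacing regressions is a simple filter. For example, collecting failed items and their trace IDs for follow-up (field names as in the Item Result table above; result is the value returned by runTestSuite):
// Collect failed items so their traces can be inspected in the dashboard.
const failed = result.items.filter((item) => item.status === "failed");
for (const item of failed) {
  console.error(`Item ${item.index} failed (trace: ${item.traceId})`);
}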

When to Use Which API

Dataset Management

createDataset / addDatasetItem / getDataset: Build and manage test datasets programmatically. Use for CI/CD pipelines or when generating test cases from production data.

Test Execution

runTestSuite: Execute your AI task against a dataset with automatic tracing and evaluation. Use for regression testing and model comparisons.

Advanced Workflows

createRun: Create runs without immediate execution. Use when you need custom orchestration or want to manage the run lifecycle separately.

Evaluators

Evaluator IDs or Configs: Pass evaluator IDs to runTestSuite to automatically score outputs. Configure custom evaluators in the Netra dashboard.

Complete Example

import { Netra } from "netra-sdk-js";
import OpenAI from "openai";

async function main() {
  // Initialize
  const client = new Netra({
    apiKey: "your-api-key",
  });
  const openai = new OpenAI();

  // 1. Create a dataset
  const datasetResponse = await client.evaluation.createDataset(
    "Product FAQ Evaluation",
    ["faq", "products", "v2"]
  );

  if (!datasetResponse) {
    console.error("Failed to create dataset");
    return;
  }

  const datasetId = datasetResponse.id;
  console.log(`Created dataset: ${datasetId}`);

  // 2. Add test items
  const testCases = [
    {
      input: "What is your return policy?",
      expectedOutput: "Items can be returned within 30 days.",
    },
    {
      input: "How long does shipping take?",
      expectedOutput: "Standard shipping takes 3-5 business days.",
    },
    {
      input: "Do you offer international shipping?",
      expectedOutput: "Yes, we ship to over 50 countries.",
    },
  ];

  for (const testCase of testCases) {
    await client.evaluation.addDatasetItem(datasetId, {
      input: testCase.input,
      expectedOutput: testCase.expectedOutput,
    });
  }
  console.log(`Added ${testCases.length} test items`);

  // 3. Define the task
  async function faqAgent(query: string): Promise<string> {
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content: "You are a customer support agent. Answer concisely.",
        },
        { role: "user", content: query },
      ],
    });
    return response.choices[0].message.content || "";
  }

  // 4. Run the test suite
  const dataset = await client.evaluation.getDataset(datasetId);

  if (dataset) {
    const result = await client.evaluation.runTestSuite(
      "FAQ Agent v2 Evaluation",
      dataset,
      faqAgent,
      ["correctness", "relevance"],
      5
    );

    // 5. Review results
    if (result) {
      console.log(`\nRun completed: ${result.runId}`);
      for (const item of result.items) {
        console.log(
          `  Item ${item.index}: ${item.status} (trace: ${item.traceId})`
        );
      }

      console.log(
        "\nView detailed results in Netra dashboard → Evaluation → Test Runs"
      );
    }
  }
}

main();
