This cookbook walks you through adding complete observability and evaluation to a LangChain ReAct agent—tracing each step of the reasoning loop, capturing tool invocations with latency breakdowns, and measuring tool selection accuracy.

Open in Google Colab

Run the complete notebook in your browser
All company names (TaskBot, ShopFlow) and scenarios in this cookbook are entirely fictional and used for demonstration purposes only.

What You’ll Learn

This cookbook guides you through five key stages of building an observable LangChain agent: building the TaskBot agent, adding observability with Netra, running sample requests, evaluating agent performance, and analyzing results to iterate.

Prerequisites

  • Python 3.9+
  • OpenAI API key
  • Netra API key (Get started here)
  • LangChain installed

High-Level Concepts

Why Trace Agents?

Unlike simple LLM calls, agents involve multi-step reasoning that can fail in subtle ways:
| Failure Mode | Symptom | What Tracing Reveals |
| --- | --- | --- |
| Wrong tool selection | Agent uses incorrect tool | Tool call sequence, decision reasoning |
| Infinite loops | Agent repeats actions | Iteration count, repeated patterns |
| Hallucinated tools | Agent calls non-existent tool | Tool names vs. available tools |
| Premature termination | Agent stops before completion | Final state, missing steps |
| Over-escalation | Agent escalates simple queries | Escalation triggers, query classification |
Without visibility into the reasoning loop, debugging these failures requires guesswork.
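As a concrete example of what trace data makes possible, here is a minimal sketch (not part of the Netra SDK; the helper name is illustrative) that flags a likely infinite loop by counting repeated (tool, input) pairs in a recorded call sequence:

```python
from collections import Counter

def detect_repeated_calls(tool_calls: list[tuple[str, str]], threshold: int = 3) -> bool:
    """Flag a likely loop when the same (tool, input) pair repeats too often.

    tool_calls: sequence of (tool_name, tool_input) pairs pulled from a trace.
    """
    counts = Counter(tool_calls)
    return any(count >= threshold for count in counts.values())

# A healthy trace: each call appears once
assert not detect_repeated_calls(
    [("search_kb", "return policy"), ("check_order_status", "ORD-12345")]
)

# A looping trace: the agent keeps retrying the same failed lookup
assert detect_repeated_calls([("lookup_ticket", "TKT-404")] * 3)
```

The same pattern extends to the other failure modes: hallucinated tools are call names absent from the registered tool list, and premature termination shows up as a final answer with zero tool calls where the test case expected some.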

The ReAct Pattern

ReAct (Reasoning + Acting) agents follow an iterative loop:
┌─────────────────────────────────────────────────────┐
│                    User Query                        │
└─────────────────────┬───────────────────────────────┘
                      │
                      ▼
         ┌────────────────────────┐
         │   Thought: Reason      │◄──────────────┐
         │   about what to do     │               │
         └───────────┬────────────┘               │
                     │                            │
                     ▼                            │
         ┌────────────────────────┐               │
         │   Action: Select and   │               │
         │   invoke a tool        │               │
         └───────────┬────────────┘               │
                     │                            │
                     ▼                            │
         ┌────────────────────────┐               │
         │   Observation: Get     │───────────────┘
         │   tool result          │    (loop until done)
         └───────────┬────────────┘
                      │
                      ▼
         ┌────────────────────────┐
         │   Final Answer         │
         └────────────────────────┘
Netra captures each iteration as nested spans, giving you visibility into the agent’s decision-making process.
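To make the loop concrete, here is a stripped-down sketch of the ReAct control flow with the LLM replaced by a stub policy function. This is purely illustrative of the pattern, not how LangChain implements it:

```python
def react_loop(query: str, tools: dict, policy, max_iterations: int = 5) -> str:
    """Minimal ReAct skeleton: think, act, observe, repeat until done."""
    observations = []
    for _ in range(max_iterations):
        # Thought + Action: the policy decides the next step from the history
        action, action_input = policy(query, observations)
        if action == "final_answer":
            return action_input
        # Observation: run the selected tool and feed the result back
        observations.append(tools[action](action_input))
    return "Stopped: max iterations reached"

# Stub tool and a hard-coded two-step policy for demonstration
tools = {"search_kb": lambda q: "Returns accepted within 30 days."}

def policy(query, observations):
    if not observations:
        return ("search_kb", "return policy")
    return ("final_answer", observations[-1])

result = react_loop("What is your return policy?", tools, policy)
assert result == "Returns accepted within 30 days."
```

In the real agent, `policy` is an LLM call that parses the Thought/Action/Observation transcript; the `max_iterations` guard is the same safety valve you will see later on the `AgentExecutor`.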

TaskBot Scenario

TaskBot is a fictional AI assistant for ShopFlow, an e-commerce platform. It handles user queries using five tools:
| Tool | Description | When to Use |
| --- | --- | --- |
| lookup_ticket | Retrieve ticket details by ID | User references a ticket number |
| search_kb | Search knowledge base | General product/policy questions |
| check_order_status | Get order status and tracking | Order-related inquiries |
| process_refund | Initiate a refund | Refund requests (with validation) |
| escalate_to_human | Transfer to human operator | Complex issues, urgent requests |

Building the TaskBot Agent

Let’s build the ReAct agent first, then add tracing and evaluation.

Installation

Install the required packages:
pip install netra-sdk langchain langchain-openai openai

Environment Setup

Configure your API keys:
export NETRA_API_KEY="your-netra-api-key"
export OPENAI_API_KEY="your-openai-api-key"

Mock Data

First, let’s define mock data that our tools will operate on:
from typing import Dict, List

# Mock ticket database
TICKETS: Dict[str, dict] = {
    "TKT-001": {
        "id": "TKT-001",
        "subject": "Return policy question",
        "status": "open",
        "created_at": "2026-01-15T10:30:00Z",
        "order_id": None,
        "priority": "low",
    },
    "TKT-002": {
        "id": "TKT-002",
        "subject": "Damaged item received",
        "status": "open",
        "created_at": "2026-01-20T14:15:00Z",
        "order_id": "ORD-12345",
        "priority": "high",
    },
    "TKT-003": {
        "id": "TKT-003",
        "subject": "Urgent: 3 week delay",
        "status": "open",
        "created_at": "2026-01-10T09:00:00Z",
        "order_id": "ORD-99999",
        "priority": "critical",
    },
}

# Mock order database
ORDERS: Dict[str, dict] = {
    "ORD-12345": {
        "id": "ORD-12345",
        "status": "delivered",
        "items": ["Wireless Headphones"],
        "total": 79.99,
        "tracking_number": "1Z999AA10123456784",
        "delivered_at": "2026-01-18T16:30:00Z",
    },
    "ORD-99999": {
        "id": "ORD-99999",
        "status": "processing",
        "items": ["Gaming Monitor"],
        "total": 349.99,
        "tracking_number": None,
        "estimated_ship_date": "2026-02-01T00:00:00Z",
    },
}

# Mock knowledge base
KNOWLEDGE_BASE: List[dict] = [
    {
        "title": "Return Policy",
        "content": "Items can be returned within 30 days of delivery for a full refund. "
                   "Items must be in original packaging and unused condition. "
                   "Electronics must include all accessories.",
    },
    {
        "title": "Refund Processing",
        "content": "Refunds are processed within 5-7 business days after we receive "
                   "the returned item. Refunds are credited to the original payment method.",
    },
    {
        "title": "Shipping Times",
        "content": "Standard shipping: 5-7 business days. Express shipping: 2-3 business days. "
                   "Processing time is 1-2 business days before shipping.",
    },
    {
        "title": "Damaged Items",
        "content": "If you received a damaged item, please contact us within 48 hours. "
                   "We will arrange a replacement or full refund including shipping costs.",
    },
]

Define the Tools

Create LangChain tools with proper type annotations and docstrings:
from langchain.tools import tool

@tool
def lookup_ticket(ticket_id: str) -> str:
    """Look up a ticket by its ID to get details about the issue.

    Args:
        ticket_id: The ticket ID (e.g., TKT-001)

    Returns:
        Ticket details including subject, status, priority, and associated order
    """
    ticket = TICKETS.get(ticket_id.upper())
    if not ticket:
        return f"No ticket found with ID: {ticket_id}"

    return (
        f"Ticket {ticket['id']}:\n"
        f"  Subject: {ticket['subject']}\n"
        f"  Status: {ticket['status']}\n"
        f"  Priority: {ticket['priority']}\n"
        f"  Created: {ticket['created_at']}\n"
        f"  Associated Order: {ticket['order_id'] or 'None'}"
    )

@tool
def search_kb(query: str) -> str:
    """Search the knowledge base for information about policies, procedures, or FAQs.

    Args:
        query: The search query (e.g., "return policy", "shipping times")

    Returns:
        Relevant knowledge base articles matching the query
    """
    query_lower = query.lower()
    results = []

    for article in KNOWLEDGE_BASE:
        if (query_lower in article["title"].lower() or
            query_lower in article["content"].lower()):
            results.append(f"**{article['title']}**\n{article['content']}")

    if not results:
        return "No relevant articles found. Try different search terms."

    return "\n\n".join(results)

@tool
def check_order_status(order_id: str) -> str:
    """Check the status of an order including shipping and tracking information.

    Args:
        order_id: The order ID (e.g., ORD-12345)

    Returns:
        Order status, items, tracking number, and delivery information
    """
    order = ORDERS.get(order_id.upper())
    if not order:
        return f"No order found with ID: {order_id}"

    status_info = f"Order {order['id']}:\n"
    status_info += f"  Status: {order['status']}\n"
    status_info += f"  Items: {', '.join(order['items'])}\n"
    status_info += f"  Total: ${order['total']:.2f}\n"

    if order.get("tracking_number"):
        status_info += f"  Tracking: {order['tracking_number']}\n"
    if order.get("delivered_at"):
        status_info += f"  Delivered: {order['delivered_at']}\n"
    if order.get("estimated_ship_date"):
        status_info += f"  Est. Ship Date: {order['estimated_ship_date']}\n"

    return status_info

@tool
def process_refund(order_id: str, reason: str) -> str:
    """Process a refund for an order. Only use after verifying the order status.

    Args:
        order_id: The order ID to refund
        reason: The reason for the refund

    Returns:
        Confirmation of refund initiation or error message
    """
    order = ORDERS.get(order_id.upper())
    if not order:
        return f"Cannot process refund: No order found with ID {order_id}"

    if order["status"] not in ["delivered", "shipped"]:
        return f"Cannot process refund: Order status is '{order['status']}'. Refunds are only available for shipped or delivered orders."

    return (
        f"Refund initiated for Order {order_id}:\n"
        f"  Amount: ${order['total']:.2f}\n"
        f"  Reason: {reason}\n"
        f"  Status: Processing\n"
        f"  Expected completion: 5-7 business days\n"
        f"  Refund will be credited to original payment method."
    )

@tool
def escalate_to_human(ticket_id: str, reason: str) -> str:
    """Escalate a ticket to a human operator. Use for complex issues or urgent requests.

    Args:
        ticket_id: The ticket ID to escalate
        reason: The reason for escalation

    Returns:
        Confirmation of escalation
    """
    ticket = TICKETS.get(ticket_id.upper()) if ticket_id else None

    return (
        f"Ticket escalated to human operator:\n"
        f"  Ticket ID: {ticket_id or 'New ticket created'}\n"
        f"  Reason: {reason}\n"
        f"  Priority: Urgent\n"
        f"  Expected response: Within 1 hour\n"
        f"  A specialist will contact you shortly."
    )
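One behavior worth knowing before testing: search_kb uses plain case-insensitive substring matching, so a short phrase matches an article but a full sentence passed verbatim will not. The agent therefore relies on the LLM distilling the user's question into search terms. A sketch of the matching rule in isolation:

```python
def kb_match(query: str, title: str, content: str) -> bool:
    """Same matching rule as search_kb: case-insensitive substring check."""
    q = query.lower()
    return q in title.lower() or q in content.lower()

article = (
    "Return Policy",
    "Items can be returned within 30 days of delivery for a full refund.",
)

assert kb_match("return policy", *article)                   # short phrase matches the title
assert not kb_match("what is your return policy", *article)  # full sentence matches nothing
```

If traces show search_kb returning "No relevant articles found" for reasonable questions, this matching rule is the first place to look.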

Create the ReAct Agent

Build the agent using LangChain’s ReAct implementation:
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define the tools
tools = [lookup_ticket, search_kb, check_order_status, process_refund, escalate_to_human]

# Create the ReAct prompt
react_prompt = PromptTemplate.from_template("""You are TaskBot, an AI assistant for ShopFlow e-commerce platform.

You help users with:
- Order status and tracking
- Return and refund requests
- Policy questions
- Escalating complex issues

Always be helpful and professional. Use tools to look up information before responding.
For refund requests, always check the order status first.
Escalate to human operators when: the user is frustrated, the issue is complex, or you cannot resolve it.

You have access to these tools:
{tools}

Use this format:

Question: the user's question
Thought: think about what to do
Action: the tool name (one of [{tool_names}])
Action Input: the input to the tool
Observation: the tool's output
... (repeat Thought/Action/Action Input/Observation as needed)
Thought: I now know the final answer
Final Answer: the response to the user

Begin!

Question: {input}
Thought: {agent_scratchpad}""")

# Create the agent
agent = create_react_agent(llm, tools, react_prompt)

# Create the executor
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=10,
    return_intermediate_steps=True,  # expose tool calls in the result for evaluation
)

Test the Basic Agent

Verify the agent works before adding tracing:
# Simple FAQ query
response = agent_executor.invoke({
    "input": "What is your return policy?"
})
print(response["output"])

# Order status query
response = agent_executor.invoke({
    "input": "Where is my order ORD-12345?"
})
print(response["output"])

Adding Observability with Netra

Now let’s instrument the agent for full observability.

Initialize Netra with LangChain Instrumentation

Netra provides auto-instrumentation for LangChain that captures agent execution automatically:
from netra import Netra
from netra.instrumentation.instruments import InstrumentSet

# Initialize Netra with LangChain and OpenAI instrumentation
Netra.init(
    app_name="taskbot",
    instruments={InstrumentSet.OPENAI, InstrumentSet.LANGCHAIN},
    trace_content=True,
)
With auto-instrumentation enabled, Netra automatically captures:
  • Agent execution spans
  • LLM calls with prompts and completions
  • Tool invocations with inputs and outputs
  • Token usage and costs

Tracing Agent Execution with Decorators

For more control, wrap your agent handler with the @agent decorator:
from typing import Optional

from netra.decorators import agent

@agent(name="taskbot-agent")
def handle_request(query: str, ticket_id: Optional[str] = None, user_id: Optional[str] = None) -> dict:
    """Handle a user request with full tracing."""

    # Set user context if provided
    if user_id:
        Netra.set_user_id(user_id)

    # Add custom attributes
    if ticket_id:
        Netra.set_custom_attributes(key="ticket_id", value=ticket_id)

    # Execute the agent
    result = agent_executor.invoke({"input": query})

    return {
        "query": query,
        "response": result["output"],
        "ticket_id": ticket_id,
    }

Tracing Tool Calls

The auto-instrumentation captures tool calls, but you can add custom tracing for business logic:
from netra import Netra, SpanType

@tool
def lookup_ticket_traced(ticket_id: str) -> str:
    """Look up a ticket with custom span attributes."""
    with Netra.start_span("ticket-lookup", as_type=SpanType.TOOL) as span:
        span.set_attribute("ticket_id", ticket_id)

        ticket = TICKETS.get(ticket_id.upper())

        if ticket:
            span.set_attribute("ticket_status", ticket["status"])
            span.set_attribute("ticket_priority", ticket["priority"])
            span.set_attribute("found", True)
        else:
            span.set_attribute("found", False)

        # Return the result
        if not ticket:
            return f"No ticket found with ID: {ticket_id}"

        return (
            f"Ticket {ticket['id']}:\n"
            f"  Subject: {ticket['subject']}\n"
            f"  Status: {ticket['status']}\n"
            f"  Priority: {ticket['priority']}"
        )

Manual Span Tracing for Custom Workflows

For fine-grained control over trace structure, use manual spans:
from typing import Optional

from netra import Netra, SpanType

def handle_complex_request(query: str, ticket_id: Optional[str] = None):
    """Handle a request with detailed manual tracing."""

    with Netra.start_span("request-handler", as_type=SpanType.AGENT) as parent_span:
        parent_span.set_attribute("query", query)
        parent_span.set_attribute("ticket_id", ticket_id)

        # Classification step
        with Netra.start_span("query-classification") as class_span:
            # Classify the query type
            query_type = classify_query(query)
            class_span.set_attribute("query_type", query_type)

        # Agent execution
        with Netra.start_span("agent-execution", as_type=SpanType.AGENT) as agent_span:
            result = agent_executor.invoke({"input": query})
            agent_span.set_attribute("iterations", len(result.get("intermediate_steps", [])))

        # Post-processing
        with Netra.start_span("response-formatting") as format_span:
            formatted_response = format_response(result["output"])
            format_span.set_attribute("response_length", len(formatted_response))

        return formatted_response

def classify_query(query: str) -> str:
    """Classify query type for routing."""
    query_lower = query.lower()
    if "refund" in query_lower:
        return "refund"
    elif "order" in query_lower or "tracking" in query_lower:
        return "order_status"
    elif "urgent" in query_lower or "help" in query_lower:
        return "escalation"
    else:
        return "general"

def format_response(response: str) -> str:
    """Format the response for the user."""
    return response.strip()

Viewing Agent Traces

After running requests, navigate to Observability → Traces in Netra. You’ll see the full agent execution flow:
Netra trace view showing nested agent spans with thought, action, and observation steps
The trace shows:
  • Parent span: The overall agent execution
  • LLM calls: Each reasoning step with prompts and completions
  • Tool calls: Each tool invocation with inputs, outputs, and latency
  • Token usage: Cumulative token counts and costs

Running Sample Requests

Let’s test the agent with different query types to see tracing in action.

Simple Query: FAQ Lookup

# Single-tool query - should use search_kb
response = handle_request(
    query="What is your return policy?",
    user_id="user-001",
)
print(response["response"])
Expected behavior: Agent uses search_kb once and returns the policy information.

Single-Tool Query: Order Status

# Order status query - should use check_order_status
response = handle_request(
    query="Where is my order ORD-12345?",
    user_id="user-002",
)
print(response["response"])
Expected behavior: Agent uses check_order_status and provides tracking information.

Multi-Step Query: Refund Request

# Multi-step workflow - should use multiple tools
response = handle_request(
    query="I want a refund for order ORD-12345, the item arrived damaged",
    ticket_id="TKT-002",
    user_id="user-003",
)
print(response["response"])
Expected behavior: Agent uses check_order_status to verify the order, then process_refund to initiate the refund.

Edge Case: Escalation Required

# Escalation scenario - should detect urgency
response = handle_request(
    query="I've been waiting 3 weeks and need urgent help! I want to speak to someone immediately!",
    ticket_id="TKT-003",
    user_id="user-004",
)
print(response["response"])
Expected behavior: Agent uses lookup_ticket to get context, then escalate_to_human due to the urgent tone.

Comparing Traces

After running these requests, compare the traces in the Netra dashboard:
Comparison of agent traces showing different tool call patterns for simple vs complex queries
Notice how:
  • Simple queries have 1-2 tool calls
  • Complex queries have multiple tool calls in sequence
  • Escalation queries show the agent’s decision-making process

Evaluating Agent Performance

Systematic evaluation ensures your agent behaves correctly across different scenarios.

Why Evaluate Agents?

Agent evaluation differs from simple LLM evaluation:
| Dimension | What to Measure | Why It Matters |
| --- | --- | --- |
| Tool Selection | Did it call the right tools? | Wrong tools = wrong answers |
| Tool Sequence | Did it call tools in the right order? | Order matters for multi-step workflows |
| Completion | Did it resolve the query? | Premature stops frustrate users |
| Escalation Accuracy | Did it escalate appropriately? | Over/under-escalation impacts operations |

Creating Test Datasets

Define test cases with expected tool calls:
TEST_CASES = [
    # Simple FAQ queries
    {
        "id": "TC-001",
        "category": "faq",
        "query": "What is your return policy?",
        "expected_tools": ["search_kb"],
        "forbidden_tools": ["process_refund", "escalate_to_human"],
        "should_escalate": False,
    },
    {
        "id": "TC-002",
        "category": "faq",
        "query": "How long does shipping take?",
        "expected_tools": ["search_kb"],
        "forbidden_tools": ["process_refund", "escalate_to_human"],
        "should_escalate": False,
    },

    # Order status queries
    {
        "id": "TC-003",
        "category": "order",
        "query": "Where is my order ORD-12345?",
        "expected_tools": ["check_order_status"],
        "forbidden_tools": ["process_refund"],
        "should_escalate": False,
    },
    {
        "id": "TC-004",
        "category": "order",
        "query": "Can you check the tracking for order ORD-12345?",
        "expected_tools": ["check_order_status"],
        "forbidden_tools": ["process_refund"],
        "should_escalate": False,
    },

    # Refund requests (multi-step)
    {
        "id": "TC-005",
        "category": "refund",
        "query": "I want a refund for order ORD-12345, item was damaged",
        "expected_tools": ["check_order_status", "process_refund"],
        "forbidden_tools": [],
        "should_escalate": False,
    },
    {
        "id": "TC-006",
        "category": "refund",
        "query": "Please process a refund for my damaged headphones, order ORD-12345",
        "expected_tools": ["check_order_status", "process_refund"],
        "forbidden_tools": [],
        "should_escalate": False,
    },

    # Escalation scenarios
    {
        "id": "TC-007",
        "category": "escalation",
        "query": "This is ridiculous! I've been waiting 3 weeks! I need to speak to someone NOW!",
        "expected_tools": ["escalate_to_human"],
        "forbidden_tools": [],
        "should_escalate": True,
    },
    {
        "id": "TC-008",
        "category": "escalation",
        "query": "I've tried everything and nothing works. I need human help.",
        "expected_tools": ["escalate_to_human"],
        "forbidden_tools": [],
        "should_escalate": True,
    },
]
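To score these cases you first need the tools the agent actually called. When the `AgentExecutor` is created with `return_intermediate_steps=True`, each entry of `result["intermediate_steps"]` is an `(AgentAction, observation)` pair, and the action's `.tool` attribute holds the tool name. A small helper (illustrative) to pull out the ordered call sequence, shown here against a mocked result:

```python
from types import SimpleNamespace

def called_tools(result: dict) -> list[str]:
    """Extract the ordered tool names from an AgentExecutor result."""
    return [action.tool for action, _observation in result.get("intermediate_steps", [])]

# Mocked result for demonstration; real entries are LangChain AgentAction objects
mock_result = {
    "output": "Refund initiated.",
    "intermediate_steps": [
        (SimpleNamespace(tool="check_order_status"), "Order ORD-12345: delivered"),
        (SimpleNamespace(tool="process_refund"), "Refund initiated"),
    ],
}

assert called_tools(mock_result) == ["check_order_status", "process_refund"]
assert called_tools({"output": "Hi!"}) == []  # no tools called
```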

Using the Tool Correctness Evaluator

Netra provides a Tool Correctness evaluator that validates tool selection. Configure it in Evaluation → Evaluators:
| Setting | Value |
| --- | --- |
| Name | Tool Selection Accuracy |
| Type | Tool Correctness |
| Pass Criteria | Score >= 0.8 |
The evaluator checks:
  • Expected tools called: Did the agent call all required tools?
  • Forbidden tools avoided: Did it avoid calling tools it shouldn’t?
  • Sequence correctness: Were tools called in the expected order?
Tool Correctness evaluator configuration in Netra
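For offline analysis, the same three checks can be sketched in plain Python against a test case and the list of tools the agent actually called. This is a hypothetical scoring scheme for illustration, not the evaluator's exact implementation:

```python
def tool_correctness(called: list[str], expected: list[str], forbidden: list[str]) -> float:
    """Score tool selection: all expected tools called in order, no forbidden tools."""
    if any(tool in called for tool in forbidden):
        return 0.0
    # Expected tools must appear as an ordered subsequence of the calls
    remaining = iter(called)
    in_order = all(tool in remaining for tool in expected)
    if not in_order:
        # Partial credit when the right tools were called but out of order
        return 0.5 if set(expected) <= set(called) else 0.0
    return 1.0

# Correct multi-step refund flow
assert tool_correctness(["check_order_status", "process_refund"],
                        ["check_order_status", "process_refund"], []) == 1.0
# Right tools, wrong order: refunded before verifying the order
assert tool_correctness(["process_refund", "check_order_status"],
                        ["check_order_status", "process_refund"], []) == 0.5
# Forbidden tool called on a plain FAQ query
assert tool_correctness(["search_kb", "process_refund"],
                        ["search_kb"], ["process_refund"]) == 0.0
```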

Creating a Code Evaluator for Escalation

For custom business logic, create a Code Evaluator to measure escalation precision:
// handler function is required
function handler(input, output, expectedOutput) {
    const shouldEscalate = expectedOutput?.should_escalate || false;

    // Check if the agent called escalate_to_human
    const outputLower = output.toLowerCase();
    const didEscalate = outputLower.includes("escalate") ||
                        outputLower.includes("human operator") ||
                        outputLower.includes("specialist will contact");

    // Score based on correct escalation decision
    if (shouldEscalate === didEscalate) {
        return 1; // Correct decision
    } else if (shouldEscalate && !didEscalate) {
        return 0; // False negative - should have escalated
    } else {
        return 0.5; // False positive - over-escalation (less severe)
    }
}
Set Output Type to Numerical and Pass Criteria to >= 0.8.
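For offline analysis of results collected in Python, the same escalation scoring can be mirrored locally (a sketch of the logic only; the hosted evaluator runs the JavaScript handler):

```python
def escalation_score(output: str, should_escalate: bool) -> float:
    """1.0 = correct decision, 0.0 = missed escalation, 0.5 = over-escalation."""
    text = output.lower()
    did_escalate = any(marker in text for marker in
                       ("escalate", "human operator", "specialist will contact"))
    if should_escalate == did_escalate:
        return 1.0
    if should_escalate and not did_escalate:
        return 0.0  # false negative: should have escalated
    return 0.5      # false positive: over-escalation (less severe)

assert escalation_score("A specialist will contact you shortly.", True) == 1.0
assert escalation_score("Your refund has been initiated.", True) == 0.0
assert escalation_score("Ticket escalated to human operator.", False) == 0.5
```

The asymmetric penalty reflects the operational trade-off from the table above: a missed escalation strands a frustrated user, while an unnecessary one only costs operator time.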

Running Evaluation Experiments

Create a script to run all test cases and collect results:
from netra import Netra
from netra.decorators import agent
import json

def run_evaluation():
    """Run all test cases and collect results."""
    results = []

    for test_case in TEST_CASES:
        print(f"Running {test_case['id']}: {test_case['query'][:50]}...")

        try:
            # Run the agent
            response = handle_request(
                query=test_case["query"],
                user_id=f"eval-{test_case['id']}",
            )

            # Collect the result
            results.append({
                "test_id": test_case["id"],
                "category": test_case["category"],
                "query": test_case["query"],
                "response": response["response"],
                "expected_tools": test_case["expected_tools"],
                "forbidden_tools": test_case["forbidden_tools"],
                "should_escalate": test_case["should_escalate"],
                "status": "success",
            })

        except Exception as e:
            results.append({
                "test_id": test_case["id"],
                "category": test_case["category"],
                "query": test_case["query"],
                "error": str(e),
                "status": "error",
            })

    return results

# Run the evaluation
evaluation_results = run_evaluation()

# Print summary
for category in ["faq", "order", "refund", "escalation"]:
    category_results = [r for r in evaluation_results if r["category"] == category]
    success_count = len([r for r in category_results if r["status"] == "success"])
    print(f"{category}: {success_count}/{len(category_results)} successful")

Viewing Evaluation Results

Navigate to Evaluation → Experiments in Netra to see the results:
Evaluation results showing pass/fail rates by category
The dashboard shows:
  • Pass rate by category: Which query types are handled correctly
  • Tool accuracy: How often the agent selects the right tools
  • Failure analysis: Which test cases failed and why

Analyzing Results and Iterating

Use traces to debug failures and improve your agent.

Using Traces to Debug Failures

When a test case fails:
  1. Find the trace: Filter by the test case user ID (e.g., eval-TC-007)
  2. Examine the reasoning: Look at the thought steps to understand the decision
  3. Check tool calls: Verify which tools were called and in what order
  4. Identify the root cause: Was it a prompt issue, tool description issue, or LLM limitation?
Example debugging flow:
# If escalation is under-triggering, examine the trace:
# 1. Look at the "Thought" span - did the agent recognize urgency?
# 2. Check if the prompt includes escalation criteria
# 3. Add explicit examples of when to escalate

# Updated prompt with better escalation guidance:
escalation_guidance = """
Escalate to human operators when ANY of these conditions are met:
- User expresses frustration (words like "ridiculous", "unacceptable", "furious")
- User has been waiting more than 2 weeks
- User explicitly asks to speak to a human
- The issue involves policy exceptions
- You cannot resolve the issue with available tools
"""
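One way to apply the fix is to splice the guidance into the prompt template text before rebuilding the agent. A sketch (variable names are illustrative; the template placeholders must survive intact for LangChain to fill them):

```python
escalation_guidance = """
Escalate to human operators when ANY of these conditions are met:
- User expresses frustration (words like "ridiculous", "unacceptable", "furious")
- User has been waiting more than 2 weeks
- User explicitly asks to speak to a human
"""

base_template = "You are TaskBot, an AI assistant for ShopFlow e-commerce platform.\n"

# Insert the guidance ahead of the tool list so the model reads it before selecting a tool
updated_template = (
    base_template
    + escalation_guidance
    + "\nYou have access to these tools:\n{tools}\n"
)

assert "ridiculous" in updated_template
assert "{tools}" in updated_template  # placeholder preserved for PromptTemplate
```

Rebuild with `PromptTemplate.from_template(updated_template)` and `create_react_agent` as before, then re-run the escalation test cases to confirm TC-007 and TC-008 now pass.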

Iterating on the Agent

After identifying issues, iterate on:
  1. Prompt engineering: Add clearer instructions for tool selection
  2. Tool descriptions: Make tool purposes more explicit
  3. Examples: Add few-shot examples for edge cases
  4. Guardrails: Add validation before certain tool calls

Summary

Key Takeaways

  1. ReAct agents need visibility into the reasoning loop—trace each thought, action, and observation
  2. Tool call tracing reveals latency bottlenecks and decision patterns
  3. Tool Correctness evaluator validates that agents call the right tools in the right order
  4. Test cases by category ensure coverage across simple, complex, and edge scenarios
  5. Trace analysis enables systematic debugging of agent failures

What You Built

  • A LangChain ReAct agent with 5 tools for e-commerce assistance
  • Full observability with Netra auto-instrumentation
  • Custom span tracing for business logic
  • Evaluation suite with tool correctness checks
  • Debugging workflow using trace analysis

Last modified on February 3, 2026