Open in Google Colab
Run the complete notebook in your browser
All company names (TaskBot, ShopFlow) and scenarios in this cookbook are entirely fictional and used for demonstration purposes only.
What You’ll Learn
This cookbook guides you through five key stages of building an observable LangChain agent:
1. Build a ReAct Agent
Create a LangChain ReAct agent with multiple tools that can reason about which tool to use.
2. Trace the Reasoning Loop
Capture each iteration of thought → action → observation with Netra spans.
3. Track Tool Calls
Monitor tool invocations with latency, inputs, outputs, and cost breakdowns.
4. Evaluate Tool Selection
Use the Tool Correctness evaluator to validate that the agent calls the right tools.
5. Debug Agent Failures
Identify failure patterns and iterate on prompts using trace analysis.
Prerequisites
- Python 3.9+
- OpenAI API key
- Netra API key (Get started here)
- LangChain installed
High-Level Concepts
Why Trace Agents?
Unlike simple LLM calls, agents involve multi-step reasoning that can fail in subtle ways:
| Failure Mode | Symptom | What Tracing Reveals |
|---|---|---|
| Wrong tool selection | Agent uses incorrect tool | Tool call sequence, decision reasoning |
| Infinite loops | Agent repeats actions | Iteration count, repeated patterns |
| Hallucinated tools | Agent calls non-existent tool | Tool names vs. available tools |
| Premature termination | Agent stops before completion | Final state, missing steps |
| Over-escalation | Agent escalates simple queries | Escalation triggers, query classification |
The ReAct Pattern
ReAct (Reasoning + Acting) agents follow an iterative loop: the model writes out a thought, chooses an action (usually a tool call), observes the result, and repeats until it can produce a final answer.
TaskBot Scenario
TaskBot is a fictional AI assistant for ShopFlow, an e-commerce platform. It handles user queries using five tools:
| Tool | Description | When to Use |
|---|---|---|
| `lookup_ticket` | Retrieve ticket details by ID | User references a ticket number |
| `search_kb` | Search the knowledge base | General product/policy questions |
| `check_order_status` | Get order status and tracking | Order-related inquiries |
| `process_refund` | Initiate a refund | Refund requests (with validation) |
| `escalate_to_human` | Transfer to a human operator | Complex issues, urgent requests |
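The ReAct loop described above can be sketched in plain Python. This is a toy illustration of the pattern, not LangChain's implementation; `decide` stands in for the LLM's reasoning step:

```python
def react_loop(query, tools, decide, max_iterations=5):
    """Toy ReAct loop. decide() plays the role of the LLM, returning either
    ("final", answer) or ("call", tool_name, tool_input)."""
    observations = []
    for _ in range(max_iterations):
        step = decide(query, observations)          # Thought + Action
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        observation = tools[tool_name](tool_input)  # Observation
        observations.append((tool_name, observation))
    return "Reached iteration limit without a final answer."
```

Each pass through the loop is one thought → action → observation iteration; the iteration cap is what prevents the "infinite loop" failure mode in the table above.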
Building the TaskBot Agent
Let’s build the ReAct agent first, then add tracing and evaluation.
Installation
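The original install command was not preserved here; for the stack this cookbook uses, it presumably resembles the following (the Netra package name is an assumption — check the Netra docs for the actual distribution name):

```shell
pip install langchain langchain-openai netra-sdk
```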
Install the required packages.
Environment Setup
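A minimal setup reads the keys from the environment; the `NETRA_API_KEY` variable name is an assumption, and the placeholder values must be replaced with your own keys:

```python
import os

# Replace the placeholders, or export these variables in your shell instead.
os.environ.setdefault("OPENAI_API_KEY", "sk-...your-key...")
os.environ.setdefault("NETRA_API_KEY", "...your-netra-key...")  # variable name assumed
```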
Configure your API keys.
Mock Data
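The mock data itself was elided; a small stand-in consistent with the ShopFlow scenario (all IDs and field values invented for illustration) might look like:

```python
# Fictional ShopFlow records the tools will query.
TICKETS = {
    "TCK-1001": {"status": "open", "subject": "Damaged item on arrival", "priority": "high"},
    "TCK-1002": {"status": "closed", "subject": "Question about sizing", "priority": "low"},
}

ORDERS = {
    "ORD-2001": {"status": "shipped", "tracking": "TRK-555-123", "total": 49.99},
    "ORD-2002": {"status": "processing", "tracking": None, "total": 129.00},
}

KNOWLEDGE_BASE = {
    "returns": "Items may be returned within 30 days of delivery for a full refund.",
    "shipping": "Standard shipping takes 3-5 business days.",
}
```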
First, let’s define mock data that our tools will operate on.
Define the Tools
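A sketch of the five tools. LangChain's `@tool` decorator turns a typed, docstringed function into a tool; a no-op fallback is included so the sketch also runs without LangChain installed, and a trimmed copy of the mock data keeps the block self-contained:

```python
try:
    from langchain_core.tools import tool
except ImportError:          # no-op fallback so the sketch runs without LangChain
    def tool(fn):
        return fn

TICKETS = {"TCK-1001": {"status": "open", "subject": "Damaged item on arrival"}}
ORDERS = {"ORD-2001": {"status": "shipped", "tracking": "TRK-555-123"}}
KB = {"returns": "Items may be returned within 30 days of delivery."}

@tool
def lookup_ticket(ticket_id: str) -> str:
    """Retrieve ticket details by ID."""
    t = TICKETS.get(ticket_id)
    return str(t) if t else f"No ticket found with ID {ticket_id}."

@tool
def search_kb(query: str) -> str:
    """Search the knowledge base for product and policy information."""
    hits = [v for k, v in KB.items() if k in query.lower()]
    return "\n".join(hits) or "No matching articles."

@tool
def check_order_status(order_id: str) -> str:
    """Get order status and tracking information by order ID."""
    o = ORDERS.get(order_id)
    return str(o) if o else f"No order found with ID {order_id}."

@tool
def process_refund(order_id: str) -> str:
    """Initiate a refund for an order (only if the order exists)."""
    if order_id not in ORDERS:
        return f"Cannot refund: unknown order {order_id}."
    return f"Refund initiated for {order_id}."

@tool
def escalate_to_human(summary: str) -> str:
    """Transfer the conversation to a human operator with a summary."""
    return f"Escalated to a human operator: {summary}"

TOOLS = [lookup_ticket, search_kb, check_order_status, process_refund, escalate_to_human]
```

The docstrings matter: the LLM uses them to decide which tool to call, which is exactly what the Tool Correctness evaluator later checks.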
Create LangChain tools with proper type annotations and docstrings.
Create the ReAct Agent
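A sketch of the assembly step using LangChain's classic `create_react_agent` plus `AgentExecutor` API (newer LangGraph variants differ). The imports are kept inside the function so the module loads even without LangChain present; the model name is an assumption:

```python
def build_taskbot(tools):
    """Assemble the ReAct agent; requires LangChain and an OPENAI_API_KEY."""
    from langchain.agents import AgentExecutor, create_react_agent
    from langchain_core.prompts import PromptTemplate
    from langchain_openai import ChatOpenAI

    # Inline ReAct prompt, the same shape as the well-known hwchase17/react prompt.
    template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Question: {input}
Thought:{agent_scratchpad}"""

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name assumed
    agent = create_react_agent(llm, tools, PromptTemplate.from_template(template))
    # max_iterations guards against the infinite-loop failure mode.
    return AgentExecutor(agent=agent, tools=tools, max_iterations=8,
                         handle_parsing_errors=True, verbose=True)
```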
Build the agent using LangChain’s ReAct implementation.
Test the Basic Agent
Verify the agent works before adding tracing.
Adding Observability with Netra
Now let’s instrument the agent for full observability.
Initialize Netra with LangChain Instrumentation
Netra provides auto-instrumentation for LangChain that captures agent execution automatically:
- Agent execution spans
- LLM calls with prompts and completions
- Tool invocations with inputs and outputs
- Token usage and costs
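The initialization call itself was elided, and the Netra SDK's real import path and argument names may differ; the following is a hypothetical sketch of the pattern described above — consult the Netra docs for the actual signature:

```python
# Hypothetical initialization — check the Netra docs for the real API.
import os
from netra import Netra  # import path assumed

Netra.init(
    api_key=os.environ["NETRA_API_KEY"],
    app_name="taskbot",
)
# After init, the LangChain auto-instrumentation captures agent spans,
# LLM calls, tool invocations, and token usage with no further code changes.
```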
Tracing Agent Execution with Decorators
For more control, wrap your agent handler with the `@agent` decorator:
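A sketch of the decorator pattern. The import path and decorator arguments are assumptions; a no-op fallback keeps the example runnable without the SDK, and the handler body is a placeholder standing in for a call to the agent executor built earlier:

```python
try:
    from netra import agent  # hypothetical import path — check the Netra docs
except ImportError:
    def agent(name=None):    # no-op stand-in so the sketch runs without the SDK
        def decorator(fn):
            return fn
        return decorator

@agent(name="taskbot-handler")
def handle_query(user_query: str) -> str:
    """Top-level handler: everything inside runs under one agent span."""
    # result = agent_executor.invoke({"input": user_query})
    # return result["output"]
    return f"(answer to: {user_query})"  # placeholder so the sketch is self-contained
```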
Tracing Tool Calls
The auto-instrumentation captures tool calls, but you can add custom tracing for business logic.
Manual Span Tracing for Custom Workflows
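A sketch of nesting manual spans around a business workflow. The Netra span API here is an assumption (a context-manager style is common to most tracing SDKs); a no-op fallback keeps the example runnable without the SDK:

```python
from contextlib import contextmanager

try:
    from netra import Netra  # hypothetical import path

    def span(name):
        return Netra.start_span(name)  # assumed context-manager API
except ImportError:
    @contextmanager
    def span(name):  # no-op stand-in so the sketch runs without the SDK
        print(f"[span start] {name}")
        yield
        print(f"[span end] {name}")

def handle_refund(order_id: str) -> str:
    """Nested spans give each workflow step its own latency in the trace."""
    with span("refund-workflow"):
        with span("validate-order"):
            valid = order_id.startswith("ORD-")
        with span("process-refund"):
            return "Refund initiated." if valid else "Invalid order ID."
```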
For fine-grained control over trace structure, use manual spans.
Viewing Agent Traces
After running requests, navigate to Observability → Traces in Netra. You’ll see the full agent execution flow:
- Parent span: The overall agent execution
- LLM calls: Each reasoning step with prompts and completions
- Tool calls: Each tool invocation with inputs, outputs, and latency
- Token usage: Cumulative token counts and costs
Running Sample Requests
Let’s test the agent with different query types to see tracing in action.
Simple Query: FAQ Lookup
The agent calls `search_kb` once and returns the policy information.
Single-Tool Query: Order Status
The agent calls `check_order_status` and provides tracking information.
Multi-Step Query: Refund Request
The agent first calls `check_order_status` to verify the order, then `process_refund` to initiate the refund.
Edge Case: Escalation Required
The agent calls `lookup_ticket` to get context, then `escalate_to_human` due to the urgent tone.
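The four scenarios above can be replayed against any agent callable. Here `run_agent` stands in for whatever wrapper you built around the agent (for example `lambda q: executor.invoke({"input": q})["output"]`), and the query wording is illustrative:

```python
SAMPLE_QUERIES = [
    ("faq",        "What is your return policy?"),
    ("order",      "Where is my order ORD-2001?"),
    ("refund",     "I want a refund for order ORD-2001, it arrived damaged."),
    ("escalation", "Ticket TCK-1001 is STILL unresolved. I need a human NOW."),
]

def run_samples(run_agent):
    """Replay the sample queries and collect (category, answer) pairs."""
    results = []
    for category, query in SAMPLE_QUERIES:
        answer = run_agent(query)
        results.append((category, answer))
        print(f"[{category}] {query}\n  -> {answer}\n")
    return results
```

Each call produces one trace, so after running this you can compare the four traces side by side in the dashboard.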
Comparing Traces
After running these requests, compare the traces in the Netra dashboard:
- Simple queries have 1-2 tool calls
- Complex queries have multiple tool calls in sequence
- Escalation queries show the agent’s decision-making process
Evaluating Agent Performance
Systematic evaluation ensures your agent behaves correctly across different scenarios.
Why Evaluate Agents?
Agent evaluation differs from simple LLM evaluation:
| Dimension | What to Measure | Why It Matters |
|---|---|---|
| Tool Selection | Did it call the right tools? | Wrong tools = wrong answers |
| Tool Sequence | Did it call tools in the right order? | Order matters for multi-step workflows |
| Completion | Did it resolve the query? | Premature stops frustrate users |
| Escalation Accuracy | Did it escalate appropriately? | Over/under-escalation impacts operations |
Creating Test Datasets
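The dataset itself was not preserved here; a plausible shape, with case IDs, queries, and expectations invented for illustration, is:

```python
TEST_CASES = [
    {
        "id": "TC-001",
        "category": "faq",
        "query": "What is your return policy?",
        "expected_tools": ["search_kb"],
        "forbidden_tools": ["process_refund", "escalate_to_human"],
    },
    {
        "id": "TC-002",
        "category": "order_status",
        "query": "Where is my order ORD-2001?",
        "expected_tools": ["check_order_status"],
        "forbidden_tools": ["process_refund"],
    },
    {
        "id": "TC-003",
        "category": "refund",
        "query": "Refund order ORD-2001, it arrived damaged.",
        "expected_tools": ["check_order_status", "process_refund"],  # order matters
        "forbidden_tools": [],
    },
    {
        "id": "TC-004",
        "category": "escalation",
        "query": "Ticket TCK-1001 is urgent, get me a human.",
        "expected_tools": ["lookup_ticket", "escalate_to_human"],
        "forbidden_tools": ["process_refund"],
    },
]
```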
Define test cases with expected tool calls.
Using the Tool Correctness Evaluator
Netra provides a Tool Correctness evaluator that validates tool selection. Configure it in Evaluation → Evaluators:
| Setting | Value |
|---|---|
| Name | Tool Selection Accuracy |
| Type | Tool Correctness |
| Pass Criteria | Score >= 0.8 |
The evaluator checks:
- Expected tools called: Did the agent call all required tools?
- Forbidden tools avoided: Did it avoid calling tools it shouldn’t?
- Sequence correctness: Were tools called in the expected order?

Creating a Code Evaluator for Escalation
For custom business logic, create a Code Evaluator to measure escalation precision, with a pass criterion of score >= 0.8.
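The evaluator body itself is not shown above; the underlying metric can be sketched as plain functions (the classification into "should escalate" is assumed to come from your test-case labels):

```python
def escalation_score(should_escalate: bool, did_escalate: bool) -> float:
    """Score one case: 1.0 when escalation behavior matches the expectation."""
    return 1.0 if should_escalate == did_escalate else 0.0

def escalation_precision(cases) -> float:
    """Of the cases where the agent escalated, how many truly needed it?
    cases: iterable of (should_escalate, did_escalate) pairs."""
    escalated = [c for c in cases if c[1]]
    if not escalated:
        return 1.0  # never escalated -> vacuously precise
    return sum(1 for c in escalated if c[0]) / len(escalated)
```

Precision below 0.8 here is the over-escalation failure mode from the table earlier: the agent is handing simple queries to humans.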
Running Evaluation Experiments
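A sketch of such a script. `run_agent` and `extract_tool_calls` stand in for your agent wrapper and for pulling called-tool names out of the response or trace (both are assumptions about your setup); the scoring mirrors the Tool Correctness criteria above:

```python
def tool_correctness(expected, forbidden, actual) -> float:
    """1.0 only if every expected tool appears in order and no forbidden tool ran."""
    if any(t in actual for t in forbidden):
        return 0.0
    it = iter(actual)
    in_order = all(t in it for t in expected)  # subsequence check
    if in_order:
        return 1.0
    # Partial credit for right tools in the wrong order / missing tools, capped at 0.5.
    return 0.5 * len(set(expected) & set(actual)) / max(len(expected), 1)

def run_experiment(test_cases, run_agent, extract_tool_calls):
    """Run every test case and score tool selection against expectations."""
    results = []
    for case in test_cases:
        response = run_agent(case["query"])
        actual = extract_tool_calls(response)
        score = tool_correctness(case["expected_tools"],
                                 case.get("forbidden_tools", []), actual)
        results.append({"id": case["id"], "score": score, "passed": score >= 0.8})
    passed = sum(r["passed"] for r in results)
    print(f"{passed}/{len(results)} cases passed")
    return results
```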
Create a script to run all test cases and collect results.
Viewing Evaluation Results
Navigate to Evaluation → Experiments in Netra to see the results:
- Pass rate by category: Which query types are handled correctly
- Tool accuracy: How often the agent selects the right tools
- Failure analysis: Which test cases failed and why
Analyzing Results and Iterating
Use traces to debug failures and improve your agent.
Using Traces to Debug Failures
When a test case fails:
- Find the trace: Filter by the test case user ID (e.g., `eval-TC-007`)
- Examine the reasoning: Look at the thought steps to understand the decision
- Check tool calls: Verify which tools were called and in what order
- Identify the root cause: Was it a prompt issue, tool description issue, or LLM limitation?
Iterating on the Agent
After identifying issues, iterate on:
- Prompt engineering: Add clearer instructions for tool selection
- Tool descriptions: Make tool purposes more explicit
- Examples: Add few-shot examples for edge cases
- Guardrails: Add validation before certain tool calls
Summary
Key Takeaways
- ReAct agents need visibility into the reasoning loop—trace each thought, action, and observation
- Tool call tracing reveals latency bottlenecks and decision patterns
- Tool Correctness evaluator validates that agents call the right tools in the right order
- Test cases by category ensure coverage across simple, complex, and edge scenarios
- Trace analysis enables systematic debugging of agent failures
What You Built
- A LangChain ReAct agent with 5 tools for e-commerce assistance
- Full observability with Netra auto-instrumentation
- Custom span tracing for business logic
- Evaluation suite with tool correctness checks
- Debugging workflow using trace analysis