This cookbook shows you how to add complete observability to CrewAI multi-agent pipelines—tracing agent-to-agent handoffs, measuring individual agent performance, and tracking per-agent costs.
All company names (ContentCraft) and scenarios in this cookbook are entirely fictional and used for demonstration purposes only.
What You’ll Learn
- Trace Agent Handoffs: Capture the message flow between agents as tasks pass through the pipeline
- Track Per-Agent Costs: Monitor token usage and costs for each agent role to identify cost drivers
- Debug Multi-Agent Flows: Understand why agents made specific decisions and where quality degrades
- Compare Configurations: Run experiments with different model assignments to find the cost/quality sweet spot
Prerequisites:
- Python >=3.10, <3.14
- OpenAI API key
- Netra API key (Get started here)
- CrewAI installed
Why Trace Multi-Agent Systems?
Multi-agent systems introduce complexity that single-agent workflows don’t have:
| Failure Mode | Symptom | What Tracing Reveals |
| --- | --- | --- |
| Agent bottleneck | Pipeline slow | Which agent takes longest |
| Handoff failure | Context lost | Message content between agents |
| Cost explosion | Budget exceeded | Which agent uses most tokens |
| Quality degradation | Poor output | Where quality drops in pipeline |
| Model mismatch | Inconsistent results | Which model for which role |
Without per-agent visibility, you can’t optimize individual roles or identify where the pipeline breaks down.
CrewAI Architecture
CrewAI organizes multi-agent work into three components:
| Component | Description | Example |
| --- | --- | --- |
| Agent | Autonomous unit with role, goal, backstory | Research Specialist, Content Writer |
| Task | Work item with description and expected output | "Research the topic", "Write the draft" |
| Crew | Team of agents executing tasks | Content creation team |
Processes:
- Sequential: Tasks execute one after another (A → B → C)
- Hierarchical: A manager agent delegates to workers
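A minimal sketch of the two modes, assuming a recent CrewAI version in which hierarchical crews require a manager_llm:

from crewai import Crew, Process

def build_crew(agents: list, tasks: list, hierarchical: bool = False) -> Crew:
    """Illustrative helper: the same team with two execution strategies."""
    if hierarchical:
        # A manager model plans and delegates tasks to the worker agents;
        # CrewAI requires manager_llm (or a manager agent) for this process.
        return Crew(agents=agents, tasks=tasks,
                    process=Process.hierarchical, manager_llm="gpt-4o")
    # Tasks run one after another in declaration order (A -> B -> C).
    return Crew(agents=agents, tasks=tasks, process=Process.sequential)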
Building an Example Pipeline
Installation
pip install netra-sdk crewai crewai-tools openai langchain-openai
Environment Setup
export NETRA_API_KEY="your-netra-api-key"
export NETRA_OTLP_ENDPOINT="your-netra-otlp-endpoint"
export OPENAI_API_KEY="your-openai-api-key"
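If you are working in a notebook (such as the Colab version of this cookbook) rather than a shell, you can set the same variables from Python before initializing anything; replace the placeholders with your real keys:

import os

# Same configuration as the shell exports above
os.environ["NETRA_API_KEY"] = "your-netra-api-key"
os.environ["NETRA_OTLP_ENDPOINT"] = "your-netra-otlp-endpoint"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"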
Define the Agents
Create a 4-agent content pipeline: Researcher → Writer → Editor → SEO:
from crewai import Agent
from langchain_openai import ChatOpenAI

def create_agents(config: dict | None = None):
    """Create the content team agents with configurable models."""
    config = config or {
        "researcher": "gpt-4o",
        "writer": "gpt-4o",
        "editor": "gpt-3.5-turbo",
        "seo": "gpt-3.5-turbo",
    }
    researcher = Agent(
        role="Research Specialist",
        goal="Gather accurate facts, statistics, and expert opinions",
        backstory="Expert researcher with 10 years of experience in content research.",
        llm=ChatOpenAI(model=config["researcher"]),
        verbose=True,
    )
    writer = Agent(
        role="Content Writer",
        goal="Write engaging, well-structured blog articles",
        backstory="Professional copywriter with expertise in compelling content.",
        llm=ChatOpenAI(model=config["writer"]),
        verbose=True,
    )
    editor = Agent(
        role="Quality Editor",
        goal="Polish articles for clarity, grammar, and flow",
        backstory="Senior editor with a keen eye for detail.",
        llm=ChatOpenAI(model=config["editor"]),
        verbose=True,
    )
    seo_specialist = Agent(
        role="SEO Optimizer",
        goal="Optimize content for search engines",
        backstory="SEO expert who balances keywords with readability.",
        llm=ChatOpenAI(model=config["seo"]),
        verbose=True,
    )
    return {
        "researcher": researcher,
        "writer": writer,
        "editor": editor,
        "seo": seo_specialist,
    }
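A quick usage sketch: because create_agents replaces the entire config rather than merging it, pass all four keys when overriding models (the model names here are only examples):

# Build a team with a custom model assignment
agents = create_agents({
    "researcher": "gpt-4o",
    "writer": "gpt-4o-mini",
    "editor": "gpt-4o-mini",
    "seo": "gpt-4o-mini",
})
print(agents["writer"].role)  # -> "Content Writer"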
Define the Tasks
Create tasks that chain together:
from crewai import Task

def create_tasks(agents: dict, topic: str):
    """Create the content pipeline tasks."""
    research_task = Task(
        description=f"Research the topic: '{topic}'. Find key facts and statistics.",
        expected_output="Research brief with facts, statistics, and sources",
        agent=agents["researcher"],
    )
    writing_task = Task(
        description="Write an 800-1000 word blog article based on the research.",
        expected_output="Draft blog article in markdown format",
        agent=agents["writer"],
        context=[research_task],
    )
    editing_task = Task(
        description="Edit the article for grammar, flow, and clarity.",
        expected_output="Polished blog article with improved clarity",
        agent=agents["editor"],
        context=[writing_task],
    )
    seo_task = Task(
        description="Optimize the article for SEO with a meta description and keywords.",
        expected_output="SEO-optimized article with metadata",
        agent=agents["seo"],
        context=[editing_task],
    )
    return [research_task, writing_task, editing_task, seo_task]
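A small sanity check (a sketch; it assumes the helpers above are in scope) confirms the tasks come back in pipeline order, each assigned to the intended agent:

tasks = create_tasks(create_agents(), "The Future of AI in Healthcare")
# Expect: Research Specialist -> Content Writer -> Quality Editor -> SEO Optimizer
print(" -> ".join(task.agent.role for task in tasks))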
Create the Crew
from crewai import Crew, Process

def run_content_crew(topic: str, config: dict | None = None):
    """Execute the content creation pipeline."""
    agents = create_agents(config)
    tasks = create_tasks(agents, topic)
    crew = Crew(
        agents=list(agents.values()),
        tasks=tasks,
        process=Process.sequential,
        verbose=True,
    )
    return crew.kickoff()
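Before wiring up observability, you can smoke-test the pipeline directly; note that this run is not traced yet:

# Un-instrumented baseline run
result = run_content_crew("The Future of AI in Healthcare")
print(result.raw[:500])  # preview the final SEO-optimized article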
Adding Netra Observability
Initialize Netra with Auto-Instrumentation
Netra provides auto-instrumentation for CrewAI that captures agent execution automatically:
from netra import Netra
from netra.instrumentation.instruments import InstrumentSet

# Initialize Netra with CrewAI and OpenAI instrumentation
Netra.init(
    app_name="contentcraft",
    environment="development",
    trace_content=True,
    instruments={InstrumentSet.CREWAI, InstrumentSet.OPENAI},
)
With auto-instrumentation enabled, Netra automatically captures:
- Agent execution spans with role and backstory
- Task execution with descriptions and outputs
- LLM calls with prompts, completions, and token usage
- Cost calculations per agent
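Auto-instrumentation typically needs to be in place before the instrumented libraries do any work, so call Netra.init() ahead of creating or kicking off the crew. Putting the pieces together (a sketch that reuses run_content_crew from above):

from netra import Netra
from netra.instrumentation.instruments import InstrumentSet

# 1. Initialize instrumentation first so CrewAI and OpenAI calls are captured
Netra.init(
    app_name="contentcraft",
    environment="development",
    trace_content=True,
    instruments={InstrumentSet.CREWAI, InstrumentSet.OPENAI},
)

# 2. Then run the pipeline; agent, task, and LLM spans are recorded automatically
result = run_content_crew("The Future of AI in Healthcare")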
Using the Workflow Decorator
For more control, wrap your pipeline with the @workflow decorator:
from netra.decorators import workflow

@workflow(name="content-pipeline")
def create_article(topic: str, config_name: str = "default", config: dict | None = None):
    """Run the content creation pipeline with full tracing."""
    # Set custom attributes for filtering and analysis
    Netra.set_custom_attributes(key="topic", value=topic)
    Netra.set_custom_attributes(key="config_name", value=config_name)
    # Run the crew
    result = run_content_crew(topic, config)
    return {
        "topic": topic,
        "config": config_name,
        "output": result.raw,
    }
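Calling the decorated function works like any other; the topic and config_name attributes let you filter runs in the dashboard later. This usage sketch reuses the "budget" model assignment defined in the experiments section below:

article = create_article(
    topic="The Future of AI in Healthcare",
    config_name="budget",
    config={
        "researcher": "gpt-4o",
        "writer": "gpt-4o",
        "editor": "gpt-3.5-turbo",
        "seo": "gpt-3.5-turbo",
    },
)
print(article["config"], len(article["output"]))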
Adding Custom Span Attributes
Track additional metadata for each pipeline run:
from netra import Netra, SpanType

@workflow(name="content-pipeline-detailed")
def create_article_detailed(topic: str, config_name: str, config: dict):
    """Run pipeline with detailed custom tracing."""
    with Netra.start_span("pipeline-setup") as setup_span:
        setup_span.set_attribute("topic", topic)
        setup_span.set_attribute("config_name", config_name)
        setup_span.set_attribute("model.researcher", config["researcher"])
        setup_span.set_attribute("model.writer", config["writer"])
        setup_span.set_attribute("model.editor", config["editor"])
        setup_span.set_attribute("model.seo", config["seo"])
        agents = create_agents(config)
        tasks = create_tasks(agents, topic)
    with Netra.start_span("pipeline-execution", as_type=SpanType.AGENT) as exec_span:
        crew = Crew(agents=list(agents.values()), tasks=tasks, process=Process.sequential)
        result = crew.kickoff()
        exec_span.set_attribute("output_length", len(result.raw))
    return {"topic": topic, "config": config_name, "output": result.raw}
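Unlike create_article, this variant reads all four model keys up front, so it needs a complete config. A usage sketch with the premium assignment from the experiments below:

premium = {"researcher": "gpt-4o", "writer": "gpt-4o",
           "editor": "gpt-4o", "seo": "gpt-4o"}
run = create_article_detailed("The Future of AI in Healthcare", "premium", premium)
print(run["config"], len(run["output"]))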
Viewing Traces in Netra
After running the pipeline, navigate to Observability → Traces in Netra.
What the Trace Shows
The trace shows:
- Pipeline span: Overall execution time
- Agent spans: Each agent's task execution
- LLM calls: Nested under each agent with prompts and completions
- Token usage: Per-agent and total
Running Configuration Experiments
Test different model configurations to find the optimal cost/quality balance.
Define Configurations
CONFIGS = {
    "premium": {
        "researcher": "gpt-4o",
        "writer": "gpt-4o",
        "editor": "gpt-4o",
        "seo": "gpt-4o",
    },
    "budget": {
        "researcher": "gpt-4o",
        "writer": "gpt-4o",
        "editor": "gpt-3.5-turbo",
        "seo": "gpt-3.5-turbo",
    },
    "economy": {
        "researcher": "gpt-4o",
        "writer": "gpt-3.5-turbo",
        "editor": "gpt-3.5-turbo",
        "seo": "gpt-3.5-turbo",
    },
}
Run Experiments
# Test each configuration
for config_name, config in CONFIGS.items():
    print(f"Running {config_name} configuration...")
    result = create_article(
        topic="The Future of AI in Healthcare",
        config_name=config_name,
        config=config,
    )
    print(f"{config_name}: {len(result['output'])} characters")
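Cost and latency land in the Netra dashboard automatically, but judging output quality still takes human review. One option (a sketch; the file layout is an assumption) is to save each configuration's article to disk for side-by-side reading:

from pathlib import Path

out_dir = Path("experiment_outputs")
out_dir.mkdir(exist_ok=True)

for config_name, config in CONFIGS.items():
    result = create_article(
        topic="The Future of AI in Healthcare",
        config_name=config_name,
        config=config,
    )
    # One markdown file per configuration for manual comparison
    (out_dir / f"{config_name}.md").write_text(result["output"])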
Compare in Dashboard
After running all configurations, compare costs and latency:
| Config | Total Cost | Total Latency | Output Quality |
| --- | --- | --- | --- |
| Premium | ~$0.19 | ~45s | Highest |
| Budget | ~$0.145 | ~40s | Good |
| Economy | ~$0.085 | ~35s | Acceptable |

At these illustrative numbers, budget runs roughly 24% cheaper than premium, and economy roughly 55% cheaper; whether the quality trade-off is worth it depends on your content requirements.
Debugging Multi-Agent Issues
Common Problems and Solutions
| Problem | What to Look For | Solution |
| --- | --- | --- |
| Slow pipeline | High latency on one agent | Use faster model or shorter prompts |
| Context lost between agents | Missing info in task outputs | Improve task descriptions |
| Editor making no changes | Low edit delta | Improve editor prompts |
| High total cost | One agent dominating | Downgrade non-critical agents |
Using Traces to Debug
- Find slow agents: Sort spans by duration
- Trace context flow: Check the task outputs passed between agents
- Identify cost drivers: Filter by token usage
- Compare successful vs. failed runs: Look for pattern differences
Summary
You’ve learned how to add comprehensive observability to CrewAI pipelines:
- Auto-instrumentation captures agent execution with minimal code
- Per-agent tracing reveals costs, latency, and token usage
- Custom attributes enable filtering by topic, config, and more
- Configuration experiments find the optimal cost/quality balance
Key Takeaways
- Multi-agent systems need per-agent visibility to identify bottlenecks
- Cost allocation by role reveals which agents benefit from premium models
- Trace context flow to debug handoff issues
- Use configuration experiments for data-driven model selection