Open in Google Colab: run the complete observability notebook in your browser.
What You’ll Learn
1. Build the RAG Pipeline
Create a complete RAG chatbot that loads PDFs, chunks documents, generates embeddings, and retrieves relevant context for answering questions.
2. Add Auto-Instrumentation
Instrument every stage—chunking, embedding, retrieval, and generation—with Netra auto-tracing to capture the full execution flow.
3. Track Costs & Performance
Monitor token usage, API costs, and latency at each step to identify bottlenecks and optimize your pipeline.
4. Add User & Session Tracking
Track usage per user and session to understand conversation flows and user behavior.
Prerequisites:
- Python >=3.10, <3.14 or Node.js 18+
- OpenAI API key
- Netra API key (see the setup steps here)
High-Level Concepts
RAG Architecture
A RAG chatbot works in two phases:
Ingestion (one-time):
- Load and chunk the PDF into smaller text segments
- Generate embeddings for each chunk
- Store embeddings in a vector database
Query (per question):
- Convert the user’s question to an embedding
- Find the most similar chunks (retrieval)
- Pass retrieved chunks + question to an LLM
- Return the generated answer

Why Observability Matters for RAG
RAG systems can fail silently in multiple ways:
| Problem | Symptom | What Tracing Reveals |
|---|---|---|
| Poor chunking | Incomplete answers | Chunk sizes, content boundaries |
| Wrong retrieval | Irrelevant answers | Similarity scores, retrieved chunks |
| Hallucination | Fabricated info | Context vs. generated content |
| High costs | Budget overruns | Token usage per stage |
Creating the Chat Agent
Let’s build the RAG chatbot first, then add tracing.
Installation
Start by installing the required packages. We’ll use OpenAI for embeddings and generation, ChromaDB as our vector store, and a PDF parsing library.
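For example (package names are assumptions, in particular the Netra SDK’s; pypdf stands in for whichever PDF parser you prefer):

```bash
# netra-sdk is an assumed package name; check the Netra docs for the real one
pip install openai chromadb pypdf netra-sdk
```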
Environment Setup
Configure your API keys. You’ll need both an OpenAI key for the LLM operations and a Netra key for observability.
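A minimal sketch using environment variables; NETRA_API_KEY is an assumed variable name, so match it to whatever the Netra SDK expects:

```python
import os

# Prefer real environment variables over hard-coding keys in source.
# NETRA_API_KEY is an assumed name; check the Netra docs.
os.environ["OPENAI_API_KEY"] = "sk-..."  # your OpenAI key
os.environ["NETRA_API_KEY"] = "..."      # your Netra key
```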
Loading and Chunking Documents
The first step in any RAG pipeline is extracting text from your documents and splitting it into manageable chunks. We use overlapping chunks to ensure context isn’t lost at chunk boundaries—this helps when relevant information spans multiple segments.
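One possible implementation, assuming pypdf and a fixed-size sliding window (the chunk_size and overlap values are illustrative):

```python
from pypdf import PdfReader

def load_and_chunk(path: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Extract all text from a PDF and split it into overlapping chunks."""
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    step = chunk_size - overlap  # slide forward, keeping `overlap` chars of shared context
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```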
Generating Embeddings and Indexing
Next, we convert each chunk into a vector embedding and store it in ChromaDB. These embeddings capture the semantic meaning of each chunk, allowing us to find relevant content based on meaning rather than just keywords.
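A sketch of the ingestion step, assuming OpenAI’s text-embedding-3-small model and an in-memory ChromaDB collection (swap in a persistent client for real use):

```python
import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.Client()
collection = chroma.create_collection(name="pdf_chunks")

def generate_embeddings(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts with the OpenAI embeddings API."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def index_chunks(chunks: list[str]) -> None:
    """Store chunks and their embeddings in the vector database."""
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        embeddings=generate_embeddings(chunks),
        documents=chunks,
    )
```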
Building the Query Pipeline
Now we implement the core RAG logic: given a user question, retrieve the most relevant chunks from our vector store, then pass them as context to the LLM to generate an answer. The top_k parameter controls how many chunks we retrieve—more chunks provide more context but also increase cost and latency.
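A sketch of the query path, reusing the client and collection above; the model name and prompt wording are illustrative:

```python
def answer_question(question: str, top_k: int = 3) -> str:
    """Retrieve the top_k most similar chunks and ask the LLM to answer."""
    query_embedding = generate_embeddings([question])[0]
    results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
    context = "\n\n".join(results["documents"][0])
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```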
Adding Session Support
For production use, we wrap everything in a class that maintains conversation history and session state. This enables multi-turn conversations where the chatbot remembers previous exchanges, and allows us to track usage per user and session.
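One way to structure it, as a sketch; the PDFChatbot name matches the class referenced later, but the exact shape of the original class is assumed:

```python
class PDFChatbot:
    """Multi-turn chat over an indexed PDF, keyed by user and session."""

    def __init__(self, user_id: str, session_id: str):
        self.user_id = user_id
        self.session_id = session_id
        self.history: list[dict] = []  # prior user/assistant turns

    def chat(self, question: str, top_k: int = 3) -> str:
        # Retrieve context for the current question
        query_embedding = generate_embeddings([question])[0]
        results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
        context = "\n\n".join(results["documents"][0])
        # Include prior turns so the model can follow the conversation
        messages = (
            [{"role": "system", "content": "Answer using only the provided context."}]
            + self.history
            + [{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}]
        )
        completion = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        answer = completion.choices[0].message.content
        self.history += [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
        return answer
```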
Tracing the Agent
Now let’s add Netra observability to see what’s happening inside the RAG pipeline. The good news: with auto-instrumentation, you get full visibility with minimal code changes.
Initializing Netra
Add these imports and initialization at the very top of your script, before any other code. Auto-instrumentation captures all OpenAI and ChromaDB operations automatically—no decorators or manual spans required.
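As a sketch, assuming a conventional SDK entry point (the actual import path and init arguments may differ; check the Netra docs):

```python
# Run this before using OpenAI or ChromaDB so auto-instrumentation can
# patch those libraries. Import path and arguments are assumptions.
from netra import Netra

Netra.init(app_name="pdf-rag-chatbot")
```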
What Gets Auto-Traced
With the initialization above, your existing code from the Creating the Chat Agent section is automatically traced. Here’s what appears in your Netra dashboard:
Document Ingestion
The generate_embeddings() call to OpenAI and the collection.add() call to ChromaDB are captured automatically.

Retrieval Operations
Query embedding generation and vector search operations appear as child spans with timing and metadata.
LLM Generation
OpenAI chat completions are fully traced with model, tokens, cost, latency, and full prompt/response content.
Adding User and Session Tracking
To analyze usage per user and track conversation flows, add user and session context to your existing PDFChatbot class. This is the one piece that requires explicit code—everything else is auto-traced. Simply add these two lines in your chat method:
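Something like the following at the top of chat(); the method names here are assumptions, so verify them against the Netra SDK reference:

```python
# Inside PDFChatbot.chat(), before the OpenAI/ChromaDB calls.
# set_session / set_user are assumed method names; check the Netra SDK docs.
Netra.set_session(session_id=self.session_id)
Netra.set_user(user_id=self.user_id)
```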
What You’ll See in the Dashboard
After running the chatbot, you’ll see traces in the Netra dashboard with:
- OpenAI spans showing model, tokens, cost, and full prompt/response
- ChromaDB spans showing query timing and results
- User and session IDs attached to all spans for filtering
Using Decorators
Auto-instrumentation handles most tracing needs, but you can use decorators when you want more structure. Decorators create parent spans that group related operations, which is useful when you want a single trace for an entire pipeline rather than individual OpenAI/ChromaDB calls.
| Decorator | Use Case |
|---|---|
| @workflow | Top-level pipeline or request handler |
| @task | Discrete unit of work within a workflow |
| @span | Fine-grained tracing for specific operations |
Complete Example with Decorators
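A sketch of how the decorated pipeline might look, reusing the helpers from earlier; the decorator names come from the table above, but the import path is an assumption:

```python
# Import path is an assumption; check the Netra docs for the real module.
from netra.decorators import workflow, task

@task
def retrieve_context(question: str, top_k: int = 3) -> str:
    """Child span: embed the question and fetch the most similar chunks."""
    query_embedding = generate_embeddings([question])[0]
    results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
    return "\n\n".join(results["documents"][0])

@workflow
def rag_query(question: str) -> str:
    """Parent span grouping retrieval and generation into one trace."""
    context = retrieve_context(question)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```

Calling rag_query("...") now produces a single parent trace with the retrieval task and the OpenAI call nested beneath it.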

Summary
You’ve built a fully observable RAG pipeline with Netra. Your chatbot now has:
- End-to-end tracing across document ingestion, retrieval, and generation
- Cost and performance tracking at each pipeline stage
- User and session tracking for usage analytics
- Debugging capabilities to trace issues back to specific chunks and prompts