# Tracing a RAG Pipeline

> Add full observability to a RAG pipeline with Netra. Auto-instrument retrieval, embedding, and generation steps to trace every query end-to-end.

This cookbook walks you through adding **full observability** to a Retrieval-Augmented Generation (RAG) pipeline: tracing every stage from document ingestion to answer generation, tracking costs, and monitoring performance.

## What You'll Learn

* Create a complete RAG chatbot that loads PDFs, chunks documents, generates embeddings, and retrieves relevant context for answering questions.
* Instrument every stage (chunking, embedding, retrieval, and generation) with Netra auto-tracing to capture the full execution flow.
* Monitor token usage, API costs, and latency at each step to identify bottlenecks and optimize your pipeline.
* Track usage per user and session to understand conversation flows and user behavior.

**Prerequisites:**

* Python >=3.10, \<3.14 or Node.js 18+
* OpenAI API key
* Netra API key ([Steps mentioned here](https://docs.getnetra.ai/quick-start/Overview))

***

## High-Level Concepts

### RAG Architecture

A RAG chatbot works in two phases:

**Ingestion (one-time):**

1. Load and chunk the PDF into smaller text segments
2. Generate embeddings for each chunk
3. Store embeddings in a vector database

**Query (per question):**

1. Convert the user's question to an embedding
2. Find the most similar chunks (retrieval)
3. Pass retrieved chunks + question to an LLM
4. Return the generated answer

Figure: RAG Pipeline Architecture
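To make the two phases concrete before any real services are involved, here is a minimal, dependency-free sketch. It is an illustration only: the `embed` function is a toy word-count stand-in for a real embedding model, the in-memory `index` list stands in for a vector database, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of Netra, OpenAI, or ChromaDB.

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Ingestion (one-time): chunk, embed, and store.
chunks = [
    "netra traces every llm call",
    "chroma stores vector embeddings",
    "pdf files are split into chunks",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question: str, top_k: int = 1) -> List[str]:
    """Query phase, steps 1-2: embed the question and rank chunks by similarity."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Query phase, step 3: combine retrieved chunks and the question for the LLM."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

question = "what stores vector embeddings"
print(build_prompt(question, retrieve(question)))
```

The final step, sending the prompt to an LLM, is left out here; the rest of this cookbook fills in each stage with OpenAI and ChromaDB.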

### Why Observability Matters for RAG

RAG systems can fail silently in multiple ways:

| Problem         | Symptom            | What Tracing Reveals                |
| --------------- | ------------------ | ----------------------------------- |
| Poor chunking   | Incomplete answers | Chunk sizes, content boundaries     |
| Wrong retrieval | Irrelevant answers | Similarity scores, retrieved chunks |
| Hallucination   | Fabricated info    | Context vs. generated content       |
| High costs      | Budget overruns    | Token usage per stage               |

***

## Creating the Chat Agent

Let's build the RAG chatbot first, then add tracing.

### Installation

Start by installing the required packages. We'll use OpenAI for embeddings and generation, ChromaDB as our vector store, and a PDF parsing library. The TypeScript examples also import `uuid`, so it is included in the install command.

```bash Python theme={null}
pip install netra-sdk openai chromadb pypdf reportlab
```

```bash TypeScript theme={null}
npm install netra-sdk openai chromadb pdf-parse uuid @chroma-core/default-embed
```

### Environment Setup

Configure your API keys. You'll need both an OpenAI key for the LLM operations and a Netra key for observability.

```bash Python theme={null}
export NETRA_API_KEY="your-netra-api-key"
export NETRA_OTLP_ENDPOINT="your-netra-otlp-endpoint"
export OPENAI_API_KEY="your-openai-api-key"
```

```bash TypeScript theme={null}
export NETRA_API_KEY="your-netra-api-key"
export NETRA_OTLP_ENDPOINT="your-netra-otlp-endpoint"
export OPENAI_API_KEY="your-openai-api-key"

# The TypeScript Chroma client has no in-memory mode, so start a local Chroma server.
chroma run --path /tmp/chroma_db
```

### Loading and Chunking Documents

The first step in any RAG pipeline is extracting text from your documents and splitting it into manageable chunks. We use overlapping chunks so that context isn't lost at chunk boundaries, which helps when relevant information spans multiple segments.
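To see how overlap plays out at the boundaries, here is a small standalone sketch with deliberately tiny sizes. The function name `overlapping_chunks` and the sample string are illustrative assumptions; the idea is simply that each chunk starts `chunk_size - overlap` characters after the previous one, so the tail of one chunk is repeated at the head of the next.

```python
from typing import List

def overlapping_chunks(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the text is fully covered; stop before emitting a redundant tail
    return chunks

print(overlapping_chunks("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each neighbouring pair shares two characters (`cd`, `ef`, `gh`), which is what keeps a sentence that straddles a boundary visible in at least one chunk. The pipeline code in this section applies the same stepping with the defaults `chunk_size=1000` and `overlap=200`.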
```python Python theme={null}
# Import required libraries
from pypdf import PdfReader
from typing import List, Dict, Optional
import chromadb
from openai import OpenAI
import uuid

# Initialize clients
openai_client = OpenAI()
chroma_client = chromadb.Client()


def load_pdf(file_path: str) -> str:
    """Extract text from a PDF file."""
    reader = PdfReader(file_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n"
    return text


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping chunks."""
    # An overlap >= chunk_size would never advance and loop forever
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # Last chunk reached; avoid emitting a redundant tail chunk
        start = end - overlap
    return chunks
```

```typescript TypeScript theme={null}
// Import required libraries
import { PDFParse } from "pdf-parse";
import * as fs from "fs";
import { ChromaClient } from "chromadb";
import OpenAI from "openai";
import { v4 as uuid } from "uuid";

// Initialize clients
const openai = new OpenAI();
const chroma = new ChromaClient();

async function loadPdf(filePath: string): Promise<string> {
  const dataBuffer = fs.readFileSync(filePath);
  const parser = new PDFParse({
    data: dataBuffer,
  });
  const data = await parser.getText();
  return data.text;
}

function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  // An overlap >= chunkSize would never advance and loop forever
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be non-negative and smaller than chunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = start + chunkSize;
    chunks.push(text.slice(start, end));
    if (end >= text.length) break; // Last chunk reached; avoid a redundant tail chunk
    start = end - overlap;
  }
  return chunks;
}
```

### Generating Embeddings and Indexing

Next, we convert each chunk into a vector embedding and store it in ChromaDB. These embeddings capture the semantic meaning of each chunk, allowing us to find relevant content based on meaning rather than just keywords.
```python Python theme={null}
def generate_embeddings(texts: List[str]) -> List[List[float]]:
    """Generate embeddings for a list of texts."""
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [item.embedding for item in response.data]


# Load and chunk the PDF
pdf_text = load_pdf("document.pdf")
chunks = chunk_text(pdf_text, chunk_size=1000, overlap=200)
print(f"Created {len(chunks)} chunks")

# Generate embeddings and store in ChromaDB
collection = chroma_client.create_collection(name="pdf_qa")
embeddings = generate_embeddings(chunks)
collection.add(
    documents=chunks,
    embeddings=embeddings,
    ids=[f"chunk_{i}" for i in range(len(chunks))]
)
print(f"Stored {len(chunks)} chunks in vector database")
```

```typescript TypeScript theme={null}
function generateEmbeddings(texts: string[]): Promise<number[][]> {
  return openai.embeddings
    .create({
      model: "text-embedding-3-small",
      input: texts,
    })
    .then((response) => response.data.map((item) => item.embedding));
}

// Load and chunk the PDF, then store embeddings
(async () => {
  const pdfText = await loadPdf("document.pdf");
  const chunks = chunkText(pdfText, 1000, 200);
  console.log(`Created ${chunks.length} chunks`);

  // Generate embeddings and store in ChromaDB
  const collection = await chroma.createCollection({ name: "pdf_qa" });
  const embeddings = await generateEmbeddings(chunks);
  await collection.add({
    documents: chunks,
    embeddings: embeddings,
    ids: chunks.map((_, i) => `chunk_${i}`),
  });
  console.log(`Stored ${chunks.length} chunks in vector database`);
})();
```

### Building the Query Pipeline

Now we implement the core RAG logic: given a user question, retrieve the most relevant chunks from our vector store, then pass them as context to the LLM to generate an answer. The `top_k` parameter controls how many chunks we retrieve. More chunks provide more context but also increase cost and latency.
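The retrieval code in this section reports a `similarity_score` computed as `1 - distance`. The sketch below shows what that number means under two assumptions that are worth verifying for your setup: that the vectors are unit-length (OpenAI documents its embeddings as normalized) and that the collection uses squared L2 distance (Chroma's documented default, as far as we are aware). The helper names and sample vectors are illustrative only.

```python
import math
from typing import List

def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def squared_l2(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

cos = cosine_similarity(a, b)
dist = squared_l2(a, b)

naive_score = 1 - dist        # what `1 - distance` yields under squared L2
recovered_cos = 1 - dist / 2  # for unit vectors, squared L2 = 2 * (1 - cosine)

print(f"cosine={cos:.4f} naive={naive_score:.4f} recovered={recovered_cos:.4f}")
```

Under these assumptions `1 - distance` still ranks chunks in the same order as cosine similarity, but it is not itself a cosine similarity and can be negative. If you need true cosine scores, either configure the collection to use cosine distance (the exact option depends on your Chroma version) or apply the `1 - distance / 2` conversion.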
```python Python theme={null}
def retrieve_chunks(query: str, top_k: int = 3) -> List[Dict]:
    """Retrieve the most relevant chunks for a query."""
    query_embedding = generate_embeddings([query])[0]
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        include=["documents", "distances"]
    )
    retrieved = []
    for i, doc in enumerate(results["documents"][0]):
        retrieved.append({
            "content": doc,
            # Convert distance to a rough similarity; this equals cosine
            # similarity only if the collection uses cosine distance
            "similarity_score": 1 - results["distances"][0][i]
        })
    return retrieved


def generate_answer(query: str, context_chunks: List[Dict]) -> str:
    """Generate an answer using the retrieved context."""
    context = "\n\n".join([chunk["content"] for chunk in context_chunks])
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """You are a helpful assistant that answers questions based on the provided context.
Only use information from the context to answer.
If the answer is not in the context, say so."""
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {query}"
            }
        ]
    )
    return response.choices[0].message.content


# Test the query pipeline
test_query = "What is the main topic of this document?"
retrieved_chunks = retrieve_chunks(test_query, top_k=3)
answer = generate_answer(test_query, retrieved_chunks)
print(f"Answer: {answer}")
```

```typescript TypeScript theme={null}
interface RetrievedChunk {
  content: string;
  similarityScore: number;
}

// Note: the collection from the previous code block is passed in as `coll`
async function retrieveChunks(
  coll: any,
  query: string,
  topK = 3
): Promise<RetrievedChunk[]> {
  const queryEmbedding = (await generateEmbeddings([query]))[0];
  const results = await coll.query({
    queryEmbeddings: [queryEmbedding],
    nResults: topK,
    include: ["documents", "distances"],
  });
  const retrieved: RetrievedChunk[] = [];
  for (let i = 0; i < results.documents[0].length; i++) {
    retrieved.push({
      content: results.documents[0][i] as string,
      similarityScore: 1 - (results.distances[0][i] as number),
    });
  }
  return retrieved;
}

async function generateAnswer(
  query: string,
  contextChunks: RetrievedChunk[]
): Promise<string> {
  const context = contextChunks.map((chunk) => chunk.content).join("\n\n");
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: `You are a helpful assistant that answers questions based on the provided context.
Only use information from the context to answer.
If the answer is not in the context, say so.`,
      },
      {
        role: "user",
        content: `Context:\n${context}\n\nQuestion: ${query}`,
      },
    ],
  });
  return response.choices[0]?.message.content || "";
}

// Test the query pipeline (requires the collection from the previous step)
(async () => {
  // Look up the collection created in the previous code block
  const collection = await chroma.getCollection({ name: "pdf_qa" });
  const testQuery = "What is the main topic of this document?";
  const retrievedChunks = await retrieveChunks(collection, testQuery, 3);
  const answer = await generateAnswer(testQuery, retrievedChunks);
  console.log(`Answer: ${answer}`);
})();
```

### Adding Session Support

For production use, we wrap everything in a class that maintains conversation history and session state. This enables multi-turn conversations where the chatbot remembers previous exchanges, and allows us to track usage per user and session.

```python Python theme={null}
class PDFChatbot:
    def __init__(self, pdf_path: str):
        self.pdf_path = pdf_path
        self.conversation_history = []
        self.session_id = str(uuid.uuid4())
        self._setup_vector_store()

    def _setup_vector_store(self):
        """Initialize the vector store with PDF content."""
        pdf_text = load_pdf(self.pdf_path)
        self.chunks = chunk_text(pdf_text)
        embeddings = generate_embeddings(self.chunks)
        self.collection = chroma_client.create_collection(
            name=f"pdf_{self.session_id}"
        )
        self.collection.add(
            documents=self.chunks,
            embeddings=embeddings,
            ids=[f"chunk_{i}" for i in range(len(self.chunks))]
        )

    def chat(self, query: str, user_id: Optional[str] = None) -> Dict:
        """Process a chat message and return the response."""
        # Retrieve relevant chunks
        retrieved = self._retrieve(query)

        # Build conversation context
        context = "\n\n".join([chunk["content"] for chunk in retrieved])

        # Generate response
        messages = [
            {
                "role": "system",
                "content": f"""You are a helpful assistant answering questions about a PDF document.
Use the following context to answer questions.
If the answer is not in the context, say so.

Context:
{context}"""
            }
        ]

        # Add conversation history
        for msg in self.conversation_history[-6:]:  # Last 3 exchanges
            messages.append(msg)
        messages.append({"role": "user", "content": query})

        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages
        )
        answer = response.choices[0].message.content

        # Update conversation history
        self.conversation_history.append({"role": "user", "content": query})
        self.conversation_history.append({"role": "assistant", "content": answer})

        return {
            "query": query,
            "answer": answer,
            "retrieved_chunks": retrieved,
            "session_id": self.session_id,
            "user_id": user_id,
            "token_usage": {
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            }
        }

    def _retrieve(self, query: str, top_k: int = 3) -> List[Dict]:
        """Retrieve relevant chunks."""
        query_embedding = generate_embeddings([query])[0]
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=top_k,
            include=["documents", "distances"]
        )
        retrieved = []
        for i, doc in enumerate(results["documents"][0]):
            retrieved.append({
                "content": doc,
                "similarity_score": 1 - results["distances"][0][i]
            })
        return retrieved


# Usage
chatbot = PDFChatbot("document.pdf")
response = chatbot.chat("What is the main topic?", user_id="user-123")
print(response["answer"])
```

```typescript TypeScript theme={null}
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface ChatResponse {
  query: string;
  answer: string;
  retrievedChunks: RetrievedChunk[];
  sessionId: string;
  userId?: string;
  tokenUsage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}

class PDFChatbot {
  pdfPath: string;
  conversationHistory: ChatMessage[] = [];
  sessionId: string;
  collection: any;
  chunks: string[] = [];

  constructor(pdfPath: string) {
    this.pdfPath = pdfPath;
    this.sessionId = uuid();
  }

  async initialize() {
    const pdfText = await loadPdf(this.pdfPath);
    this.chunks = chunkText(pdfText);
    const embeddings = await generateEmbeddings(this.chunks);
    this.collection = await chroma.createCollection({
      name: `pdf_${this.sessionId}`,
    });
    await this.collection.add({
      documents: this.chunks,
      embeddings: embeddings,
      ids: this.chunks.map((_, i) => `chunk_${i}`),
    });
  }

  async chat(query: string, userId?: string): Promise<ChatResponse> {
    // Retrieve relevant chunks
    const retrieved = await this.retrieve(query);

    // Build conversation context
    const context = retrieved.map((chunk) => chunk.content).join("\n\n");

    // Build messages
    const messages: ChatMessage[] = [
      {
        role: "system",
        content: `You are a helpful assistant answering questions about a PDF document.
Use the following context to answer questions.
If the answer is not in the context, say so.

Context:
${context}`,
      },
    ];

    // Add conversation history (last 3 exchanges)
    messages.push(...this.conversationHistory.slice(-6));
    messages.push({ role: "user", content: query });

    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: messages,
    });
    const answer = response.choices[0].message.content || "";

    // Update conversation history
    this.conversationHistory.push({ role: "user", content: query });
    this.conversationHistory.push({ role: "assistant", content: answer });

    return {
      query,
      answer,
      retrievedChunks: retrieved,
      sessionId: this.sessionId,
      userId,
      tokenUsage: {
        promptTokens: response.usage?.prompt_tokens || 0,
        completionTokens: response.usage?.completion_tokens || 0,
        totalTokens: response.usage?.total_tokens || 0,
      },
    };
  }

  async retrieve(query: string, topK = 3): Promise<RetrievedChunk[]> {
    const queryEmbedding = (await generateEmbeddings([query]))[0];
    const results = await this.collection.query({
      queryEmbeddings: [queryEmbedding],
      nResults: topK,
      include: ["documents", "distances"],
    });
    const retrieved: RetrievedChunk[] = [];
    for (let i = 0; i < results.documents[0].length; i++) {
      retrieved.push({
        content: results.documents[0][i] as string,
        similarityScore: 1 - (results.distances[0][i] as number),
      });
    }
    return retrieved;
  }
}

// Usage
(async () => {
  const chatbot = new PDFChatbot("document.pdf");
  await chatbot.initialize();
  const response = await chatbot.chat("What is the main topic?", "user-123");
  console.log(response.answer);
})();
```

***

## Tracing the Agent

Now let's add Netra observability to see what's happening inside the RAG pipeline. With auto-instrumentation, you get full visibility with minimal code changes.

### Initializing Netra

Add these imports and initialization at the very top of your script, before any other code. Auto-instrumentation captures all OpenAI and ChromaDB operations automatically, with no decorators or manual spans required.

```python Python theme={null}
# Add these imports at the top, before other imports
from netra import Netra
from netra.instrumentation.instruments import InstrumentSet

# Initialize Netra before any other code
Netra.init(
    app_name="pdf-qa-chatbot",
    environment="development",
    trace_content=True,
    instruments={
        InstrumentSet.OPENAI,
        InstrumentSet.CHROMA,
    }
)

# Now continue with the rest of your imports and code from earlier sections
# from pypdf import PdfReader
# from typing import List, Dict, Optional
# ...
```

```typescript TypeScript theme={null}
// Add these imports at the top, before other imports
import { Netra, NetraInstruments } from "netra-sdk";

// Initialize Netra before any other code
// If Netra.init returns a promise in your SDK version, await it
// (for example inside an immediately-invoked async function)
Netra.init({
  appName: "pdf-qa-chatbot",
  environment: "development",
  traceContent: true,
  instruments: new Set([NetraInstruments.OPENAI, NetraInstruments.CHROMADB]),
});

// Now continue with the rest of your imports and code from earlier sections
// import { PDFParse } from "pdf-parse";
// import * as fs from "fs";
// ...
```

**What gets auto-traced with zero code changes:**

* OpenAI chat completions with model, tokens, cost, and latency
* OpenAI embeddings with token counts
* ChromaDB queries and inserts with timing
* Full prompts and responses (when `trace_content=True`)

### What Gets Auto-Traced

With the initialization above, your existing code from the [Creating the Chat Agent](#creating-the-chat-agent) section is automatically traced. Here's what appears in your Netra dashboard:

#### Document Ingestion

The `generate_embeddings()` call to OpenAI and `collection.add()` to ChromaDB are captured automatically.

Figure: Ingestion trace showing OpenAI embeddings and ChromaDB operations


#### Retrieval Operations

Query embedding generation and vector search operations appear as child spans with timing and metadata.

Figure: Retrieval trace showing embedding and search spans


#### LLM Generation

OpenAI chat completions are fully traced with model, tokens, cost, latency, and full prompt/response content.

Figure: Generation trace showing OpenAI chat completion details


### Adding User and Session Tracking

To analyze usage per user and track conversation flows, add user and session context to your existing `PDFChatbot` class. This is the one piece that requires explicit code; everything else is auto-traced. Add the session and user calls at the top of your `chat` method:

```python Python theme={null}
# Modify the chat method in your existing PDFChatbot class:
def chat(self, query: str, user_id: Optional[str] = None) -> Dict:
    """Process a chat message and return the response."""
    # Add these lines to enable user and session tracking
    Netra.set_session_id(self.session_id)
    if user_id:
        Netra.set_user_id(user_id)

    # Rest of the method remains the same
    retrieved = self._retrieve(query)
    context = "\n\n".join([chunk["content"] for chunk in retrieved])
    # ... (rest of your existing code)
```

```typescript TypeScript theme={null}
// Modify the chat method in your existing PDFChatbot class:
async chat(query: string, userId?: string): Promise<ChatResponse> {
  // Add these lines to enable user and session tracking
  Netra.setSessionId(this.sessionId);
  if (userId) {
    Netra.setUserId(userId);
  }

  // Rest of the method remains the same
  const retrieved = await this.retrieve(query);
  const context = retrieved.map((chunk) => chunk.content).join("\n\n");
  // ... (rest of your existing code)
}
```

### What You'll See in the Dashboard

After running the chatbot, you'll see traces in the Netra dashboard with:

* **OpenAI spans** showing model, tokens, cost, and full prompt/response
* **ChromaDB spans** showing query timing and results
* **User and session IDs** attached to all spans for filtering
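
If you want a quick local cross-check against the token and cost figures in the dashboard, the `token_usage` dictionary that `chat()` returns can be summed across a session. The sketch below is an illustration under stated assumptions: `summarize_usage` is a hypothetical helper, the prices in `PRICE_PER_MILLION` are placeholder values rather than current OpenAI rates, and the sample responses are made up to mirror the shape of the `chat()` return value.

```python
from typing import Dict, List

# Placeholder USD prices per million tokens; replace with your provider's current rates.
PRICE_PER_MILLION = {"prompt": 0.15, "completion": 0.60}

def summarize_usage(responses: List[Dict]) -> Dict[str, float]:
    """Sum chat token usage across responses and estimate the cost."""
    prompt_tokens = sum(r["token_usage"]["prompt_tokens"] for r in responses)
    completion_tokens = sum(r["token_usage"]["completion_tokens"] for r in responses)
    estimated_cost = (
        prompt_tokens * PRICE_PER_MILLION["prompt"]
        + completion_tokens * PRICE_PER_MILLION["completion"]
    ) / 1_000_000
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "estimated_cost_usd": estimated_cost,
    }

# Made-up responses with the same shape as the dict returned by PDFChatbot.chat()
sample_responses = [
    {"token_usage": {"prompt_tokens": 1200, "completion_tokens": 150, "total_tokens": 1350}},
    {"token_usage": {"prompt_tokens": 900, "completion_tokens": 100, "total_tokens": 1000}},
]

summary = summarize_usage(sample_responses)
print(summary)
```

This only covers chat completion tokens. Embedding calls are not included in the `chat()` return value, so the traced totals in the dashboard will be higher than this local estimate.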