# Tracing a RAG Pipeline

> Add full observability to a RAG pipeline with Netra. Auto-instrument retrieval, embedding, and generation steps to trace every query end-to-end.

This cookbook walks you through adding **full observability** to a Retrieval-Augmented Generation (RAG) pipeline—tracing every stage from document ingestion to answer generation, tracking costs, and monitoring performance.

<Card title="Open in Google Colab" icon="google" href="https://colab.research.google.com/github/KeyValueSoftwareSystems/netra-cookbooks/blob/master/Tracing_RAG_Pipeline.ipynb">
  Run the complete observability notebook in your browser
</Card>

## What You'll Learn

<CardGroup cols={2}>
  <Card title="1. Build the RAG Pipeline" icon="hammer" href="#creating-the-chat-agent">
    Create a complete RAG chatbot that loads PDFs, chunks documents, generates embeddings, and retrieves relevant context for answering questions.
  </Card>

  <Card title="2. Add Auto-Instrumentation" icon="diagram-project" href="#tracing-the-agent">
    Instrument every stage—chunking, embedding, retrieval, and generation—with Netra auto-tracing to capture the full execution flow.
  </Card>

  <Card title="3. Track Costs & Performance" icon="chart-line" href="#what-gets-auto-traced">
    Monitor token usage, API costs, and latency at each step to identify bottlenecks and optimize your pipeline.
  </Card>

  <Card title="4. Add User & Session Tracking" icon="users" href="#adding-user-and-session-tracking">
    Track usage per user and session to understand conversation flows and user behavior.
  </Card>
</CardGroup>

<Info>
  **Prerequisites:**

  * Python >=3.10, \<3.14 or Node.js 18+
  * OpenAI API key
  * Netra API key (see the [quick-start guide](https://docs.getnetra.ai/quick-start/Overview))
</Info>

***

## High-Level Concepts

### RAG Architecture

A RAG chatbot works in two phases:

**Ingestion (one-time):**

1. Load and chunk the PDF into smaller text segments
2. Generate embeddings for each chunk
3. Store embeddings in a vector database

**Query (per question):**

1. Convert the user's question to an embedding
2. Find the most similar chunks (retrieval)
3. Pass retrieved chunks + question to an LLM
4. Return the generated answer

<Frame>
  <img src="https://mintcdn.com/netra/NbyXgcNYcWWJWGun/images/rag-netra.png?fit=max&auto=format&n=NbyXgcNYcWWJWGun&q=85&s=8745db95b27a32524e744c74c93319ee" alt="RAG Pipeline Architecture" width="9168" height="5008" data-path="images/rag-netra.png" />
</Frame>
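
The two phases can be sketched without any external services. This is a toy illustration only: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the final LLM call is left out.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, int]:
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Ingestion (one-time): embed each chunk and keep it in an in-memory "store"
chunks = [
    "the invoice is due within thirty days",
    "refunds are processed within five business days",
    "support is available on weekdays",
]
store: List[Tuple[str, Dict[str, int]]] = [(c, toy_embed(c)) for c in chunks]


def retrieve(question: str, top_k: int = 1) -> List[str]:
    """Query phase, steps 1 and 2: embed the question and rank stored chunks."""
    q = toy_embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, top_k: int = 1) -> str:
    """Query phase, step 3: combine retrieved chunks and the question."""
    context = "\n\n".join(retrieve(question, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("when are refunds processed"))
```

The rest of this cookbook replaces each toy piece with a real one: OpenAI embeddings, ChromaDB storage, and a chat completion for the final answer.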

### Why Observability Matters for RAG

RAG systems can fail silently in multiple ways:

| Problem         | Symptom            | What Tracing Reveals                |
| --------------- | ------------------ | ----------------------------------- |
| Poor chunking   | Incomplete answers | Chunk sizes, content boundaries     |
| Wrong retrieval | Irrelevant answers | Similarity scores, retrieved chunks |
| Hallucination   | Fabricated info    | Context vs. generated content       |
| High costs      | Budget overruns    | Token usage per stage               |

***

## Creating the Chat Agent

Let's build the RAG chatbot first, then add tracing.

### Installation

Start by installing the required packages. We'll use OpenAI for embeddings and generation, ChromaDB as our vector store, and a PDF parsing library.

<CodeGroup>
  ```bash Python theme={null}
  pip install netra-sdk openai chromadb pypdf reportlab
  ```

  ```bash TypeScript theme={null}
  npm install netra-sdk openai chromadb pdf-parse uuid @chroma-core/default-embed
  ```
</CodeGroup>

### Environment Setup

Configure your API keys. You'll need both an OpenAI key for the LLM operations and a Netra key for observability.

<CodeGroup>
  ```bash Python theme={null}
  export NETRA_API_KEY="your-netra-api-key"
  export NETRA_OTLP_ENDPOINT="your-netra-otlp-endpoint"
  export OPENAI_API_KEY="your-openai-api-key"
  ```

  ```bash TypeScript theme={null}
  export NETRA_API_KEY="your-netra-api-key"
  export NETRA_OTLP_ENDPOINT="your-netra-otlp-endpoint"
  export OPENAI_API_KEY="your-openai-api-key"

  # The TypeScript Chroma client has no in-memory mode, so start a local Chroma server
  chroma run --path /tmp/chroma_db
  ```
</CodeGroup>

### Loading and Chunking Documents

The first step in any RAG pipeline is extracting text from your documents and splitting it into manageable chunks. We use overlapping chunks to ensure context isn't lost at chunk boundaries—this helps when relevant information spans multiple segments.

<CodeGroup>
  ```python Python theme={null}
  # Import required libraries
  from pypdf import PdfReader
  from typing import List, Dict, Optional
  import chromadb
  from openai import OpenAI
  import uuid

  # Initialize clients
  openai_client = OpenAI()
  chroma_client = chromadb.Client()

  def load_pdf(file_path: str) -> str:
      """Extract text from a PDF file."""
      reader = PdfReader(file_path)
      text = ""
      for page in reader.pages:
          text += page.extract_text() + "\n"
      return text

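  def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
      """Split text into overlapping chunks."""
      if not 0 <= overlap < chunk_size:
          # Otherwise `start` would never advance and the loop would not terminate
          raise ValueError("overlap must be non-negative and smaller than chunk_size")
      chunks = []
      start = 0
      while start < len(text):
          end = start + chunk_size
          chunk = text[start:end]
          chunks.append(chunk)
          start = end - overlap
      return chunks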
  ```

  ```typescript TypeScript theme={null}
  // Import required libraries
  import { PDFParse } from "pdf-parse";
  import * as fs from "fs";
  import { ChromaClient } from "chromadb";
  import OpenAI from "openai";
  import { v4 as uuid } from "uuid";

  // Initialize clients
  const openai = new OpenAI();
  const chroma = new ChromaClient();

  async function loadPdf(filePath: string): Promise<string> {
    const dataBuffer = fs.readFileSync(filePath);
    const parser = new PDFParse({
      data: dataBuffer,
    });
    const data = await parser.getText();
    return data.text;
  }

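  function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
    if (overlap < 0 || overlap >= chunkSize) {
      // Otherwise `start` would never advance and the loop would not terminate
      throw new Error("overlap must be non-negative and smaller than chunkSize");
    }
    const chunks: string[] = [];
    let start = 0;
    while (start < text.length) {
      const end = start + chunkSize;
      const chunk = text.slice(start, end);
      chunks.push(chunk);
      start = end - overlap;
    }
    return chunks;
  }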
  ```
</CodeGroup>
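
To see how `chunk_size` and `overlap` interact, it can help to look at the character offsets alone. The sketch below is illustrative; `chunk_spans` is a hypothetical helper that is not part of the pipeline and mirrors the loop in `chunk_text`. Each chunk starts `chunk_size - overlap` characters after the previous one, so an overlap equal to or larger than the chunk size would never advance, and the helper rejects it.

```python
from typing import List, Tuple


def chunk_spans(text_len: int, chunk_size: int = 1000, overlap: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for overlapping chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    spans = []
    start = 0
    while start < text_len:
        end = min(start + chunk_size, text_len)
        spans.append((start, end))
        start += step
    return spans


# A 2,500-character document with the default settings
for start, end in chunk_spans(2500):
    print(start, end)
```

For 2,500 characters this yields the spans (0, 1000), (800, 1800), (1600, 2500), and (2400, 2500). The last span lies entirely inside the previous one, which is harmless but worth knowing when you read chunk counts in a trace.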

### Generating Embeddings and Indexing

Next, we convert each chunk into a vector embedding and store it in ChromaDB. These embeddings capture the semantic meaning of each chunk, allowing us to find relevant content based on meaning rather than just keywords.

<CodeGroup>
  ```python Python theme={null}
  def generate_embeddings(texts: List[str]) -> List[List[float]]:
      """Generate embeddings for a list of texts."""
      response = openai_client.embeddings.create(
          model="text-embedding-3-small",
          input=texts
      )
      return [item.embedding for item in response.data]

  # Load and chunk the PDF
  pdf_text = load_pdf("document.pdf")
  chunks = chunk_text(pdf_text, chunk_size=1000, overlap=200)
  print(f"Created {len(chunks)} chunks")

  # Generate embeddings and store in ChromaDB
  collection = chroma_client.create_collection(name="pdf_qa")
  embeddings = generate_embeddings(chunks)
  collection.add(
      documents=chunks,
      embeddings=embeddings,
      ids=[f"chunk_{i}" for i in range(len(chunks))]
  )
  print(f"Stored {len(chunks)} chunks in vector database")
  ```

  ```typescript TypeScript theme={null}
  function generateEmbeddings(texts: string[]): Promise<number[][]> {
    return openai.embeddings.create({
      model: "text-embedding-3-small",
      input: texts,
    }).then(response => response.data.map((item) => item.embedding));
  }

  // Load and chunk the PDF, then store embeddings
  (async () => {
    const pdfText = await loadPdf("document.pdf");
    const chunks = chunkText(pdfText, 1000, 200);
    console.log(`Created ${chunks.length} chunks`);

    // Generate embeddings and store in ChromaDB
    const collection = await chroma.createCollection({ name: "pdf_qa" });
    const embeddings = await generateEmbeddings(chunks);
    await collection.add({
      documents: chunks,
      embeddings: embeddings,
      ids: chunks.map((_, i) => `chunk_${i}`),
    });
    console.log(`Stored ${chunks.length} chunks in vector database`);
  })();
  ```
</CodeGroup>
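
The code above sends every chunk in a single embeddings request, which is fine for small PDFs. For large documents, one request can exceed the API's per-request input limits, so a common approach is to embed in batches. A minimal sketch, assuming an arbitrary batch size of 100; `batched` and `embed_in_batches` are hypothetical helpers, and `embed_fn` is any function with the same shape as `generate_embeddings`:

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield list(items[i:i + batch_size])


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    """Call embed_fn once per batch and concatenate the results in input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

With the functions defined earlier, usage would look like `embeddings = embed_in_batches(chunks, generate_embeddings)`. Each batch then appears as its own embedding span once tracing is enabled.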

### Building the Query Pipeline

Now we implement the core RAG logic: given a user question, retrieve the most relevant chunks from our vector store, then pass them as context to the LLM to generate an answer. The `top_k` parameter controls how many chunks we retrieve—more chunks provide more context but also increase cost and latency.

<CodeGroup>
  ```python Python theme={null}
  def retrieve_chunks(query: str, top_k: int = 3) -> List[Dict]:
      """Retrieve the most relevant chunks for a query."""
      query_embedding = generate_embeddings([query])[0]
      results = collection.query(
          query_embeddings=[query_embedding],
          n_results=top_k,
          include=["documents", "distances"]
      )

      retrieved = []
      for i, doc in enumerate(results["documents"][0]):
          retrieved.append({
              "content": doc,
              # Relative score; equals cosine similarity only if the collection uses cosine distance
              "similarity_score": 1 - results["distances"][0][i]
          })
      return retrieved

  def generate_answer(query: str, context_chunks: List[Dict]) -> str:
      """Generate an answer using the retrieved context."""
      context = "\n\n".join([chunk["content"] for chunk in context_chunks])

      response = openai_client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[
              {
                  "role": "system",
                  "content": """You are a helpful assistant that answers questions based on the provided context.
                  Only use information from the context to answer. If the answer is not in the context, say so."""
              },
              {
                  "role": "user",
                  "content": f"Context:\n{context}\n\nQuestion: {query}"
              }
          ]
      )
      return response.choices[0].message.content

  # Test the query pipeline
  test_query = "What is the main topic of this document?"
  retrieved_chunks = retrieve_chunks(test_query, top_k=3)
  answer = generate_answer(test_query, retrieved_chunks)
  print(f"Answer: {answer}")
  ```

  ```typescript TypeScript theme={null}
  interface RetrievedChunk {
    content: string;
    similarityScore: number;
  }

  // Note: collection variable needs to be accessible from previous code block
  async function retrieveChunks(
    coll: any,
    query: string,
    topK = 3
  ): Promise<RetrievedChunk[]> {
    const queryEmbedding = (await generateEmbeddings([query]))[0];
    const results = await coll.query({
      queryEmbeddings: [queryEmbedding],
      nResults: topK,
      include: ["documents", "distances"],
    });

    const retrieved: RetrievedChunk[] = [];
    for (let i = 0; i < results.documents[0].length; i++) {
      retrieved.push({
        content: results.documents[0][i] as string,
        similarityScore: 1 - (results.distances[0][i] as number),
      });
    }
    return retrieved;
  }

  async function generateAnswer(
    query: string,
    contextChunks: RetrievedChunk[]
  ): Promise<string> {
    const context = contextChunks.map((chunk) => chunk.content).join("\n\n");

    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content: `You are a helpful assistant that answers questions based on the provided context.
          Only use information from the context to answer. If the answer is not in the context, say so.`,
        },
        {
          role: "user",
          content: `Context:\n${context}\n\nQuestion: ${query}`,
        },
      ],
    });
    return response.choices[0]?.message.content || "";
  }

  // Test the query pipeline (requires collection from previous step)
  (async () => {
    // Assume collection is available from previous code block
    const collection = await chroma.getCollection({ name: "pdf_qa" });
    const testQuery = "What is the main topic of this document?";
    const retrievedChunks = await retrieveChunks(collection, testQuery, 3);
    const answer = await generateAnswer(testQuery, retrievedChunks);
    console.log(`Answer: ${answer}`);
  })();
  ```
</CodeGroup>
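
The `1 - distance` conversion above yields a cosine similarity only when the collection uses cosine distance. The sketch below shows one way to convert per metric, assuming unit-length embedding vectors and the usual distance definitions (`cosine`: 1 minus cosine similarity, `l2`: squared Euclidean distance, `ip`: 1 minus inner product). `distance_to_similarity` is a hypothetical helper, not part of ChromaDB; check which metric your collection is configured with before relying on it.

```python
def distance_to_similarity(distance: float, space: str) -> float:
    """Convert a distance to a cosine similarity, assuming unit-length vectors."""
    if space in ("cosine", "ip"):
        # distance = 1 - similarity
        return 1.0 - distance
    if space == "l2":
        # For unit vectors: squared L2 = 2 - 2 * cosine similarity
        return 1.0 - distance / 2.0
    raise ValueError(f"unknown space: {space}")
```

Whichever metric is in use, smaller distances still mean closer matches, so the ranking of retrieved chunks is unaffected; only the absolute score changes.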

### Adding Session Support

For production use, we wrap everything in a class that maintains conversation history and session state. This enables multi-turn conversations where the chatbot remembers previous exchanges, and allows us to track usage per user and session.

<CodeGroup>
  ```python Python theme={null}
  class PDFChatbot:
      def __init__(self, pdf_path: str):
          self.pdf_path = pdf_path
          self.conversation_history = []
          self.session_id = str(uuid.uuid4())
          self._setup_vector_store()

      def _setup_vector_store(self):
          """Initialize the vector store with PDF content."""
          pdf_text = load_pdf(self.pdf_path)
          self.chunks = chunk_text(pdf_text)
          embeddings = generate_embeddings(self.chunks)

          self.collection = chroma_client.create_collection(
              name=f"pdf_{self.session_id}"
          )
          self.collection.add(
              documents=self.chunks,
              embeddings=embeddings,
              ids=[f"chunk_{i}" for i in range(len(self.chunks))]
          )

      def chat(self, query: str, user_id: Optional[str] = None) -> Dict:
          """Process a chat message and return the response."""
          # Retrieve relevant chunks
          retrieved = self._retrieve(query)

          # Build conversation context
          context = "\n\n".join([chunk["content"] for chunk in retrieved])

          # Generate response
          messages = [
              {
                  "role": "system",
                  "content": f"""You are a helpful assistant answering questions about a PDF document.
                  Use the following context to answer questions. If the answer is not in the context, say so.

                  Context:
                  {context}"""
              }
          ]

          # Add conversation history
          for msg in self.conversation_history[-6:]:  # Last 3 exchanges
              messages.append(msg)

          messages.append({"role": "user", "content": query})

          response = openai_client.chat.completions.create(
              model="gpt-4o-mini",
              messages=messages
          )

          answer = response.choices[0].message.content

          # Update conversation history
          self.conversation_history.append({"role": "user", "content": query})
          self.conversation_history.append({"role": "assistant", "content": answer})

          return {
              "query": query,
              "answer": answer,
              "retrieved_chunks": retrieved,
              "session_id": self.session_id,
              "user_id": user_id,
              "token_usage": {
                  "prompt_tokens": response.usage.prompt_tokens,
                  "completion_tokens": response.usage.completion_tokens,
                  "total_tokens": response.usage.total_tokens
              }
          }

      def _retrieve(self, query: str, top_k: int = 3) -> List[Dict]:
          """Retrieve relevant chunks."""
          query_embedding = generate_embeddings([query])[0]
          results = self.collection.query(
              query_embeddings=[query_embedding],
              n_results=top_k,
              include=["documents", "distances"]
          )

          retrieved = []
          for i, doc in enumerate(results["documents"][0]):
              retrieved.append({
                  "content": doc,
                  "similarity_score": 1 - results["distances"][0][i]
              })
          return retrieved

  # Usage
  chatbot = PDFChatbot("document.pdf")
  response = chatbot.chat("What is the main topic?", user_id="user-123")
  print(response["answer"])
  ```

  ```typescript TypeScript theme={null}
  interface ChatMessage {
    role: "user" | "assistant" | "system";
    content: string;
  }

  interface ChatResponse {
    query: string;
    answer: string;
    retrievedChunks: RetrievedChunk[];
    sessionId: string;
    userId?: string;
    tokenUsage: {
      promptTokens: number;
      completionTokens: number;
      totalTokens: number;
    };
  }

  class PDFChatbot {
    pdfPath: string;
    conversationHistory: ChatMessage[] = [];
    sessionId: string;
    collection: any;
    chunks: string[] = [];

    constructor(pdfPath: string) {
      this.pdfPath = pdfPath;
      this.sessionId = uuid();
    }

    async initialize() {
      const pdfText = await loadPdf(this.pdfPath);
      this.chunks = chunkText(pdfText);
      const embeddings = await generateEmbeddings(this.chunks);

      this.collection = await chroma.createCollection({
        name: `pdf_${this.sessionId}`,
      });
      await this.collection.add({
        documents: this.chunks,
        embeddings: embeddings,
        ids: this.chunks.map((_, i) => `chunk_${i}`),
      });
    }

    async chat(query: string, userId?: string): Promise<ChatResponse> {
      // Retrieve relevant chunks
      const retrieved = await this.retrieve(query);

      // Build conversation context
      const context = retrieved.map((chunk) => chunk.content).join("\n\n");

      // Build messages
      const messages: ChatMessage[] = [
        {
          role: "system",
          content: `You are a helpful assistant answering questions about a PDF document.
          Use the following context to answer questions. If the answer is not in the context, say so.

          Context:
          ${context}`,
        },
      ];

      // Add conversation history (last 3 exchanges)
      messages.push(...this.conversationHistory.slice(-6));
      messages.push({ role: "user", content: query });

      const response = await openai.chat.completions.create({
        model: "gpt-4o-mini",
        messages: messages,
      });

      const answer = response.choices[0].message.content || "";

      // Update conversation history
      this.conversationHistory.push({ role: "user", content: query });
      this.conversationHistory.push({ role: "assistant", content: answer });

      return {
        query,
        answer,
        retrievedChunks: retrieved,
        sessionId: this.sessionId,
        userId,
        tokenUsage: {
          promptTokens: response.usage?.prompt_tokens || 0,
          completionTokens: response.usage?.completion_tokens || 0,
          totalTokens: response.usage?.total_tokens || 0,
        },
      };
    }

    async retrieve(query: string, topK = 3): Promise<RetrievedChunk[]> {
      const queryEmbedding = (await generateEmbeddings([query]))[0];
      const results = await this.collection.query({
        queryEmbeddings: [queryEmbedding],
        nResults: topK,
        include: ["documents", "distances"],
      });

      const retrieved: RetrievedChunk[] = [];
      for (let i = 0; i < results.documents[0].length; i++) {
        retrieved.push({
          content: results.documents[0][i] as string,
          similarityScore: 1 - (results.distances[0][i] as number),
        });
      }
      return retrieved;
    }
  }

  // Usage
  (async () => {
    const chatbot = new PDFChatbot("document.pdf");
    await chatbot.initialize();
    const response = await chatbot.chat("What is the main topic?", "user-123");
    console.log(response.answer);
  })();
  ```
</CodeGroup>
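
The history window is worth isolating: each turn adds two messages (user and assistant), so keeping the last three exchanges means slicing the last six messages. A sketch of the message assembly on its own, where `build_messages` and `max_exchanges` are illustrative names not used elsewhere in this cookbook:

```python
from typing import Dict, List


def build_messages(
    system_prompt: str,
    history: List[Dict[str, str]],
    query: str,
    max_exchanges: int = 3,
) -> List[Dict[str, str]]:
    """Assemble the system prompt, recent history, and the new user query."""
    # Two messages per exchange; guard against history[-0:], which would keep everything
    recent = history[-2 * max_exchanges:] if max_exchanges > 0 else []
    return [
        {"role": "system", "content": system_prompt},
        *recent,
        {"role": "user", "content": query},
    ]
```

Capping the window bounds the prompt size, and therefore the token cost per request, no matter how long the conversation runs.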

***

## Tracing the Agent

Now let's add Netra observability to see what's happening inside the RAG pipeline. The good news: with auto-instrumentation, you get full visibility with minimal code changes.

### Initializing Netra

Add these imports and initialization at the very top of your script, before any other code. Auto-instrumentation captures all OpenAI and ChromaDB operations automatically—no decorators or manual spans required.

<CodeGroup>
  ```python Python theme={null}
  # Add these imports at the top, before other imports
  from netra import Netra
  from netra.instrumentation.instruments import InstrumentSet

  # Initialize Netra before any other code
  Netra.init(
      app_name="pdf-qa-chatbot",
      environment="development",
      trace_content=True,
      instruments={
          InstrumentSet.OPENAI,
          InstrumentSet.CHROMA,
      }
  )

  # Now continue with the rest of your imports and code from earlier sections
  # from pypdf import PdfReader
  # from typing import List, Dict, Optional
  # ...
  ```

  ```typescript TypeScript theme={null}
  // Add these imports at the top, before other imports
  import { Netra, NetraInstruments } from "netra-sdk";

  // Initialize Netra before any other code
  // Top-level await needs an ES module; otherwise wrap this in an immediately-invoked async function
  await Netra.init({
    appName: "pdf-qa-chatbot",
    environment: "development",
    traceContent: true,
    instruments: new Set([NetraInstruments.OPENAI, NetraInstruments.CHROMADB]),
  });

  // Now continue with the rest of your imports and code from earlier sections
  // import {PDFParse} from "pdf-parse";
  // import * as fs from "fs";
  // ...
  ```
</CodeGroup>

<Tip>
  **What gets auto-traced with zero code changes:**

  * OpenAI chat completions with model, tokens, cost, and latency
  * OpenAI embeddings with token counts
  * ChromaDB queries and inserts with timing
  * Full prompts and responses (when `trace_content=True`)
</Tip>

### What Gets Auto-Traced

With the initialization above, your existing code from the [Creating the Chat Agent](#creating-the-chat-agent) section is automatically traced. Here's what appears in your Netra dashboard:

#### Document Ingestion

The `generate_embeddings()` call to OpenAI and `collection.add()` to ChromaDB are captured automatically.

<Frame caption="Auto-traced document ingestion showing embedding generation and vector storage">
  <img src="https://mintcdn.com/netra/GQvrgvWYERbFTwCQ/images/rag-embedding.png?fit=max&auto=format&n=GQvrgvWYERbFTwCQ&q=85&s=d6546b8d03585b68a2be713597ccdfa8" alt="Ingestion trace showing OpenAI embeddings and ChromaDB operations" width="2408" height="1436" data-path="images/rag-embedding.png" />
</Frame>

#### Retrieval Operations

Query embedding generation and vector search operations appear as child spans with timing and metadata.

<Frame caption="Auto-traced retrieval showing query embedding and ChromaDB vector search">
  <img src="https://mintcdn.com/netra/GQvrgvWYERbFTwCQ/images/rag-retrieval.png?fit=max&auto=format&n=GQvrgvWYERbFTwCQ&q=85&s=f3c6e793529572141d0d657d85b7564a" alt="Retrieval trace showing embedding and search spans" width="2412" height="1442" data-path="images/rag-retrieval.png" />
</Frame>

#### LLM Generation

OpenAI chat completions are fully traced with model, tokens, cost, latency, and full prompt/response content.

<Frame caption="Auto-traced LLM generation showing tokens, cost, and full prompt/response">
  <img src="https://mintcdn.com/netra/GQvrgvWYERbFTwCQ/images/rag-chat.png?fit=max&auto=format&n=GQvrgvWYERbFTwCQ&q=85&s=1955b80bf6bdf03471f07f9f66bb3d03" alt="Generation trace showing OpenAI chat completion details" width="2414" height="1706" data-path="images/rag-chat.png" />
</Frame>

### Adding User and Session Tracking

To analyze usage per user and track conversation flows, add user and session context to your existing `PDFChatbot` class. This is the one piece that requires explicit code; everything else is auto-traced. Add these two calls at the top of your `chat` method:

<CodeGroup>
  ```python Python theme={null}
  # Modify the chat method in your existing PDFChatbot class:
  def chat(self, query: str, user_id: Optional[str] = None) -> Dict:
      """Process a chat message and return the response."""
      # Add these two calls to enable user and session tracking
      Netra.set_session_id(self.session_id)
      if user_id:
          Netra.set_user_id(user_id)

      # Rest of the method remains the same
      retrieved = self._retrieve(query)
      context = "\n\n".join([chunk["content"] for chunk in retrieved])
      # ... (rest of your existing code)
  ```

  ```typescript TypeScript theme={null}
  // Modify the chat method in your existing PDFChatbot class:
  async chat(query: string, userId?: string): Promise<ChatResponse> {
    // Add these two calls to enable user and session tracking
    Netra.setSessionId(this.sessionId);
    if (userId) {
      Netra.setUserId(userId);
    }

    // Rest of the method remains the same
    const retrieved = await this.retrieve(query);
    const context = retrieved.map((chunk) => chunk.content).join("\n\n");
    // ... (rest of your existing code)
  }
  ```
</CodeGroup>
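
A similar per-user grouping can be approximated locally from the dictionaries that `chat()` returns, which is handy for a quick sanity check. A minimal sketch, assuming each response carries the `user_id` and `token_usage` keys shown in the earlier `PDFChatbot` class; `usage_by_user` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, Iterable


def usage_by_user(responses: Iterable[dict]) -> Dict[str, int]:
    """Sum total_tokens per user_id across chat() responses."""
    totals: Dict[str, int] = defaultdict(int)
    for response in responses:
        # Responses without a user_id are grouped under "anonymous"
        user = response.get("user_id") or "anonymous"
        totals[user] += response["token_usage"]["total_tokens"]
    return dict(totals)
```

Collect the return values of `chatbot.chat(...)` in a list and pass that list in to get one total per user.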

### What You'll See in the Dashboard

After running the chatbot, you'll see traces in the Netra dashboard with:

* **OpenAI spans** showing model, tokens, cost, and full prompt/response
* **ChromaDB spans** showing query timing and results
* **User and session IDs** attached to all spans for filtering

<video autoPlay muted loop playsInline className="w-full aspect-video rounded-xl" src="https://mintcdn.com/netra/_CSA9kqNsbhvWmxQ/videos/traces-rag-pdf.mp4?fit=max&auto=format&n=_CSA9kqNsbhvWmxQ&q=85&s=5dcffbf6f69e04c672618927aa28d157" data-path="videos/traces-rag-pdf.mp4" />

### Using Decorators

Auto-instrumentation covers most tracing needs. When you want more structure, use decorators to create parent spans that group related operations. This is useful when you want to see a single trace for an entire pipeline rather than individual OpenAI/ChromaDB calls.

| Decorator   | Use Case                                     |
| ----------- | -------------------------------------------- |
| `@workflow` | Top-level pipeline or request handler        |
| `@task`     | Discrete unit of work within a workflow      |
| `@span`     | Fine-grained tracing for specific operations |

<Accordion title="Complete Example with Decorators">
  <CodeGroup>
    ```python Python theme={null}
    import os
    import uuid
    from typing import List, Dict, Optional
    from pypdf import PdfReader
    import chromadb
    from openai import OpenAI

    from netra import Netra
    from netra.decorators import workflow, task, span
    from netra.instrumentation.instruments import InstrumentSet

    # Initialize Netra with auto-instrumentation
    Netra.init(
        app_name="pdf-qa-chatbot",
        environment="development",
        trace_content=True,
        instruments={
            InstrumentSet.OPENAI,
            InstrumentSet.CHROMA,
        }
    )

    # Initialize clients
    openai_client = OpenAI()
    chroma_client = chromadb.Client()


    def generate_embeddings(texts: List[str]) -> List[List[float]]:
        """Generate embeddings for a list of texts."""
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=texts
        )
        return [item.embedding for item in response.data]


    @task(name="load-pdf")
    def load_pdf(file_path: str) -> str:
        """Extract text from a PDF file."""
        reader = PdfReader(file_path)
        text = ""
        for page in reader.pages:
            text += page.extract_text() + "\n"
        return text


    @task(name="chunk-text")
    def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
        """Split text into overlapping chunks."""
        chunks = []
        start = 0
        while start < len(text):
            end = start + chunk_size
            chunk = text[start:end]
            chunks.append(chunk)
            start = end - overlap
        return chunks


    class PDFChatbot:
        """A RAG-based chatbot for answering questions about PDF documents."""

        def __init__(self, pdf_path: str):
            self.pdf_path = pdf_path
            self.session_id = str(uuid.uuid4())
            self.collection = None
            self.chunks: List[str] = []
            self.conversation_history: List[Dict] = []

        @task(name="document-ingestion")
        def initialize(self):
            """Initialize the vector store with PDF content."""
            pdf_text = load_pdf(self.pdf_path)
            self.chunks = chunk_text(pdf_text)
            embeddings = generate_embeddings(self.chunks)

            self.collection = chroma_client.create_collection(name=f"pdf_{self.session_id[:8]}")
            self.collection.add(
                documents=self.chunks,
                embeddings=embeddings,
                ids=[f"chunk_{i}" for i in range(len(self.chunks))]
            )

        @workflow(name="pdf-qa-query")
        def chat(self, query: str, user_id: Optional[str] = None) -> Dict:
            """Process a chat message and return the response."""
            Netra.set_session_id(self.session_id)
            if user_id:
                Netra.set_user_id(user_id)

            retrieved = self._retrieve(query)
            answer, response = self._generate_answer(query, retrieved)

            # Update conversation history
            self.conversation_history.append({"role": "user", "content": query})
            self.conversation_history.append({"role": "assistant", "content": answer})

            return {"query": query, "answer": answer, "retrieved_chunks": retrieved}

        @task(name="retrieval")
        def _retrieve(self, query: str, top_k: int = 3) -> List[Dict]:
            """Retrieve relevant chunks."""
            query_embedding = self._get_query_embedding(query)
            retrieved = self._vector_search(query_embedding, top_k)
            return retrieved

        @span(name="query-embedding")
        def _get_query_embedding(self, query: str) -> List[float]:
            """Generate embedding for the query."""
            return generate_embeddings([query])[0]

        @span(name="vector-search")
        def _vector_search(self, query_embedding: List[float], top_k: int) -> List[Dict]:
            """Search vector database for relevant chunks."""
            results = self.collection.query(
                query_embeddings=[query_embedding],
                n_results=top_k,
                include=["documents", "distances"]
            )
            return [{"content": doc, "similarity_score": 1 - results["distances"][0][i]}
                    for i, doc in enumerate(results["documents"][0])]

        @span(name="answer-generation")
        def _generate_answer(self, query: str, retrieved: List[Dict]):
            """Generate answer using retrieved context."""
            context = "\n\n".join([chunk["content"] for chunk in retrieved])
            messages = [
                {"role": "system", "content": f"Use this context to answer: {context}"},
                {"role": "user", "content": query}
            ]
            response = openai_client.chat.completions.create(model="gpt-4o-mini", messages=messages)
            return response.choices[0].message.content, response


    # Usage
    chatbot = PDFChatbot("document.pdf")
    chatbot.initialize()

    response = chatbot.chat("What is the main topic?", user_id="user-123")
    print(response["answer"])

    Netra.shutdown()
    ```

    ```typescript TypeScript theme={null}
    import fs from "fs/promises";
    import { PDFParse } from "pdf-parse";
    import { ChromaClient } from "chromadb";
    import OpenAI from "openai";

    import { Netra, NetraInstruments, workflow, task, span } from "netra-sdk";

    // Initialize Netra with auto-instrumentation
    await Netra.init({
      appName: "pdf-qa-chatbot",
      environment: "development",
      traceContent: true,
      instruments: new Set([NetraInstruments.OPENAI, NetraInstruments.CHROMA]),
    });

    // Initialize clients
    const openaiClient = new OpenAI();
    const chromaClient = new ChromaClient();


    async function generateEmbeddings(texts: string[]): Promise<number[][]> {
      const response = await openaiClient.embeddings.create({
        model: "text-embedding-3-small",
        input: texts,
      });
      return response.data.map((item) => item.embedding);
    }


    // Task wrapper for loading PDF
    const loadPdf = task("load-pdf", async (filePath: string): Promise<string> => {
      const pdfData = await fs.readFile(filePath);
      // Same PDFParse usage as in the earlier loadPdf example
      const parser = new PDFParse({ data: pdfData });
      const result = await parser.getText();
      return result.text;
    });

    // Task wrapper for chunking text
    const chunkText = task("chunk-text", (
      text: string,
      chunkSize: number = 1000,
      overlap: number = 200
    ): string[] => {
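      // Guard: with overlap >= chunkSize, `start` never advances and the loop never ends
      if (overlap >= chunkSize) {
        throw new Error("overlap must be smaller than chunkSize");
      }
      const chunks: string[] = [];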
      let start = 0;
      while (start < text.length) {
        const end = start + chunkSize;
        chunks.push(text.slice(start, end));
        start = end - overlap;
      }
      return chunks;
    });


    class PDFChatbot {
      pdfPath: string;
      sessionId: string;
      collection: any;
      chunks: string[] = [];
      conversationHistory: Array<{ role: string; content: string }> = [];

      constructor(pdfPath: string) {
        this.pdfPath = pdfPath;
        this.sessionId = crypto.randomUUID();
      }

      // Task wrapper for document ingestion
      initialize = task("document-ingestion", async () => {
        const pdfText = await loadPdf(this.pdfPath);
        this.chunks = chunkText(pdfText);
        const embeddings = await generateEmbeddings(this.chunks);

        this.collection = await chromaClient.createCollection({
          name: `pdf_${this.sessionId.slice(0, 8)}`
        });
        await this.collection.add({
          documents: this.chunks,
          embeddings,
          ids: this.chunks.map((_, i) => `chunk_${i}`)
        });
      });

      // Workflow wrapper for chat
      chat = workflow("pdf-qa-query", async (
        query: string,
        userId?: string
      ): Promise<Record<string, any>> => {
        Netra.setSessionId(this.sessionId);
        if (userId) {
          Netra.setUserId(userId);
        }

        const retrieved = await this.retrieve(query);
        const { answer } = await this.generateAnswer(query, retrieved);

        // Update conversation history
        this.conversationHistory.push({ role: "user", content: query });
        this.conversationHistory.push({ role: "assistant", content: answer });

        return { query, answer, retrievedChunks: retrieved };
      });

      // Task wrapper for retrieval
      retrieve = task("retrieval", async (
        query: string,
        topK: number = 3
      ): Promise<Array<{ content: string; similarityScore: number }>> => {
        const queryEmbedding = await this.getQueryEmbedding(query);
        return this.vectorSearch(queryEmbedding, topK);
      });

      // Span wrapper for query embedding
      getQueryEmbedding = span("query-embedding", async (
        query: string
      ): Promise<number[]> => {
        const embeddings = await generateEmbeddings([query]);
        return embeddings[0];
      });

      // Span wrapper for vector search
      vectorSearch = span("vector-search", async (
        queryEmbedding: number[],
        topK: number
      ): Promise<Array<{ content: string; similarityScore: number }>> => {
        const results = await this.collection.query({
          queryEmbeddings: [queryEmbedding],
          nResults: topK,
          include: ["documents", "distances"]
        });

        return results.documents[0].map((doc: string, i: number) => ({
          content: doc,
          similarityScore: 1 - results.distances[0][i]
        }));
      });

      // Span wrapper for answer generation
      generateAnswer = span("answer-generation", async (
        query: string,
        retrieved: Array<{ content: string }>
      ) => {
        const context = retrieved.map(chunk => chunk.content).join("\n\n");
        const messages = [
          { role: "system" as const, content: `Use this context to answer: ${context}` },
          { role: "user" as const, content: query }
        ];

        const response = await openaiClient.chat.completions.create({
          model: "gpt-4o-mini",
          messages
        });

        return {
          // `content` is typed `string | null`; fall back to an empty string
          answer: response.choices[0].message.content ?? "",
          response
        };
      });
    }


    // Usage
    (async () => {
      const chatbot = new PDFChatbot("document.pdf");
      await chatbot.initialize();

      const response = await chatbot.chat("What is the main topic?", "user-123");
      console.log(response.answer);
    })();
    ```
  </CodeGroup>
</Accordion>
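
Both versions of the vector-search step report `1 - distance` as a similarity score. That conversion is exact only when the collection is configured for cosine distance. With another metric, such as squared L2, the value still ranks chunks in the same order but is no longer a cosine similarity. The snippet below is a minimal, dependency-free sketch of the cosine case. The helper names `cosine_similarity` and `cosine_distance` are illustrative and are not part of Chroma or Netra.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance, defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical 2-D "embeddings", chosen only to make the arithmetic obvious.
query = [1.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
}

for name, vector in candidates.items():
    distance = cosine_distance(query, vector)
    similarity_score = 1 - distance  # same conversion as the vector-search span
    print(f"{name}: distance={distance:.2f}, similarity_score={similarity_score:.2f}")
```

A chunk pointing in the same direction as the query scores 1.00 and an orthogonal chunk scores 0.00, which is the scale that `similarity_score` implies when cosine distance is in use.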

<Frame caption="Traces with decorators showing hierarchical span structure">
  <img src="https://mintcdn.com/netra/GQvrgvWYERbFTwCQ/images/cookbook_rag_pdf_image_with_decorators.png?fit=max&auto=format&n=GQvrgvWYERbFTwCQ&q=85&s=8784dd2d39c9b9a020dd1fd8f1e9aa17" alt="Decorator traces" width="2416" height="1582" data-path="images/cookbook_rag_pdf_image_with_decorators.png" />
</Frame>

***

## Summary

You've built a fully observable RAG pipeline with Netra. Your chatbot now has:

* **End-to-end tracing** across document ingestion, retrieval, and generation
* **Cost and performance tracking** at each pipeline stage
* **User and session tracking** for usage analytics
* **Debugging capabilities** to trace issues back to specific chunks and prompts

With this foundation, you can identify bottlenecks, optimize costs, and debug issues in your RAG system with confidence.

***

## See Also

<CardGroup cols={2}>
  <Card title="Evaluate Your RAG Pipeline" icon="clipboard-check" href="/Cookbooks/evaluation/evaluating-rag-quality">
    Add quality metrics and test suites to measure retrieval and generation quality
  </Card>

  <Card title="Simulation Testing" icon="flask" href="/Simulation/Simulation-overview">
    Run automated simulation tests to stress-test your pipeline
  </Card>
</CardGroup>
