This cookbook walks you through building complete multi-tenant observability for a B2B AI platform—tracking costs per customer, monitoring SLAs, and evaluating quality across different tenant tiers.

**Open in Google Colab**: Run the complete notebook in your browser.
All company names (MeetingMind, Apex Legal, Stratex Consulting, TechStart Inc) and scenarios in this cookbook are entirely fictional and used for demonstration purposes only.

What You’ll Learn

This cookbook guides you through 5 key stages of building observable multi-tenant AI applications:

  1. Building the meeting summarizer agent
  2. Adding observability with per-tenant context
  3. Attributing cost and usage per customer
  4. Monitoring latency and SLA compliance per tier
  5. Evaluating quality per tier and A/B testing models

Prerequisites

  • A Python environment with pip
  • An OpenAI API key
  • A Netra account and API key

High-Level Concepts

Why Multi-Tenant Observability Matters

For B2B AI platforms serving multiple customers, generic observability isn’t enough. You need to answer questions like:
| Question | Who Asks | What You Need |
|----------|----------|---------------|
| "How much did Customer X use last month?" | Finance | Per-tenant cost attribution |
| "Is our Enterprise tier meeting SLAs?" | Customer Success | Tenant-filtered latency monitoring |
| "Are cheaper tiers delivering acceptable quality?" | Product | Per-tier quality evaluation |
| "Which customer is causing the cost spike?" | Engineering | Real-time tenant usage breakdown |
Netra’s native set_tenant_id() API makes this straightforward—no custom tagging workarounds needed.

The MeetingMind Scenario

MeetingMind is a fictional B2B SaaS platform that provides AI-powered meeting summarization for enterprise teams. The platform serves customers with different needs and budgets:
| Customer | Industry | Needs | Tier |
|----------|----------|-------|------|
| Apex Legal | Law Firm | Detailed transcripts with citations | Enterprise |
| Stratex Consulting | Consulting | Action items and key decisions | Professional |
| TechStart Inc | Tech Startup | Quick summaries on a budget | Starter |
Each tier uses a different model and has different SLA commitments:
| Tier | Model | Latency SLA | Features | Price |
|------|-------|-------------|----------|-------|
| Enterprise | GPT-4 | P95 < 2s | Full summary + action items + decisions | $0.10/meeting |
| Professional | GPT-4-turbo | P95 < 3s | Summary + action items | $0.05/meeting |
| Starter | GPT-3.5-turbo | Best effort | Summary only | $0.01/meeting |

Building the Meeting Summarizer

First, we build the core agent before adding observability. This separation makes it easier to understand the business logic independently from instrumentation.

Installation

pip install netra-sdk openai

Environment Setup

Configure your API keys for both OpenAI (for the LLM) and Netra (for observability).
export NETRA_API_KEY="your-netra-api-key"
export OPENAI_API_KEY="your-openai-api-key"
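Optionally, verify both keys are present before running the examples (a small sanity check, not part of either SDK):
import os

# Fail fast if either key is missing from the environment
for key in ("NETRA_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing environment variable: {key}")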

Tenant Configuration

We define the tier configurations that determine which model and features each customer gets. This configuration drives both the business logic and the observability setup.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TenantConfig:
    """Configuration for a tenant's service tier."""
    tenant_id: str
    tier: str
    model: str
    features: List[str]
    latency_sla_ms: Optional[int]

# Tenant configurations
TENANT_CONFIGS = {
    "apex-legal": TenantConfig(
        tenant_id="apex-legal",
        tier="enterprise",
        model="gpt-4",
        features=["summary", "action_items", "decisions"],
        latency_sla_ms=2000
    ),
    "stratex-consulting": TenantConfig(
        tenant_id="stratex-consulting",
        tier="professional",
        model="gpt-4-turbo",
        features=["summary", "action_items"],
        latency_sla_ms=3000
    ),
    "techstart-inc": TenantConfig(
        tenant_id="techstart-inc",
        tier="starter",
        model="gpt-3.5-turbo",
        features=["summary"],
        latency_sla_ms=None  # Best effort
    ),
}

The MeetingSummarizer Class

The core summarizer takes a meeting transcript and returns structured output based on the tenant’s tier. Higher tiers get more detailed analysis.
from openai import OpenAI
from typing import Dict, Any
import json

class MeetingSummarizer:
    """Multi-tenant meeting summarization agent."""

    def __init__(self, tenant_id: str):
        if tenant_id not in TENANT_CONFIGS:
            raise ValueError(f"Unknown tenant: {tenant_id}")

        self.config = TENANT_CONFIGS[tenant_id]
        self.client = OpenAI()

    def _build_prompt(self, transcript: str) -> str:
        """Build the prompt based on tenant tier features."""
        feature_instructions = []

        if "summary" in self.config.features:
            feature_instructions.append("- **Summary**: A concise 2-3 sentence summary of the meeting")

        if "action_items" in self.config.features:
            feature_instructions.append("- **Action Items**: A list of tasks assigned, with owner and deadline if mentioned")

        if "decisions" in self.config.features:
            feature_instructions.append("- **Decisions**: Key decisions made during the meeting")

        features_text = "\n".join(feature_instructions)

        return f"""Analyze the following meeting transcript and extract the requested information.

**Required Output:**
{features_text}

**Meeting Transcript:**
{transcript}

Respond in JSON format with keys matching the requested sections (summary, action_items, decisions).
"""

    def summarize(self, transcript: str, user_id: Optional[str] = None) -> Dict[str, Any]:
        """Summarize a meeting transcript."""
        prompt = self._build_prompt(transcript)

        response = self.client.chat.completions.create(
            model=self.config.model,
            messages=[
                {
                    "role": "system",
                    "content": "You are an expert meeting analyst. Extract key information accurately and concisely. Always respond with valid JSON."
                },
                {"role": "user", "content": prompt}
            ],
            temperature=0.1,
            response_format={"type": "json_object"}
        )

        result = json.loads(response.choices[0].message.content)

        return {
            "tenant_id": self.config.tenant_id,
            "tier": self.config.tier,
            "model": self.config.model,
            "user_id": user_id,
            "result": result,
            "usage": {
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            }
        }

Testing the Summarizer

Let’s test the summarizer with a sample meeting transcript to verify it works before adding observability.
# Sample meeting transcript
SAMPLE_TRANSCRIPT = """
Meeting: Q4 Planning Session
Date: January 15, 2026
Attendees: Alice (PM), Bob (Engineering), Carol (Design)

Alice: Let's review our Q4 priorities. We need to ship the new dashboard by March.
Bob: The backend is 80% complete. We need two more weeks for the API endpoints.
Carol: I can have the designs finalized by Friday.
Alice: Great. Bob, can you also look into the performance issues reported last week?
Bob: Yes, I'll prioritize that. Should be fixed by Wednesday.
Alice: Perfect. Let's sync again next Monday.
"""

# Test with different tenants
for tenant_id in ["apex-legal", "stratex-consulting", "techstart-inc"]:
    summarizer = MeetingSummarizer(tenant_id)
    result = summarizer.summarize(SAMPLE_TRANSCRIPT, user_id="demo-user")

    print(f"\n=== {tenant_id} ({result['tier']}) ===")
    print(f"Model: {result['model']}")
    print(f"Tokens: {result['usage']['total_tokens']}")
    print(f"Result: {json.dumps(result['result'], indent=2)}")
You’ll notice that:
  • Enterprise (Apex Legal) gets summary, action items, AND decisions
  • Professional (Stratex Consulting) gets summary and action items
  • Starter (TechStart Inc) gets summary only
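For example, the Enterprise tier output might look like this (illustrative only; actual model output will vary):
{
  "summary": "The team reviewed Q4 priorities and confirmed the new dashboard will ship by March, with backend API work about two weeks from completion.",
  "action_items": [
    {"task": "Finalize designs", "owner": "Carol", "deadline": "Friday"},
    {"task": "Fix reported performance issues", "owner": "Bob", "deadline": "Wednesday"}
  ],
  "decisions": [
    "Ship the new dashboard by March",
    "Sync again next Monday"
  ]
}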

Adding Observability

Now we instrument the agent to capture per-tenant metrics. The key differentiator with Netra is the native set_tenant_id() API—all subsequent traces are automatically attributed to that tenant.

Initializing Netra

Initialize Netra at application startup with your app name and environment. We enable auto-instrumentation for OpenAI to capture LLM calls automatically.
import os
from netra import Netra
from netra.instrumentation.instruments import InstrumentSet

Netra.init(
    app_name="meetingmind",
    headers=f"x-api-key={os.getenv('NETRA_API_KEY')}",
    environment="production",
    trace_content=True,
    instruments=set([InstrumentSet.OPENAI]),
)

Setting Tenant Context

The most important step: call set_tenant_id() at the start of each request. This associates all traces with the customer, enabling per-tenant filtering across the entire Netra platform.
from netra import Netra

def handle_request(tenant_id: str, user_id: str, transcript: str):
    """Handle an incoming summarization request with tenant context."""
    # Set tenant context - all traces will be attributed to this tenant
    Netra.set_tenant_id(tenant_id)

    # Set user context for per-user analytics within the tenant
    Netra.set_user_id(user_id)

    # Set session ID if this is part of a conversation
    session_id = f"{tenant_id}-{user_id}-session"
    Netra.set_session_id(session_id)

    # Add custom attributes for additional filtering
    config = TENANT_CONFIGS[tenant_id]
    Netra.set_custom_attributes(key="tier", value=config.tier)
    Netra.set_custom_attributes(key="model", value=config.model)

    # Now run the summarization - traces are automatically attributed
    summarizer = MeetingSummarizer(tenant_id)
    return summarizer.summarize(transcript, user_id)
Set the tenant ID early in your request lifecycle—typically in middleware or at the start of request handling. This ensures all traces within that request are properly attributed.
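As a sketch, here is how that might look as FastAPI middleware, assuming the tenant ID arrives in a request header (the X-Tenant-ID header name and wiring are illustrative, not a prescribed Netra pattern):
from fastapi import FastAPI, Request
from netra import Netra

app = FastAPI()

@app.middleware("http")
async def tenant_context_middleware(request: Request, call_next):
    # Illustrative: pull the tenant ID from a header set by your auth layer
    tenant_id = request.headers.get("X-Tenant-ID")
    if tenant_id:
        Netra.set_tenant_id(tenant_id)  # traces in this request inherit it
    return await call_next(request)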

The TracedMeetingSummarizer Class

For comprehensive observability, we wrap the summarizer with explicit spans that capture the full pipeline: prompt construction, LLM generation, and response parsing.
from netra import Netra, SpanType, UsageModel
import time

class TracedMeetingSummarizer:
    """Multi-tenant meeting summarizer with full Netra instrumentation."""

    def __init__(self, tenant_id: str):
        if tenant_id not in TENANT_CONFIGS:
            raise ValueError(f"Unknown tenant: {tenant_id}")

        self.config = TENANT_CONFIGS[tenant_id]
        self.client = OpenAI()

    def summarize(self, transcript: str, user_id: Optional[str] = None) -> Dict[str, Any]:
        """Summarize a meeting transcript with full tracing."""
        # Set tenant context
        Netra.set_tenant_id(self.config.tenant_id)
        if user_id:
            Netra.set_user_id(user_id)

        Netra.set_custom_attributes(key="tier", value=self.config.tier)

        with Netra.start_span("meeting-summarization") as parent_span:
            parent_span.set_attribute("tenant_id", self.config.tenant_id)
            parent_span.set_attribute("tier", self.config.tier)
            parent_span.set_attribute("model", self.config.model)

            start_time = time.time()

            # Build the prompt
            with Netra.start_span("prompt-construction") as prompt_span:
                prompt = self._build_prompt(transcript)
                prompt_span.set_attribute("transcript_length", len(transcript))
                prompt_span.set_attribute("features", ",".join(self.config.features))
                prompt_span.set_success()

            # Generate the response
            with Netra.start_span("llm-generation", as_type=SpanType.GENERATION) as gen_span:
                gen_span.set_model(self.config.model)
                gen_span.set_llm_system("openai")
                gen_span.set_prompt(prompt)

                response = self.client.chat.completions.create(
                    model=self.config.model,
                    messages=[
                        {
                            "role": "system",
                            "content": "You are an expert meeting analyst. Extract key information accurately and concisely. Always respond with valid JSON."
                        },
                        {"role": "user", "content": prompt}
                    ],
                    temperature=0.1,
                    response_format={"type": "json_object"}
                )

                # Track token usage and cost
                prompt_tokens = response.usage.prompt_tokens
                completion_tokens = response.usage.completion_tokens

                # Calculate cost based on model
                cost = self._calculate_cost(prompt_tokens, completion_tokens)

                gen_span.set_usage([
                    UsageModel(
                        model=self.config.model,
                        cost_in_usd=cost,
                        usage_type="chat",
                        units_used=prompt_tokens + completion_tokens
                    )
                ])

                gen_span.set_attribute("tokens.prompt", prompt_tokens)
                gen_span.set_attribute("tokens.completion", completion_tokens)
                gen_span.set_attribute("cost.usd", cost)
                gen_span.set_success()

            # Parse the response
            with Netra.start_span("response-parsing") as parse_span:
                result = json.loads(response.choices[0].message.content)
                parse_span.set_attribute("keys_extracted", list(result.keys()))
                parse_span.set_success()

            # Record total latency
            latency_ms = (time.time() - start_time) * 1000
            parent_span.set_attribute("latency_ms", latency_ms)

            # Check SLA compliance
            if self.config.latency_sla_ms:
                sla_met = latency_ms <= self.config.latency_sla_ms
                parent_span.set_attribute("sla_met", sla_met)
                if not sla_met:
                    parent_span.add_event("sla-breach", {
                        "actual_ms": latency_ms,
                        "sla_ms": self.config.latency_sla_ms
                    })

            parent_span.set_success()

            return {
                "tenant_id": self.config.tenant_id,
                "tier": self.config.tier,
                "model": self.config.model,
                "user_id": user_id,
                "result": result,
                "latency_ms": latency_ms,
                "cost_usd": cost,
                "usage": {
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "total_tokens": prompt_tokens + completion_tokens
                }
            }

    def _build_prompt(self, transcript: str) -> str:
        """Build the prompt based on tenant tier features."""
        feature_instructions = []

        if "summary" in self.config.features:
            feature_instructions.append("- **Summary**: A concise 2-3 sentence summary of the meeting")

        if "action_items" in self.config.features:
            feature_instructions.append("- **Action Items**: A list of tasks assigned, with owner and deadline if mentioned")

        if "decisions" in self.config.features:
            feature_instructions.append("- **Decisions**: Key decisions made during the meeting")

        features_text = "\n".join(feature_instructions)

        return f"""Analyze the following meeting transcript and extract the requested information.

**Required Output:**
{features_text}

**Meeting Transcript:**
{transcript}

Respond in JSON format with keys matching the requested sections (summary, action_items, decisions).
"""

    def _calculate_cost(self, prompt_tokens: int, completion_tokens: int) -> float:
        """Calculate cost based on model pricing."""
        # Pricing per 1M tokens (as of 2024)
        pricing = {
            "gpt-4": {"input": 30.0, "output": 60.0},
            "gpt-4-turbo": {"input": 10.0, "output": 30.0},
            "gpt-3.5-turbo": {"input": 0.5, "output": 1.5},
        }

        model_pricing = pricing.get(self.config.model, pricing["gpt-3.5-turbo"])
        input_cost = (prompt_tokens / 1_000_000) * model_pricing["input"]
        output_cost = (completion_tokens / 1_000_000) * model_pricing["output"]

        return input_cost + output_cost
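To make the arithmetic concrete, a GPT-4 call with 500 prompt tokens and 200 completion tokens costs:
# GPT-4 example: 500 prompt tokens, 200 completion tokens
input_cost = (500 / 1_000_000) * 30.0    # $0.015
output_cost = (200 / 1_000_000) * 60.0   # $0.012
total_cost = input_cost + output_cost    # $0.027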

Simulating Multi-Tenant Traffic

To see the tenant dashboard in action, let’s simulate requests from all three customers. This generates real data that you can view in Netra.
import random

def simulate_multi_tenant_traffic():
    """Simulate requests from multiple tenants."""
    # Request distribution by tenant
    requests = [
        ("apex-legal", 5),        # Enterprise: fewer, higher value
        ("stratex-consulting", 10), # Professional: moderate volume
        ("techstart-inc", 20),     # Starter: high volume, lower cost
    ]

    users_per_tenant = {
        "apex-legal": ["[email protected]", "[email protected]"],
        "stratex-consulting": ["[email protected]", "[email protected]", "[email protected]"],
        "techstart-inc": ["[email protected]", "[email protected]"],
    }

    results = []

    for tenant_id, num_requests in requests:
        print(f"\nProcessing {num_requests} requests for {tenant_id}...")
        summarizer = TracedMeetingSummarizer(tenant_id)

        for i in range(num_requests):
            user_id = random.choice(users_per_tenant[tenant_id])
            result = summarizer.summarize(SAMPLE_TRANSCRIPT, user_id)
            results.append(result)
            print(f"  Request {i+1}/{num_requests}: {result['latency_ms']:.0f}ms, ${result['cost_usd']:.6f}")

    # Summary
    print("\n=== Traffic Summary ===")
    by_tenant = {}
    for r in results:
        tid = r["tenant_id"]
        if tid not in by_tenant:
            by_tenant[tid] = {"count": 0, "cost": 0, "latency": []}
        by_tenant[tid]["count"] += 1
        by_tenant[tid]["cost"] += r["cost_usd"]
        by_tenant[tid]["latency"].append(r["latency_ms"])

    for tenant_id, stats in by_tenant.items():
        avg_latency = sum(stats["latency"]) / len(stats["latency"])
        print(f"{tenant_id}: {stats['count']} requests, ${stats['cost']:.4f} total, {avg_latency:.0f}ms avg latency")

    # Flush traces to Netra
    Netra.shutdown()

    return results

results = simulate_multi_tenant_traffic()
After running this simulation, navigate to Observability → Tenants in Netra. You’ll see all three customers with their trace counts, session counts, and total costs—instantly filterable and sortable.
Netra Tenants dashboard showing Apex Legal, Stratex Consulting, and TechStart Inc with their respective costs and trace counts

Cost Attribution & Usage Tracking

Netra provides two ways to access per-tenant cost and usage data: programmatically via the SDK, or visually through the dashboard.

Querying Usage via API

Use the get_tenant_usage() API to retrieve aggregated metrics for any tenant within a time range. This is useful for building custom dashboards, integrating with internal tools, or exporting data.
from netra import Netra

def get_tenant_usage_data(tenant_id: str, start_time: str, end_time: str):
    """Retrieve usage data for a tenant."""
    usage = Netra.usage.get_tenant_usage(
        tenant_id=tenant_id,
        start_time=start_time,
        end_time=end_time,
    )

    if usage:
        return {
            "tenant_id": usage.tenant_id,
            "token_count": usage.token_count,
            "request_count": usage.request_count,
            "session_count": usage.session_count,
            "total_cost": usage.total_cost,
        }
    return None

# Example: Get January usage for a tenant
usage = get_tenant_usage_data(
    tenant_id="apex-legal",
    start_time="2026-01-01T00:00:00.000Z",
    end_time="2026-01-31T23:59:59.000Z",
)
print(f"Apex Legal January usage: {usage['request_count']} requests, ${usage['total_cost']:.4f}")

Viewing Usage in the Dashboard

For quick access without writing code, navigate to Observability → Tenants in the Netra dashboard. This view provides:
  • Tenant list with aggregated metrics (traces, sessions, cost)
  • Time range filtering to analyze specific periods
  • Sort by cost to identify high-usage customers
  • Click-through to traces for detailed investigation
Netra Tenants dashboard showing per-tenant usage breakdown with cost, sessions, and trace counts

Comparing Usage Across Tenants

You can also query multiple tenants programmatically to compare usage patterns:
def compare_tenant_usage(start_time: str, end_time: str):
    """Compare usage across all tenants."""
    comparison = []

    for tenant_id in TENANT_CONFIGS.keys():
        usage = Netra.usage.get_tenant_usage(
            tenant_id=tenant_id,
            start_time=start_time,
            end_time=end_time,
        )

        if usage:
            comparison.append({
                "tenant_id": tenant_id,
                "tier": TENANT_CONFIGS[tenant_id].tier,
                "requests": usage.request_count,
                "tokens": usage.token_count,
                "sessions": usage.session_count,
                "cost": usage.total_cost,
            })

    return comparison

# Compare January usage across tenants
usage_data = compare_tenant_usage(
    "2026-01-01T00:00:00.000Z",
    "2026-01-31T23:59:59.000Z"
)

for tenant in usage_data:
    print(f"{tenant['tenant_id']}: {tenant['requests']} requests, ${tenant['cost']:.4f}")
Example usage comparison:
| Tenant | Tier | Requests | Tokens | Sessions | LLM Cost |
|--------|------|----------|--------|----------|----------|
| apex-legal | enterprise | 5 | 2,340 | 3 | $0.0234 |
| stratex-consulting | professional | 10 | 3,120 | 5 | $0.0156 |
| techstart-inc | starter | 20 | 4,800 | 8 | $0.0024 |
Use the API for automated reporting, integration with internal dashboards, or exporting to external systems. Use the UI for quick investigations and ad-hoc analysis.
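For example, a short sketch that exports the comparison from compare_tenant_usage() to CSV for an internal report (the file name is arbitrary):
import csv

def export_usage_csv(usage_data, path="tenant_usage_report.csv"):
    """Write the per-tenant usage comparison to a CSV file."""
    fields = ["tenant_id", "tier", "requests", "tokens", "sessions", "cost"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(usage_data)

export_usage_csv(usage_data)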

Session & User Analytics

Understand usage patterns within each tenant to identify power users and optimize resource allocation.

Session Stats per Tenant

Query session-level statistics filtered by tenant to understand conversation patterns.
from netra.dashboard import (
    SessionFilterConfig,
    SessionFilter,
    SessionFilterField,
    SessionFilterOperator,
    SessionFilterType,
    SortField,
    SortOrder,
)

def get_tenant_sessions(tenant_id: str, start_time: str, end_time: str):
    """Get session statistics for a specific tenant."""
    session_stats = Netra.dashboard.get_session_stats(
        start_time=start_time,
        end_time=end_time,
        limit=50,
        filters=[
            SessionFilter(
                field=SessionFilterField.TENANT_ID,
                operator=SessionFilterOperator.EQUALS,
                type=SessionFilterType.STRING,
                value=tenant_id,
            )
        ],
        sort_field=SortField.TOTAL_COST,
        sort_order=SortOrder.DESC,
    )

    print(f"\n=== Sessions for {tenant_id} ===")
    for session in session_stats.sessions:
        print(f"  Session: {session.session_id[:20]}...")
        print(f"    User: {session.user_id}")
        print(f"    Traces: {session.trace_count}")
        print(f"    Cost: ${session.total_cost:.4f}")
        print(f"    Duration: {session.duration_ms}ms")

    return session_stats

# Get sessions for Enterprise tenant
get_tenant_sessions(
    "apex-legal",
    "2026-01-01T00:00:00.000Z",
    "2026-01-31T23:59:59.000Z"
)
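To surface power users within a tenant, you can aggregate these sessions by user. A small sketch building on the session_stats object returned above:
from collections import defaultdict

def top_users_by_cost(session_stats, top_n=5):
    """Rank users within a tenant by total session cost."""
    cost_by_user = defaultdict(float)
    for session in session_stats.sessions:
        cost_by_user[session.user_id] += session.total_cost

    ranked = sorted(cost_by_user.items(), key=lambda kv: kv[1], reverse=True)
    for user_id, cost in ranked[:top_n]:
        print(f"{user_id}: ${cost:.4f}")
    return ranked[:top_n]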

Latency Monitoring & SLA Alerts

Enterprise customers pay premium prices for guaranteed performance. You need to monitor and alert when SLAs are breached.

Setting Up Tenant-Specific Alerts

In the Netra dashboard, navigate to Alert Rules and create a new alert with tenant filtering:
  1. **Create Alert Rule**: Click **Create Alert Rule** and name it "Enterprise Latency SLA Breach".
  2. **Select Scope and Metric**: Set Scope to **Trace** (monitor end-to-end requests) and Metric to **Latency**.
  3. **Apply Tenant Filter**: Add a filter for `tenant_id = apex-legal` to monitor only Enterprise tier requests.
  4. **Set Threshold**: Condition: greater than 2000ms, with a 5-minute time window to avoid alerting on single slow requests.
  5. **Configure Contact Point**: Select your Slack channel or email for notifications.
Alert rule configuration showing tenant filter for apex-legal with latency threshold of 2000ms
You can create similar alerts for each tier with their respective SLA thresholds:
  • apex-legal (Enterprise): Alert if latency > 2000ms
  • stratex-consulting (Professional): Alert if latency > 3000ms
  • techstart-inc (Starter): No SLA alerts (best effort)
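Dashboard alerts catch breaches as they happen; for periodic SLA reports you can also compute P95 latency per tenant yourself. A minimal sketch, using the list of summarize() result dicts returned by simulate_multi_tenant_traffic() above:
import math

def p95_latency_by_tenant(results):
    """Compute nearest-rank P95 latency per tenant and check it against the SLA."""
    by_tenant = {}
    for r in results:
        by_tenant.setdefault(r["tenant_id"], []).append(r["latency_ms"])

    for tenant_id, latencies in by_tenant.items():
        latencies.sort()
        idx = max(0, math.ceil(0.95 * len(latencies)) - 1)  # nearest-rank P95
        p95 = latencies[idx]
        sla = TENANT_CONFIGS[tenant_id].latency_sla_ms
        status = "OK" if sla is None or p95 <= sla else "BREACH"
        sla_text = f"{sla}ms" if sla is not None else "best effort"
        print(f"{tenant_id}: P95 = {p95:.0f}ms (SLA: {sla_text}) -> {status}")

p95_latency_by_tenant(results)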

Evaluation - Quality per Tenant

Different tiers use different models. Are cheaper tiers actually delivering acceptable quality? Systematic evaluation answers this question.

Why Evaluate per Tenant?

| Stakeholder | Question | How Evaluation Helps |
|-------------|----------|----------------------|
| Product | Is GPT-3.5 good enough for Starter tier? | Compare quality scores across tiers |
| Finance | Are we over-serving low-tier customers? | Cost vs. quality analysis |
| Customer Success | Which tenants are getting poor quality? | Per-tenant quality dashboards |
| Engineering | Did the prompt change improve quality? | Before/after A/B comparison |

Creating Evaluators

In Netra, navigate to Evaluation → Evaluators. You can choose from the Library of pre-built evaluators or create custom ones.

Using LLM as Judge Templates

From the Library tab, select templates that fit your use case:
| Template | Use For | Pass Criteria |
|----------|---------|---------------|
| Answer Correctness | Compare generated summary against expected output | score >= 0.7 |
| Conciseness | Ensure summaries are brief and to the point | score >= 0.7 |
| Answer Relevance | Check if the summary addresses the meeting content | score >= 0.7 |
Click Add on any template to customize it for your needs. You can adjust the prompt, select your LLM provider (OpenAI, Anthropic, Google, Mistral), and set pass criteria.
LLM as Judge evaluator configuration with prompt template and pass criteria

Using Rule-Based Evaluators

For deterministic checks, use rule-based evaluators from the Library:
| Evaluator | Use For | Configuration |
|-----------|---------|---------------|
| Latency | SLA compliance per tier | Pass if latency < threshold (e.g., 2000ms for Enterprise) |
| Cost | Budget monitoring per tenant | Pass if cost < threshold |
| JSON Evaluator | Validate output structure | Pass if output is valid JSON with required fields |

Creating a Code Evaluator

For custom business logic like tier-specific validation, create a Code Evaluator:
Code Evaluator configuration with JavaScript handler function
Example: Tier Completeness Check
// handler function is required
function handler(input, output, expectedOutput) {
    // Parse the output JSON
    let result;
    try {
        result = JSON.parse(output);
    } catch {
        return 0; // Fail if not valid JSON
    }

    // Check required fields based on tier (passed via expectedOutput or metadata)
    const tier = expectedOutput?.tier || "starter";

    const required = {
        "enterprise": ["summary", "action_items", "decisions"],
        "professional": ["summary", "action_items"],
        "starter": ["summary"],
    };

    const requiredKeys = required[tier] || ["summary"];
    const hasAllKeys = requiredKeys.every(key => key in result);

    return hasAllKeys ? 1 : 0;
}
Set Output Type to Numerical and Pass Criteria to >= 0.7.

Running Evaluations per Tier

Create a test dataset with sample transcripts and run evaluations with different tenant contexts:
def run_tier_evaluation():
    """Run quality evaluation across all tiers."""
    test_transcripts = [
        SAMPLE_TRANSCRIPT,
        # Add more test transcripts here
    ]

    results = {}

    for tenant_id, config in TENANT_CONFIGS.items():
        print(f"\nEvaluating {tenant_id} ({config.tier})...")
        summarizer = TracedMeetingSummarizer(tenant_id)

        tier_results = []
        for transcript in test_transcripts:
            result = summarizer.summarize(transcript)
            tier_results.append(result)

        results[tenant_id] = {
            "tier": config.tier,
            "model": config.model,
            "avg_latency": sum(r["latency_ms"] for r in tier_results) / len(tier_results),
            "avg_cost": sum(r["cost_usd"] for r in tier_results) / len(tier_results),
            "sample_output": tier_results[0]["result"],
        }

    # Print comparison
    print("\n=== Tier Quality Comparison ===")
    for tenant_id, data in results.items():
        print(f"\n{tenant_id} ({data['tier']}):")
        print(f"  Model: {data['model']}")
        print(f"  Avg Latency: {data['avg_latency']:.0f}ms")
        print(f"  Avg Cost: ${data['avg_cost']:.6f}")
        print(f"  Output Keys: {list(data['sample_output'].keys())}")

run_tier_evaluation()
View results in Evaluation → Test Runs to see quality scores side-by-side across tiers.
Evaluation dashboard showing quality scores: Enterprise 94%, Professional 89%, Starter 76%

A/B Testing Models per Tenant Segment

Should you upgrade the Starter tier from GPT-3.5 to GPT-4-turbo? Let’s run an A/B test to find out.

Running the A/B Test

  1. Run the same test cases with GPT-3.5-turbo (current Starter model)
  2. Run the same test cases with GPT-4-turbo (candidate upgrade)
  3. Compare results using Netra’s trace comparison
def run_ab_test(transcript: str):
    """Run A/B test comparing GPT-3.5 vs GPT-4-turbo for Starter tier."""
    # Test with GPT-3.5-turbo (current)
    summarizer_a = TracedMeetingSummarizer("techstart-inc")
    result_a = summarizer_a.summarize(transcript, user_id="ab-test-user")

    # Temporarily override to test GPT-4-turbo
    original_model = TENANT_CONFIGS["techstart-inc"].model
    TENANT_CONFIGS["techstart-inc"].model = "gpt-4-turbo"

    summarizer_b = TracedMeetingSummarizer("techstart-inc")
    result_b = summarizer_b.summarize(transcript, user_id="ab-test-user")

    # Restore original
    TENANT_CONFIGS["techstart-inc"].model = original_model

    print("\n=== A/B Test Results ===")
    print(f"{'Metric':<20} {'GPT-3.5-turbo':<15} {'GPT-4-turbo':<15} {'Delta':<15}")
    print("-" * 65)
    print(f"{'Latency (ms)':<20} {result_a['latency_ms']:<15.0f} {result_b['latency_ms']:<15.0f} {result_b['latency_ms'] - result_a['latency_ms']:+.0f}")
    print(f"{'Cost (USD)':<20} ${result_a['cost_usd']:<14.6f} ${result_b['cost_usd']:<14.6f} {((result_b['cost_usd'] / result_a['cost_usd']) - 1) * 100:+.0f}%")
    print(f"{'Tokens':<20} {result_a['usage']['total_tokens']:<15} {result_b['usage']['total_tokens']:<15}")

    return result_a, result_b

run_ab_test(SAMPLE_TRANSCRIPT)

Using Trace Comparison

In Netra, navigate to Observability → Traces, select traces from both model runs, and click Compare. You’ll see:
| Metric | GPT-3.5-turbo | GPT-4-turbo | Delta |
|--------|---------------|-------------|-------|
| Quality Score | 76% | 89% | +13% |
| Avg Latency | 800ms | 1200ms | +50% |
| Cost per Request | $0.002 | $0.008 | +300% |
Decision Framework: Is a 13% quality improvement worth a 4x cost increase? For Starter tier customers paying $0.01/meeting, likely not. But for Professional tier upgrades, the math might work out differently.
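A quick unit-economics check makes this concrete (numbers from the comparison table above):
# Starter tier per-meeting margin under each model
price = 0.01             # Starter price per meeting
cost_gpt35 = 0.002       # current cost per request (GPT-3.5-turbo)
cost_gpt4_turbo = 0.008  # candidate cost per request (GPT-4-turbo)

margin_gpt35 = (price - cost_gpt35) / price            # 0.80 -> 80% gross margin
margin_gpt4_turbo = (price - cost_gpt4_turbo) / price  # 0.20 -> 20% gross margin

print(f"GPT-3.5-turbo margin: {margin_gpt35:.0%}")
print(f"GPT-4-turbo margin: {margin_gpt4_turbo:.0%}")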

Summary

You’ve learned how to build comprehensive multi-tenant observability for a B2B AI platform:
| Capability | What You Built | Key Netra Feature |
|------------|----------------|-------------------|
| Tenant Tracking | Per-customer trace attribution | set_tenant_id() |
| Cost Attribution | Accurate billing per customer | get_tenant_usage() |
| SLA Monitoring | Tier-specific latency alerts | Alert Rules with tenant filter |
| Quality Evaluation | Per-tier quality comparison | Evaluators + Datasets |
| A/B Testing | Model comparison per segment | Trace Comparison |

Key Takeaways

  1. Native tenant tracking eliminates custom tagging workarounds—just call set_tenant_id() and all traces are attributed automatically
  2. Cost attribution enables accurate billing and identifies which customers drive costs
  3. Per-tenant alerting ensures SLA compliance for premium tiers
  4. Quality evaluation per tier validates that cheaper tiers still deliver acceptable quality
  5. A/B testing helps make data-driven decisions about tier configurations

Learn More
