All company names (MeetingMind, Apex Legal, Stratex Consulting, TechStart Inc) and scenarios in this cookbook are entirely fictional and used for demonstration purposes only.
What You’ll Learn
This cookbook guides you through 5 key stages of building observable multi-tenant AI applications:

1. Build Multi-Tenant Agent
Create a meeting summarization agent that routes requests to different models based on customer tier.
2. Add Tenant-Level Tracing
Instrument your agent with Netra’s native tenant tracking to capture per-customer metrics.
3. Track Costs & Usage
Query per-tenant usage via API or view cost breakdowns directly in the Netra dashboard.
4. Monitor SLAs
Set up alerts that trigger when specific customers breach latency or error rate thresholds.
5. Evaluate Quality per Tier
Run systematic evaluations to ensure each pricing tier delivers acceptable quality.
Prerequisites
- Python 3.9+ or Node.js 18+
- OpenAI API key
- Netra API key (Get your key here)
High-Level Concepts
Why Multi-Tenant Observability Matters
For B2B AI platforms serving multiple customers, generic observability isn’t enough. You need to answer questions like:

| Question | Who Asks | What You Need |
|---|---|---|
| “How much did Customer X use last month?” | Finance | Per-tenant cost attribution |
| “Is our Enterprise tier meeting SLAs?” | Customer Success | Tenant-filtered latency monitoring |
| “Are cheaper tiers delivering acceptable quality?” | Product | Per-tier quality evaluation |
| “Which customer is causing the cost spike?” | Engineering | Real-time tenant usage breakdown |
Netra’s native set_tenant_id() API makes this straightforward: no custom tagging workarounds needed.
The MeetingMind Scenario
MeetingMind is a fictional B2B SaaS platform that provides AI-powered meeting summarization for enterprise teams. The platform serves customers with different needs and budgets:

| Customer | Industry | Needs | Tier |
|---|---|---|---|
| Apex Legal | Law Firm | Detailed transcripts with citations | Enterprise |
| Stratex Consulting | Consulting | Action items and key decisions | Professional |
| TechStart Inc | Tech Startup | Quick summaries on a budget | Starter |
Each tier maps to a different model, SLA, and price point:

| Tier | Model | Latency SLA | Features | Price |
|---|---|---|---|---|
| Enterprise | GPT-4 | P95 < 2s | Full summary + action items + decisions | $0.10/meeting |
| Professional | GPT-4-turbo | P95 < 3s | Summary + action items | $0.05/meeting |
| Starter | GPT-3.5-turbo | Best effort | Summary only | $0.01/meeting |
Building the Meeting Summarizer
First, we build the core agent before adding observability. This separation makes it easier to understand the business logic independently from instrumentation.

Installation
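The install step boils down to one command (assuming the SDK ships on PyPI as netra-sdk; check the Netra docs for the current package name):

```shell
pip install netra-sdk openai
```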
Environment Setup
Configure your API keys for both OpenAI (for the LLM) and Netra (for observability).

Tenant Configuration
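A sketch of what this configuration might look like; the tier names, models, and features come from the tables above, but the exact schema is up to you:

```python
# Per-tier configuration: model choice, latency SLA, and output features.
# Values mirror the tier table above; adjust to your own pricing structure.
TIER_CONFIG = {
    "enterprise": {
        "model": "gpt-4",
        "latency_sla_ms": 2000,
        "features": ["summary", "action_items", "decisions"],
    },
    "professional": {
        "model": "gpt-4-turbo",
        "latency_sla_ms": 3000,
        "features": ["summary", "action_items"],
    },
    "starter": {
        "model": "gpt-3.5-turbo",
        "latency_sla_ms": None,  # best effort, no SLA
        "features": ["summary"],
    },
}

# Map customers to tiers (fictional customers from the scenario).
CUSTOMER_TIERS = {
    "apex-legal": "enterprise",
    "stratex-consulting": "professional",
    "techstart-inc": "starter",
}
```

Driving both routing and instrumentation from one dictionary keeps the business logic and the observability setup from drifting apart.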
We define the tier configurations that determine which model and features each customer gets. This configuration drives both the business logic and the observability setup.

The MeetingSummarizer Class
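One possible shape for the class. The LLM call is injected as a callable so the tier logic stays testable without an API key; the prompt wording and method names here are illustrative, not the cookbook’s exact code:

```python
from typing import Callable

class MeetingSummarizer:
    """Summarizes meeting transcripts; output depth depends on the tenant's tier."""

    def __init__(self, tier_config: dict, llm_call: Callable[[str, str], str]):
        # llm_call(model, prompt) -> completion text. Inject the real OpenAI
        # client in production, a stub in tests.
        self.tier_config = tier_config
        self.llm_call = llm_call

    def build_prompt(self, transcript: str, tier: str) -> str:
        # Higher tiers request more sections, per the tier configuration.
        features = self.tier_config[tier]["features"]
        sections = {
            "summary": "a concise summary",
            "action_items": "a list of action items with owners",
            "decisions": "the key decisions made",
        }
        wanted = ", ".join(sections[f] for f in features)
        return f"From the meeting transcript below, produce: {wanted}.\n\n{transcript}"

    def summarize(self, transcript: str, tier: str) -> str:
        model = self.tier_config[tier]["model"]
        return self.llm_call(model, self.build_prompt(transcript, tier))
```

In production, llm_call would wrap something like the OpenAI chat completions client; keeping it injected means the tier-routing logic can be unit-tested with a stub.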
The core summarizer takes a meeting transcript and returns structured output based on the tenant’s tier. Higher tiers get more detailed analysis.

Testing the Summarizer
Let’s test the summarizer with a sample meeting transcript to verify it works before adding observability. Expected results per tier:

- Enterprise (Apex Legal) gets summary, action items, AND decisions
- Professional (Stratex Consulting) gets summary and action items
- Starter (TechStart Inc) gets summary only
Adding Observability
Now we instrument the agent to capture per-tenant metrics. The key differentiator with Netra is the native set_tenant_id() API—all subsequent traces are automatically attributed to that tenant.
Initializing Netra
Initialize Netra at application startup with your app name and environment. We enable auto-instrumentation for OpenAI to capture LLM calls automatically.

Setting Tenant Context
The most important step: call set_tenant_id() at the start of each request. This associates all traces with the customer, enabling per-tenant filtering across the entire Netra platform.
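Putting initialization and tenant context together, a minimal sketch. The set_tenant_id() call is the API this cookbook centers on; the Netra.init() parameter names and the helper below are assumptions, so check the SDK reference for exact signatures:

```python
from netra import Netra  # assumes the netra-sdk package

# Initialize once at application startup; auto-instrumentation then
# captures OpenAI calls without further code changes.
Netra.init(app_name="meetingmind", environment="production")

def handle_request(customer_id: str, transcript: str) -> str:
    # Critical step: set the tenant first, so every trace emitted while
    # handling this request is attributed to the customer.
    Netra.set_tenant_id(customer_id)
    return summarize_for_customer(customer_id, transcript)  # hypothetical helper
```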
The TracedMeetingSummarizer Class
For comprehensive observability, we wrap the summarizer with explicit spans that capture the full pipeline: prompt construction, LLM generation, and response parsing.

Simulating Multi-Tenant Traffic
To see the tenant dashboard in action, let’s simulate requests from all three customers. This generates real data that you can view in Netra.
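A sketch of such a traffic generator. The summarize_meeting helper is hypothetical (standing in for the traced summarizer above), and the transcripts are toy data:

```python
from netra import Netra  # assumes the netra-sdk package

SAMPLE_MEETINGS = {
    "apex-legal": "Counsel reviewed the merger agreement and flagged clause 4.2...",
    "stratex-consulting": "The team agreed to move the product rollout to Q3...",
    "techstart-inc": "Standup: API latency fix shipped, demo scheduled for Friday...",
}

for customer_id, transcript in SAMPLE_MEETINGS.items():
    # Attribute everything below to this customer before doing any work.
    Netra.set_tenant_id(customer_id)
    result = summarize_meeting(transcript, customer_id)  # hypothetical helper
    print(f"{customer_id}: {result[:60]}...")
```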
Cost Attribution & Usage Tracking
Netra provides two ways to access per-tenant cost and usage data: programmatically via the SDK, or visually through the dashboard.

Querying Usage via API
Use the get_tenant_usage() API to retrieve aggregated metrics for any tenant within a time range. This is useful for building custom dashboards, integrating with internal tools, or exporting data.
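A minimal sketch of the query. The get_tenant_usage() name comes from this cookbook, but the parameter and result field names below are assumptions; consult the Netra API reference for the real schema:

```python
from datetime import datetime, timedelta
from netra import Netra  # assumes the netra-sdk package

# Retrieve aggregated usage for one tenant over the last 30 days.
usage = Netra.get_tenant_usage(
    tenant_id="apex-legal",
    start_time=datetime.utcnow() - timedelta(days=30),
    end_time=datetime.utcnow(),
)

# Field names here (requests, tokens, cost) are assumed for illustration.
print(usage.total_requests, usage.total_tokens, usage.total_cost)
```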
Viewing Usage in the Dashboard
For quick access without writing code, navigate to Observability → Tenants in the Netra dashboard. This view provides:

- Tenant list with aggregated metrics (traces, sessions, cost)
- Time range filtering to analyze specific periods
- Sort by cost to identify high-usage customers
- Click-through to traces for detailed investigation

Comparing Usage Across Tenants
You can also query multiple tenants programmatically to compare usage patterns:

| Tenant | Tier | Requests | Tokens | Sessions | LLM Cost |
|---|---|---|---|---|---|
| apex-legal | enterprise | 5 | 2,340 | 3 | $0.0234 |
| stratex-consulting | professional | 10 | 3,120 | 5 | $0.0156 |
| techstart-inc | starter | 20 | 4,800 | 8 | $0.0024 |
Session & User Analytics
Understand usage patterns within each tenant to identify power users and optimize resource allocation.

Session Stats per Tenant
Query session-level statistics filtered by tenant to understand conversation patterns.

Latency Monitoring & SLA Alerts
Enterprise customers pay premium prices for guaranteed performance. You need to monitor and alert when SLAs are breached.

Setting Up Tenant-Specific Alerts
In the Netra dashboard, navigate to Alert Rules and create a new alert with tenant filtering:

Apply Tenant Filter
Add a filter for tenant_id = apex-legal to only monitor Enterprise tier requests.

Set Threshold
- Condition: Greater than 2000ms
- Time Window: 5 minutes (to avoid alerting on single slow requests)

Repeat for each tier, matching the SLA table:

- apex-legal (Enterprise): Alert if latency > 2000ms
- stratex-consulting (Professional): Alert if latency > 3000ms
- techstart-inc (Starter): No SLA alerts (best effort)
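The same per-tier policy can be expressed in code, for instance in a custom alerting script. This is a hypothetical helper illustrating the thresholds above, not a Netra API:

```python
# Per-tenant P95 latency thresholds in ms; None means best effort (no alert).
SLA_THRESHOLDS_MS = {
    "apex-legal": 2000,          # Enterprise
    "stratex-consulting": 3000,  # Professional
    "techstart-inc": None,       # Starter: best effort
}

def breaches_sla(tenant_id: str, p95_latency_ms: float) -> bool:
    """Return True if the tenant's observed P95 latency exceeds its SLA."""
    threshold = SLA_THRESHOLDS_MS.get(tenant_id)
    if threshold is None:
        return False  # no SLA to breach
    return p95_latency_ms > threshold
```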
Evaluation - Quality per Tenant
Different tiers use different models. Are cheaper tiers actually delivering acceptable quality? Systematic evaluation answers this question.

Why Evaluate per Tenant?
| Stakeholder | Question | How Evaluation Helps |
|---|---|---|
| Product | Is GPT-3.5 good enough for Starter tier? | Compare quality scores across tiers |
| Finance | Are we over-serving low-tier customers? | Cost vs. quality analysis |
| Customer Success | Which tenants are getting poor quality? | Per-tenant quality dashboards |
| Engineering | Did the prompt change improve quality? | Before/after A/B comparison |
Creating Evaluators
In Netra, navigate to Evaluation → Evaluators. You can choose from the Library of pre-built evaluators or create custom ones.

Using LLM as Judge Templates
From the Library tab, select templates that fit your use case:

| Template | Use For | Pass Criteria |
|---|---|---|
| Answer Correctness | Compare generated summary against expected output | score >= 0.7 |
| Conciseness | Ensure summaries are brief and to the point | score >= 0.7 |
| Answer Relevance | Check if the summary addresses the meeting content | score >= 0.7 |

Using Rule-Based Evaluators
For deterministic checks, use rule-based evaluators from the Library:

| Evaluator | Use For | Configuration |
|---|---|---|
| Latency | SLA compliance per tier | Pass if latency < threshold (e.g., 2000ms for Enterprise) |
| Cost | Budget monitoring per tenant | Pass if cost < threshold |
| JSON Evaluator | Validate output structure | Pass if output is valid JSON with required fields |
Creating a Code Evaluator
For custom business logic like tier-specific validation, create a Code Evaluator:
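A sketch of such an evaluator: it checks that the output contains the sections the tenant’s tier promises and returns a 0–1 score. The exact function signature Netra expects may differ; treat this as a sketch of the logic:

```python
import json

# Sections each tier promises, mirroring the tier table above.
REQUIRED_FIELDS = {
    "enterprise": ["summary", "action_items", "decisions"],
    "professional": ["summary", "action_items"],
    "starter": ["summary"],
}

def evaluate_tier_output(output: str, tier: str) -> float:
    """Score = fraction of tier-required fields present and non-empty."""
    try:
        data = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return 0.0  # malformed output fails outright
    required = REQUIRED_FIELDS[tier]
    present = sum(1 for field in required if data.get(field))
    return present / len(required)
```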
Set the evaluator’s pass criteria to score >= 0.7.
Running Evaluations per Tier
Create a test dataset with sample transcripts and run evaluations with different tenant contexts:
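A minimal harness for this loop. The score_output stub stands in for running your summarizer plus the evaluator registered in Netra; the dataset contents are illustrative:

```python
# Toy dataset: in practice, use representative transcripts per customer.
TEST_TRANSCRIPTS = [
    "Team discussed the Q3 roadmap. Alice will draft the spec by Friday. "
    "Decision: postpone the mobile launch.",
]

TIERS = ["enterprise", "professional", "starter"]

def score_output(transcript: str, tier: str) -> float:
    # Stub: in practice, call the tier's summarizer, then the evaluator
    # (e.g. the tier-validation Code Evaluator) on its output.
    return 1.0

# Average evaluation score per tier across the dataset.
results = {
    tier: sum(score_output(t, tier) for t in TEST_TRANSCRIPTS) / len(TEST_TRANSCRIPTS)
    for tier in TIERS
}
for tier, avg in results.items():
    status = "PASS" if avg >= 0.7 else "FAIL"
    print(f"{tier}: avg score {avg:.2f} [{status}]")
```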
A/B Testing Models per Tenant Segment
Should you upgrade the Starter tier from GPT-3.5 to GPT-4-turbo? Let’s run an A/B test to find out.

Running the A/B Test
- Run the same test cases with GPT-3.5-turbo (current Starter model)
- Run the same test cases with GPT-4-turbo (candidate upgrade)
- Compare results using Netra’s trace comparison
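The comparison itself is simple arithmetic. Using the example numbers from the trace-comparison table (note the quality delta is in percentage points, not a relative change):

```python
def pct_delta(baseline: float, candidate: float) -> float:
    """Percentage change from baseline to candidate."""
    return (candidate - baseline) / baseline * 100

# Example figures from the trace-comparison table.
latency_delta = pct_delta(800, 1200)    # +50%
cost_delta = pct_delta(0.002, 0.008)    # +300%
quality_delta = 89 - 76                 # +13 percentage points

print(f"latency {latency_delta:+.0f}%, cost {cost_delta:+.0f}%, "
      f"quality {quality_delta:+d} pts")
```

Whether a 13-point quality gain justifies a 4x cost increase is a product decision; the point is that per-segment traces make the trade-off quantifiable.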
Using Trace Comparison
In Netra, navigate to Observability → Traces, select traces from both model runs, and click Compare. You’ll see:

| Metric | GPT-3.5-turbo | GPT-4-turbo | Delta |
|---|---|---|---|
| Quality Score | 76% | 89% | +13% |
| Avg Latency | 800ms | 1200ms | +50% |
| Cost per Request | $0.002 | $0.008 | +300% |
Summary
You’ve learned how to build comprehensive multi-tenant observability for a B2B AI platform:

| Capability | What You Built | Key Netra Feature |
|---|---|---|
| Tenant Tracking | Per-customer trace attribution | set_tenant_id() |
| Cost Attribution | Accurate billing per customer | get_tenant_usage() |
| SLA Monitoring | Tier-specific latency alerts | Alert Rules with tenant filter |
| Quality Evaluation | Per-tier quality comparison | Evaluators + Datasets |
| A/B Testing | Model comparison per segment | Trace Comparison |
Key Takeaways
- Native tenant tracking eliminates custom tagging workarounds—just call set_tenant_id() and all traces are attributed automatically
- Cost attribution enables accurate billing and identifies which customers drive costs
- Per-tenant alerting ensures SLA compliance for premium tiers
- Quality evaluation per tier validates that cheaper tiers still deliver acceptable quality
- A/B testing helps make data-driven decisions about tier configurations