This cookbook shows you how to build comprehensive multi-tenant observability for B2B AI platforms—tracking costs per customer, monitoring SLA compliance, and attributing usage across your entire customer base.

Open in Google Colab to run the complete notebook in your browser.
All company names (MeetingMind, Apex Legal, Stratex Consulting, TechStart Inc) and scenarios in this cookbook are entirely fictional and used for demonstration purposes only.

What You’ll Learn

  • Set Tenant Context: use Netra's native tenant tracking to attribute all traces to specific customers
  • Track Per-Customer Costs: query usage and cost data per tenant via the API or dashboard
  • Monitor SLA Compliance: set up tier-specific alerts that trigger on latency or error-rate breaches
  • Analyze Usage Patterns: understand session and user behavior within each tenant
Prerequisites: a Netra API key and OTLP endpoint, an OpenAI API key, and a Python environment for the packages installed in Step 1.

The MeetingMind Scenario

MeetingMind is a fictional B2B SaaS platform that provides AI-powered meeting summarization. The platform serves customers with different needs and budgets:
Customer           | Industry     | Tier
Apex Legal         | Law Firm     | Enterprise
Stratex Consulting | Consulting   | Professional
TechStart Inc      | Tech Startup | Starter
Each tier uses a different configuration and has different SLA commitments:
Tier         | Model       | Latency SLA | Rate Limit
Enterprise   | GPT-4o-mini | P95 < 2s    | 60 calls/min
Professional | GPT-4o-mini | P95 < 3s    | 30 calls/min
Starter      | GPT-4o-mini | Best effort | 10 calls/min

Step 1: Install Packages

pip install netra-sdk openai

Step 2: Set Environment Variables

export NETRA_API_KEY="your-netra-api-key"
export NETRA_OTLP_ENDPOINT="your-netra-otlp-endpoint"
export OPENAI_API_KEY="your-openai-api-key"
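
Before initializing the SDK, you can fail fast if any of these variables is missing. A minimal sketch using only the variable names above:
import os

# Required configuration from Step 2; adjust if your deployment uses different names
REQUIRED_ENV_VARS = ("NETRA_API_KEY", "NETRA_OTLP_ENDPOINT", "OPENAI_API_KEY")

missing = [name for name in REQUIRED_ENV_VARS if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")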

Step 3: Initialize Netra for Multi-Tenant Tracking

Initialize Netra at application startup with auto-instrumentation for OpenAI:
import os

from netra import Netra
from netra.instrumentation.instruments import InstrumentSet

# Initialize Netra for multi-tenant observability
Netra.init(
    app_name="meetingmind",
    headers=f"x-api-key={os.getenv('NETRA_API_KEY')}",
    environment="production",
    trace_content=True,
    instruments={InstrumentSet.OPENAI},
)

Step 4: Define Tenant Configuration

Configure tier-specific settings for each customer:
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TenantConfig:
    """Configuration for a tenant's service tier."""
    tenant_id: str
    tier: str
    model: str
    features: List[str]
    latency_sla_ms: Optional[int]
    max_calls_per_minute: int

# Tenant configurations
TENANT_CONFIGS = {
    "apex-legal": TenantConfig(
        tenant_id="apex-legal",
        tier="enterprise",
        model="gpt-4",
        features=["summary", "action_items", "decisions", "custom_reports"],
        latency_sla_ms=2000,
        max_calls_per_minute=60
    ),
    "stratex-consulting": TenantConfig(
        tenant_id="stratex-consulting",
        tier="professional",
        model="gpt-4-turbo",
        features=["summary", "action_items"],
        latency_sla_ms=3000,
        max_calls_per_minute=30
    ),
    "techstart-inc": TenantConfig(
        tenant_id="techstart-inc",
        tier="starter",
        model="gpt-3.5-turbo",
        features=["summary"],
        latency_sla_ms=None,  # Best effort
        max_calls_per_minute=10
    ),
}
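
The max_calls_per_minute field is not enforced anywhere in this cookbook. If you want to apply it in the application itself, a minimal in-memory sliding-window limiter keyed by tenant could look like the sketch below (production systems typically enforce limits at the gateway or in a shared store instead):
import time
from collections import defaultdict, deque

class TenantRateLimiter:
    """Naive per-tenant sliding-window limiter driven by max_calls_per_minute."""

    def __init__(self, configs):
        self.configs = configs
        self.calls = defaultdict(deque)  # tenant_id -> timestamps of recent calls

    def allow(self, tenant_id: str) -> bool:
        config = self.configs.get(tenant_id)
        if config is None:
            return False
        now = time.time()
        window = self.calls[tenant_id]
        # Drop timestamps older than the 60-second window
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) >= config.max_calls_per_minute:
            return False
        window.append(now)
        return True

# Example: reject requests that exceed the tenant's tier limit
limiter = TenantRateLimiter(TENANT_CONFIGS)
if not limiter.allow("techstart-inc"):
    print("Rate limit exceeded for techstart-inc")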

Step 5: Create Multi-Tenant Meeting Summarizer

Build a service that tracks costs per tenant. This class handles tenant context setting, prompt building based on feature tiers, cost calculation, and SLA compliance checking — all within Netra spans.
from openai import OpenAI
import time
import uuid
import os
from netra import Netra, SpanType, UsageModel

class MultiTenantMeetingSummarizer:
    """Meeting summarization service with per-tenant cost tracking."""

    def __init__(self):
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.tenant_usage = {}  # Track usage per tenant

    def summarize_meeting(self, tenant_id: str, meeting_transcript: str, user_id: str = None) -> dict:
        """Summarize a meeting for a specific tenant with cost tracking."""

        # Validate tenant
        if tenant_id not in TENANT_CONFIGS:
            return {"error": f"Unknown tenant: {tenant_id}"}

        config = TENANT_CONFIGS[tenant_id]

        # Set tenant context - this is the key for multi-tenant observability
        Netra.set_tenant_id(tenant_id)
        Netra.set_session_id(str(uuid.uuid4()))
        if user_id:
            Netra.set_user_id(user_id)

        # Build the prompt
        prompt = f"Summarize this meeting transcript into:\n"
        if "summary" in config.features:
            prompt += "- Executive Summary (2-3 paragraphs)\n"
        if "action_items" in config.features:
            prompt += "- Action Items (numbered list)\n"
        if "decisions" in config.features:
            prompt += "- Key Decisions Made\n"
        if "custom_reports" in config.features:
            prompt += "- Recommendations for Follow-up\n"

        prompt += f"\nMeeting Transcript:\n{meeting_transcript}"

        # Start a span for the summarization operation
        with Netra.start_span("meeting-summarization") as span:
            span.set_attribute("tenant_id", tenant_id)
            span.set_attribute("tier", config.tier)
            span.set_attribute("model", config.model)

            start_time = time.time()

            # Call the API (auto-traced)
            response = self.openai_client.chat.completions.create(
                model=config.model,
                messages=[
                    {"role": "system", "content": "You are an expert meeting summarizer."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3
            )

            latency_ms = (time.time() - start_time) * 1000
            summary = response.choices[0].message.content

            # Calculate cost (simplified pricing model)
            # GPT-4o-mini pricing (approx): $0.15/1M input, $0.60/1M output
            input_price = 0.15 / 1_000_000
            output_price = 0.60 / 1_000_000

            prompt_tokens = response.usage.prompt_tokens
            completion_tokens = response.usage.completion_tokens
            total_tokens = response.usage.total_tokens

            cost = (prompt_tokens * input_price) + (completion_tokens * output_price)

            # Record detailed usage and cost in the span
            span.set_usage([
                UsageModel(
                    model=config.model,
                    cost_in_usd=cost,
                    usage_type="chat",
                    units_used=total_tokens
                )
            ])

            # Check SLA compliance
            sla_compliant = True
            if config.latency_sla_ms:
                sla_compliant = latency_ms <= config.latency_sla_ms
                span.set_attribute("sla_met", sla_compliant)
                if not sla_compliant:
                    span.add_event("sla-breach", {
                        "actual_ms": latency_ms,
                        "sla_ms": config.latency_sla_ms
                    })

            span.set_success()

            # Local tracking
            if tenant_id not in self.tenant_usage:
                self.tenant_usage[tenant_id] = {"count": 0, "tokens": 0, "total_cost": 0.0, "total_latency": 0}
            self.tenant_usage[tenant_id]["count"] += 1
            self.tenant_usage[tenant_id]["tokens"] += total_tokens
            self.tenant_usage[tenant_id]["total_cost"] += cost
            self.tenant_usage[tenant_id]["total_latency"] += latency_ms

            return {
                "tenant_id": tenant_id,
                "tier": config.tier,
                "summary": summary,
                "token_usage": {
                    "prompt": prompt_tokens,
                    "completion": completion_tokens,
                    "total": total_tokens
                },
                "latency_ms": latency_ms,
                "sla_compliant": sla_compliant,
                "cost": cost
            }

    def print_usage_summary(self):
        """Print usage summary by tenant."""
        for tenant_id, usage in self.tenant_usage.items():
            print(f"\n{tenant_id}:")
            print(f"  Calls: {usage['count']}")
            print(f"  Total Tokens: {usage['tokens']}")
            print(f"  Total Cost: ${usage['total_cost']:.4f}")
            print(f"  Avg Latency: {usage['total_latency']/usage['count']:.0f}ms")
The key pattern here is calling set_tenant_id() early in the request lifecycle. This ensures all subsequent traces — including auto-instrumented OpenAI calls — are automatically attributed to the correct tenant.
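
If your request handlers are plain Python functions, one way to make that pattern hard to forget is a small decorator. This is a sketch only, reusing the Netra context setters shown above:
import functools
import uuid

from netra import Netra

def with_tenant_context(handler):
    """Set tenant, session, and (optionally) user context before the handler runs."""
    @functools.wraps(handler)
    def wrapper(tenant_id, *args, user_id=None, **kwargs):
        Netra.set_tenant_id(tenant_id)
        Netra.set_session_id(str(uuid.uuid4()))
        if user_id:
            Netra.set_user_id(user_id)
        return handler(tenant_id, *args, user_id=user_id, **kwargs)
    return wrapper

@with_tenant_context
def handle_summarize_request(tenant_id, transcript, user_id=None):
    # Downstream spans and auto-instrumented OpenAI calls inherit the tenant context set above
    ...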

Step 6: Test with Sample Meetings

Simulate meeting summarization requests from different tenants:
# Initialize summarizer
summarizer = MultiTenantMeetingSummarizer()

# Enterprise tier (Apex Legal) - legal meeting
sample_meeting = """
Attendees: John (Partner), Sarah (Associate), Mike (Paralegal)
Duration: 45 minutes
Topic: Case Strategy for Smith v. Jones

John: Let's discuss our approach for the Smith case. The deposition is in 3 weeks.
Sarah: I've reviewed the discovery documents. The key issue is the contract's ambiguity around the liability clause.
Mike: I've created a timeline. The critical events are on pages 45-67 of the evidence log.
John: Good. Sarah, can you draft a summary of our position by Friday?
Sarah: I'll have it ready. Should I include recommendations for discovery?
John: Yes, especially around vendor communications. Mike, check if we have all related emails.
Mike: I'll pull those by tomorrow.
John: This looks solid. Let's reconvene next week after Sarah finishes the draft.
"""

result1 = summarizer.summarize_meeting(
    tenant_id="apex-legal",
    meeting_transcript=sample_meeting,
    user_id="[email protected]"
)

print(f"Tier: {result1['tier']}")
print(f"SLA Compliant: {result1['sla_compliant']}")
print(f"Latency: {result1['latency_ms']:.0f}ms")
print(f"Tokens Used: {result1['token_usage']['total']}")

# Professional tier (Stratex Consulting) - strategy meeting
meeting_transcript_2 = """
Team sync for Q2 strategy planning.
Attendees: CEO, CFO, Head of Product

CEO: Let's review our market position and Q2 targets.
CFO: Revenue is up 15% YoY. We're tracking to beat forecast.
Head of Product: New features launched last month show strong adoption.
CEO: Great! What are our risks?
CFO: Supply chain delays could impact timeline.
Head of Product: We need to hire 3 more engineers to meet roadmap.
CEO: Let's make that happen. Budget approved.
"""

result2 = summarizer.summarize_meeting(
    tenant_id="stratex-consulting",
    meeting_transcript=meeting_transcript_2,
    user_id="[email protected]"
)

# Starter tier (TechStart Inc) - standup
meeting_transcript_3 = """
Daily standup
Attendees: Dev team

Tom: I finished the API integration yesterday.
Lisa: I'm working on the UI components.
Chris: Testing is on track for Thursday release.
Tom: Good. Any blockers?
Lisa: Waiting for design approval on the dashboard.
Chris: Should be done today.
"""

result3 = summarizer.summarize_meeting(
    tenant_id="techstart-inc",
    meeting_transcript=meeting_transcript_3,
    user_id="[email protected]"
)
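
To compare the three tiers side by side, the returned dictionaries can be printed in one loop, for example:
for result in (result1, result2, result3):
    print(
        f"{result['tenant_id']} ({result['tier']}): "
        f"{result['latency_ms']:.0f}ms, "
        f"{result['token_usage']['total']} tokens, "
        f"${result['cost']:.4f}"
    )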

Step 7: Review Usage and Cost Breakdown

Analyze per-tenant usage patterns and costs:
# Print usage summary
summarizer.print_usage_summary()

# Estimate costs from aggregate token counts (rough approximation; exact per-call costs are also tracked in total_cost)
# GPT-4o-mini pricing (approximate): $0.15/1M input tokens, $0.60/1M output tokens
input_price_per_token = 0.15 / 1_000_000
output_price_per_token = 0.60 / 1_000_000

for tenant_id, usage in summarizer.tenant_usage.items():
    # Rough split: assume 70% input, 30% output tokens
    input_tokens = int(usage['tokens'] * 0.7)
    output_tokens = int(usage['tokens'] * 0.3)
    cost = (input_tokens * input_price_per_token) + (output_tokens * output_price_per_token)

    print(f"\n{tenant_id}:")
    print(f"  Total Tokens: {usage['tokens']}")
    print(f"  Estimated Cost: ${cost:.4f}")
    print(f"  Cost per Call: ${cost/usage['count']:.4f}")

Step 8: SLA Monitoring

Check which tenants are meeting their SLA commitments:
sla_results = [
    ("apex-legal", result1['sla_compliant'], result1['latency_ms']),
    ("stratex-consulting", result2['sla_compliant'], result2['latency_ms']),
    ("techstart-inc", result3['sla_compliant'], result3['latency_ms']),
]

for tenant_id, compliant, latency in sla_results:
    config = TENANT_CONFIGS[tenant_id]
    status = "PASS" if compliant else "FAIL"
    sla_text = f"{config.latency_sla_ms}ms" if config.latency_sla_ms else "Best effort"
    print(f"\n{tenant_id} ({config.tier}):")
    print(f"  SLA Target: {sla_text}")
    print(f"  Actual Latency: {latency:.0f}ms")
    print(f"  Status: {status}")

Setting Up Tenant-Specific Alerts

In the Netra dashboard, navigate to Alert Rules and create tenant-filtered alerts:
1. Create Alert Rule: click Create Alert Rule and name it “Enterprise Latency SLA Breach”.
2. Select Scope and Metric:
  • Scope: Trace (monitor end-to-end requests)
  • Metric: Latency
3. Apply Tenant Filter: add a filter for tenant_id = apex-legal so that only Enterprise tier requests are monitored.
4. Set Threshold:
  • Condition: Greater than 2000ms
  • Time Window: 5 minutes (to avoid alerting on single slow requests)
5. Configure Contact Point: select your Slack channel or email for notifications.
Create similar alerts for each tier with their respective SLA thresholds:
Tenant             | Tier         | Alert Threshold            | Rate Limit
apex-legal         | Enterprise   | > 2000ms                   | 60 calls/min
stratex-consulting | Professional | > 3000ms                   | 30 calls/min
techstart-inc      | Starter      | No SLA alert (best effort) | 10 calls/min
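
Rather than maintaining these thresholds by hand, you can derive them from TENANT_CONFIGS when scripting alert setup, for example:
for tenant_id, config in TENANT_CONFIGS.items():
    if config.latency_sla_ms is not None:
        threshold = f"> {config.latency_sla_ms}ms"
    else:
        threshold = "No SLA alert (best effort)"
    print(f"{tenant_id} ({config.tier}): alert threshold {threshold}, "
          f"rate limit {config.max_calls_per_minute} calls/min")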

Step 9: Querying Tenant Metrics via Netra API

Once traces are sent to Netra, query tenant-specific metrics programmatically:
from datetime import datetime, timedelta, timezone

def get_tenant_usage_data(tenant_id: str, start_time: str, end_time: str):
    """Retrieve usage data for a tenant."""
    try:
        usage = Netra.usage.get_tenant_usage(
            tenant_id=tenant_id,
            start_time=start_time,
            end_time=end_time,
        )

        if usage:
            return {
                "tenant_id": usage.tenant_id,
                "token_count": usage.token_count,
                "request_count": usage.request_count,
                "session_count": usage.session_count,
                "total_cost": usage.total_cost,
            }
        return None
    except Exception as e:
        print(f"Error fetching usage for {tenant_id}: {e}")
        return None

# Example: Get usage for each tenant over the last 24 hours
end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(days=1)

for tenant_id in TENANT_CONFIGS.keys():
    usage = get_tenant_usage_data(
        tenant_id=tenant_id,
        start_time=start_time.isoformat(),
        end_time=end_time.isoformat(),
    )
    if usage:
        print(f"{tenant_id}: {usage['request_count']} requests, ${usage['total_cost']:.4f}")
    else:
        print(f"{tenant_id}: No data returned (might be due to ingestion latency)")

What You’ll See in the Dashboard

After running this cookbook, check the Netra dashboard for:
  • Tenant selector filtering all traces to a specific customer
  • Per-tenant cost breakdown showing usage per customer
  • SLA compliance dashboard with latency metrics by tier
  • Comparative analytics showing which customers use which features
  • User activity filtered by tenant and user ID

Key Multi-Tenant Patterns

Pattern           | Use Case                  | How to Implement
Cost attribution  | Billing and profitability | Set tenant_id at request start
SLA monitoring    | Support and escalation    | Filter by tenant_id and latency threshold
Feature usage     | Product insights          | Check feature flags in tenant config
User segmentation | Per-user analytics        | Set user_id in addition to tenant_id
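
For the feature-usage pattern, a small lookup over the Step 4 configuration is usually all that is needed, for example:
def has_feature(tenant_id, feature):
    """Return True if the tenant's tier includes the given feature."""
    config = TENANT_CONFIGS.get(tenant_id)
    return config is not None and feature in config.features

print(has_feature("apex-legal", "custom_reports"))   # True
print(has_feature("techstart-inc", "action_items"))  # False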
