This cookbook shows you how to build comprehensive multi-tenant observability for B2B AI platforms—tracking costs per customer, monitoring SLA compliance, and attributing usage across your entire customer base.

Open in Google Colab to run the complete notebook in your browser.
All company names (MeetingMind, Apex Legal, Stratex Consulting, TechStart Inc) and scenarios in this cookbook are entirely fictional and used for demonstration purposes only.

What You’ll Learn

  • Set Tenant Context: use Netra's native tenant tracking to attribute all traces to specific customers
  • Track Per-Customer Costs: query usage and cost data per tenant via the API or dashboard
  • Monitor SLA Compliance: set up tier-specific alerts that trigger on latency or error-rate breaches
  • Analyze Usage Patterns: understand session and user behavior within each tenant
Prerequisites: a Netra API key and OTLP endpoint, an OpenAI API key, and a Python environment for the packages installed in Step 1.

The MeetingMind Scenario

MeetingMind is a fictional B2B SaaS platform that provides AI-powered meeting summarization. The platform serves customers with different needs and budgets:
Customer           | Industry     | Tier
Apex Legal         | Law Firm     | Enterprise
Stratex Consulting | Consulting   | Professional
TechStart Inc      | Tech Startup | Starter
Each tier uses a different configuration and has different SLA commitments:
Tier         | Model       | Latency SLA | Rate Limit
Enterprise   | GPT-4o-mini | P95 < 2s    | 60 calls/min
Professional | GPT-4o-mini | P95 < 3s    | 30 calls/min
Starter      | GPT-4o-mini | Best effort | 10 calls/min

Step 1: Install Packages

pip install netra-sdk openai

Step 2: Set Environment Variables

export NETRA_API_KEY="your-netra-api-key"
export NETRA_OTLP_ENDPOINT="your-netra-otlp-endpoint"
export OPENAI_API_KEY="your-openai-api-key"
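
Before initializing the SDK, you can fail fast if any of these variables is missing. A minimal sketch using only the variable names above:
import os

# Required configuration from Step 2; adjust if your deployment uses different names
REQUIRED_ENV_VARS = ("NETRA_API_KEY", "NETRA_OTLP_ENDPOINT", "OPENAI_API_KEY")

missing = [name for name in REQUIRED_ENV_VARS if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")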

Step 3: Initialize Netra for Multi-Tenant Tracking

Initialize Netra at application startup with auto-instrumentation for OpenAI:
import os

from netra import Netra
from netra.instrumentation.instruments import InstrumentSet

# Initialize Netra for multi-tenant observability
Netra.init(
    app_name="meetingmind",
    headers=f"x-api-key={os.getenv('NETRA_API_KEY')}",
    environment="production",
    trace_content=True,
    instruments={InstrumentSet.OPENAI},
)

Step 4: Define Tenant Configuration

Configure tier-specific settings for each customer:
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TenantConfig:
    """Configuration for a tenant's service tier."""
    tenant_id: str
    tier: str
    model: str
    features: List[str]
    latency_sla_ms: Optional[int]
    max_calls_per_minute: int

# Tenant configurations
TENANT_CONFIGS = {
    "apex-legal": TenantConfig(
        tenant_id="apex-legal",
        tier="enterprise",
        model="gpt-4",
        features=["summary", "action_items", "decisions", "custom_reports"],
        latency_sla_ms=2000,
        max_calls_per_minute=60
    ),
    "stratex-consulting": TenantConfig(
        tenant_id="stratex-consulting",
        tier="professional",
        model="gpt-4-turbo",
        features=["summary", "action_items"],
        latency_sla_ms=3000,
        max_calls_per_minute=30
    ),
    "techstart-inc": TenantConfig(
        tenant_id="techstart-inc",
        tier="starter",
        model="gpt-3.5-turbo",
        features=["summary"],
        latency_sla_ms=None,  # Best effort
        max_calls_per_minute=10
    ),
}
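
The max_calls_per_minute field is not enforced anywhere in this cookbook. If you want to apply it in the application itself, a minimal in-memory sliding-window limiter keyed by tenant could look like the sketch below (production systems typically enforce limits at the gateway or in a shared store instead):
import time
from collections import defaultdict, deque

class TenantRateLimiter:
    """Naive per-tenant sliding-window limiter driven by max_calls_per_minute."""

    def __init__(self, configs):
        self.configs = configs
        self.calls = defaultdict(deque)  # tenant_id -> timestamps of recent calls

    def allow(self, tenant_id: str) -> bool:
        config = self.configs.get(tenant_id)
        if config is None:
            return False
        now = time.time()
        window = self.calls[tenant_id]
        # Drop timestamps older than the 60-second window
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) >= config.max_calls_per_minute:
            return False
        window.append(now)
        return True

# Example: reject requests that exceed the tenant's tier limit
limiter = TenantRateLimiter(TENANT_CONFIGS)
if not limiter.allow("techstart-inc"):
    print("Rate limit exceeded for techstart-inc")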

Step 5: Create Multi-Tenant Meeting Summarizer

Build a service that tracks costs per tenant. This class handles tenant context setting, prompt building based on feature tiers, cost calculation, and SLA compliance checking — all within Netra spans.
from openai import OpenAI
import time
import uuid
import os
from netra import Netra, SpanType, UsageModel

class MultiTenantMeetingSummarizer:
    """Meeting summarization service with per-tenant cost tracking."""

    def __init__(self):
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.tenant_usage = {}  # Track usage per tenant

    def summarize_meeting(self, tenant_id: str, meeting_transcript: str, user_id: str = None) -> dict:
        """Summarize a meeting for a specific tenant with cost tracking."""

        # Validate tenant
        if tenant_id not in TENANT_CONFIGS:
            return {"error": f"Unknown tenant: {tenant_id}"}

        config = TENANT_CONFIGS[tenant_id]

        # Set tenant context - this is the key for multi-tenant observability
        Netra.set_tenant_id(tenant_id)
        Netra.set_session_id(str(uuid.uuid4()))
        if user_id:
            Netra.set_user_id(user_id)

        # Build the prompt
        prompt = f"Summarize this meeting transcript into:\n"
        if "summary" in config.features:
            prompt += "- Executive Summary (2-3 paragraphs)\n"
        if "action_items" in config.features:
            prompt += "- Action Items (numbered list)\n"
        if "decisions" in config.features:
            prompt += "- Key Decisions Made\n"
        if "custom_reports" in config.features:
            prompt += "- Recommendations for Follow-up\n"

        prompt += f"\nMeeting Transcript:\n{meeting_transcript}"

        # Start a span for the summarization operation
        with Netra.start_span("meeting-summarization") as span:
            span.set_attribute("tenant_id", tenant_id)
            span.set_attribute("tier", config.tier)
            span.set_attribute("model", config.model)

            start_time = time.time()

            # Call the API (auto-traced)
            response = self.openai_client.chat.completions.create(
                model=config.model,
                messages=[
                    {"role": "system", "content": "You are an expert meeting summarizer."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3
            )

            latency_ms = (time.time() - start_time) * 1000
            summary = response.choices[0].message.content

            # Calculate cost (simplified pricing model)
            # GPT-4o-mini pricing (approx): $0.15/1M input, $0.60/1M output
            input_price = 0.15 / 1_000_000
            output_price = 0.60 / 1_000_000

            prompt_tokens = response.usage.prompt_tokens
            completion_tokens = response.usage.completion_tokens
            total_tokens = response.usage.total_tokens

            cost = (prompt_tokens * input_price) + (completion_tokens * output_price)

            # Record detailed usage and cost in the span
            span.set_usage([
                UsageModel(
                    model=config.model,
                    cost_in_usd=cost,
                    usage_type="chat",
                    units_used=total_tokens
                )
            ])

            # Check SLA compliance
            sla_compliant = True
            if config.latency_sla_ms:
                sla_compliant = latency_ms <= config.latency_sla_ms
                span.set_attribute("sla_met", sla_compliant)
                if not sla_compliant:
                    span.add_event("sla-breach", {
                        "actual_ms": latency_ms,
                        "sla_ms": config.latency_sla_ms
                    })

            span.set_success()

            # Local tracking
            if tenant_id not in self.tenant_usage:
                self.tenant_usage[tenant_id] = {"count": 0, "tokens": 0, "total_cost": 0.0, "total_latency": 0}
            self.tenant_usage[tenant_id]["count"] += 1
            self.tenant_usage[tenant_id]["tokens"] += total_tokens
            self.tenant_usage[tenant_id]["total_cost"] += cost
            self.tenant_usage[tenant_id]["total_latency"] += latency_ms

            return {
                "tenant_id": tenant_id,
                "tier": config.tier,
                "summary": summary,
                "token_usage": {
                    "prompt": prompt_tokens,
                    "completion": completion_tokens,
                    "total": total_tokens
                },
                "latency_ms": latency_ms,
                "sla_compliant": sla_compliant,
                "cost": cost
            }

    def print_usage_summary(self):
        """Print usage summary by tenant."""
        for tenant_id, usage in self.tenant_usage.items():
            print(f"\n{tenant_id}:")
            print(f"  Calls: {usage['count']}")
            print(f"  Total Tokens: {usage['tokens']}")
            print(f"  Total Cost: ${usage['total_cost']:.4f}")
            print(f"  Avg Latency: {usage['total_latency']/usage['count']:.0f}ms")
The key pattern here is calling set_tenant_id() early in the request lifecycle. This ensures all subsequent traces — including auto-instrumented OpenAI calls — are automatically attributed to the correct tenant.
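
If your request handlers are plain Python functions, one way to make that pattern hard to forget is a small decorator. This is a sketch only, reusing the Netra context setters shown above:
import functools
import uuid

from netra import Netra

def with_tenant_context(handler):
    """Set tenant, session, and (optionally) user context before the handler runs."""
    @functools.wraps(handler)
    def wrapper(tenant_id, *args, user_id=None, **kwargs):
        Netra.set_tenant_id(tenant_id)
        Netra.set_session_id(str(uuid.uuid4()))
        if user_id:
            Netra.set_user_id(user_id)
        return handler(tenant_id, *args, user_id=user_id, **kwargs)
    return wrapper

@with_tenant_context
def handle_summarize_request(tenant_id, transcript, user_id=None):
    # Downstream spans and auto-instrumented OpenAI calls inherit the tenant context set above
    ...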

Step 6: Test with Sample Meetings

Simulate meeting summarization requests from different tenants:
# Initialize summarizer
summarizer = MultiTenantMeetingSummarizer()

# Enterprise tier (Apex Legal) - legal meeting
sample_meeting = """
Attendees: John (Partner), Sarah (Associate), Mike (Paralegal)
Duration: 45 minutes
Topic: Case Strategy for Smith v. Jones

John: Let's discuss our approach for the Smith case. The deposition is in 3 weeks.
Sarah: I've reviewed the discovery documents. The key issue is the contract's ambiguity around the liability clause.
Mike: I've created a timeline. The critical events are on pages 45-67 of the evidence log.
John: Good. Sarah, can you draft a summary of our position by Friday?
Sarah: I'll have it ready. Should I include recommendations for discovery?
John: Yes, especially around vendor communications. Mike, check if we have all related emails.
Mike: I'll pull those by tomorrow.
John: This looks solid. Let's reconvene next week after Sarah finishes the draft.
"""

result1 = summarizer.summarize_meeting(
    tenant_id="apex-legal",
    meeting_transcript=sample_meeting,
    user_id="[email protected]"
)

print(f"Tier: {result1['tier']}")
print(f"SLA Compliant: {result1['sla_compliant']}")
print(f"Latency: {result1['latency_ms']:.0f}ms")
print(f"Tokens Used: {result1['token_usage']['total']}")

# Professional tier (Stratex Consulting) - strategy meeting
meeting_transcript_2 = """
Team sync for Q2 strategy planning.
Attendees: CEO, CFO, Head of Product

CEO: Let's review our market position and Q2 targets.
CFO: Revenue is up 15% YoY. We're tracking to beat forecast.
Head of Product: New features launched last month show strong adoption.
CEO: Great! What are our risks?
CFO: Supply chain delays could impact timeline.
Head of Product: We need to hire 3 more engineers to meet roadmap.
CEO: Let's make that happen. Budget approved.
"""

result2 = summarizer.summarize_meeting(
    tenant_id="stratex-consulting",
    meeting_transcript=meeting_transcript_2,
    user_id="[email protected]"
)

# Starter tier (TechStart Inc) - standup
meeting_transcript_3 = """
Daily standup
Attendees: Dev team

Tom: I finished the API integration yesterday.
Lisa: I'm working on the UI components.
Chris: Testing is on track for Thursday release.
Tom: Good. Any blockers?
Lisa: Waiting for design approval on the dashboard.
Chris: Should be done today.
"""

result3 = summarizer.summarize_meeting(
    tenant_id="techstart-inc",
    meeting_transcript=meeting_transcript_3,
    user_id="[email protected]"
)
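
To compare the three tiers side by side, the returned dictionaries can be printed in one loop, for example:
for result in (result1, result2, result3):
    print(
        f"{result['tenant_id']} ({result['tier']}): "
        f"{result['latency_ms']:.0f}ms, "
        f"{result['token_usage']['total']} tokens, "
        f"${result['cost']:.4f}"
    )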

Step 7: Review Usage and Cost Breakdown

Analyze per-tenant usage patterns and costs:
# Print usage summary
summarizer.print_usage_summary()

# Estimate costs from aggregate token counts (rough approximation; exact per-call costs are also tracked in total_cost)
# GPT-4o-mini pricing (approximate): $0.15/1M input tokens, $0.60/1M output tokens
input_price_per_token = 0.15 / 1_000_000
output_price_per_token = 0.60 / 1_000_000

for tenant_id, usage in summarizer.tenant_usage.items():
    # Rough split: assume 70% input, 30% output tokens
    input_tokens = int(usage['tokens'] * 0.7)
    output_tokens = int(usage['tokens'] * 0.3)
    cost = (input_tokens * input_price_per_token) + (output_tokens * output_price_per_token)

    print(f"\n{tenant_id}:")
    print(f"  Total Tokens: {usage['tokens']}")
    print(f"  Estimated Cost: ${cost:.4f}")
    print(f"  Cost per Call: ${cost/usage['count']:.4f}")

Step 8: SLA Monitoring

Check which tenants are meeting their SLA commitments:
sla_results = [
    ("apex-legal", result1['sla_compliant'], result1['latency_ms']),
    ("stratex-consulting", result2['sla_compliant'], result2['latency_ms']),
    ("techstart-inc", result3['sla_compliant'], result3['latency_ms']),
]

for tenant_id, compliant, latency in sla_results:
    config = TENANT_CONFIGS[tenant_id]
    status = "PASS" if compliant else "FAIL"
    sla_text = f"{config.latency_sla_ms}ms" if config.latency_sla_ms else "Best effort"
    print(f"\n{tenant_id} ({config.tier}):")
    print(f"  SLA Target: {sla_text}")
    print(f"  Actual Latency: {latency:.0f}ms")
    print(f"  Status: {status}")

Setting Up Tenant-Specific Alerts

In the Netra dashboard, navigate to Alert Rules and create tenant-filtered alerts:
1. Create Alert Rule: click Create Alert Rule and name it “Enterprise Latency SLA Breach”.
2. Select Scope and Metric:
  • Scope: Trace (monitor end-to-end requests)
  • Metric: Latency
3. Apply Tenant Filter: add a filter for tenant_id = apex-legal so that only Enterprise tier requests are monitored.
4. Set Threshold:
  • Condition: Greater than 2000ms
  • Time Window: 5 minutes (to avoid alerting on single slow requests)
5. Configure Contact Point: select your Slack channel or email for notifications.
Create similar alerts for each tier with their respective SLA thresholds:
Tenant             | Tier         | Alert Threshold            | Rate Limit
apex-legal         | Enterprise   | > 2000ms                   | 60 calls/min
stratex-consulting | Professional | > 3000ms                   | 30 calls/min
techstart-inc      | Starter      | No SLA alert (best effort) | 10 calls/min
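
Rather than maintaining these thresholds by hand, you can derive them from TENANT_CONFIGS when scripting alert setup, for example:
for tenant_id, config in TENANT_CONFIGS.items():
    if config.latency_sla_ms is not None:
        threshold = f"> {config.latency_sla_ms}ms"
    else:
        threshold = "No SLA alert (best effort)"
    print(f"{tenant_id} ({config.tier}): alert threshold {threshold}, "
          f"rate limit {config.max_calls_per_minute} calls/min")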

Step 9: Querying Tenant Metrics via Netra API

Once traces are sent to Netra, query tenant-specific metrics programmatically:
from datetime import datetime, timedelta, timezone

def get_tenant_usage_data(tenant_id: str, start_time: str, end_time: str):
    """Retrieve usage data for a tenant."""
    try:
        usage = Netra.usage.get_tenant_usage(
            tenant_id=tenant_id,
            start_time=start_time,
            end_time=end_time,
        )

        if usage:
            return {
                "tenant_id": usage.tenant_id,
                "token_count": usage.token_count,
                "request_count": usage.request_count,
                "session_count": usage.session_count,
                "total_cost": usage.total_cost,
            }
        return None
    except Exception as e:
        print(f"Error fetching usage for {tenant_id}: {e}")
        return None

# Example: Get usage for each tenant over the last 24 hours
end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(days=1)

for tenant_id in TENANT_CONFIGS.keys():
    usage = get_tenant_usage_data(
        tenant_id=tenant_id,
        start_time=start_time.isoformat(),
        end_time=end_time.isoformat(),
    )
    if usage:
        print(f"{tenant_id}: {usage['request_count']} requests, ${usage['total_cost']:.4f}")
    else:
        print(f"{tenant_id}: No data returned (might be due to ingestion latency)")

What You’ll See in the Dashboard

After running this cookbook, check the Netra dashboard for:
  • Tenant selector filtering all traces to a specific customer
  • Per-tenant cost breakdown showing usage per customer
  • SLA compliance dashboard with latency metrics by tier
  • Comparative analytics showing which customers use which features
  • User activity filtered by tenant and user ID

Key Multi-Tenant Patterns

Pattern           | Use Case                  | How to Implement
Cost attribution  | Billing and profitability | Set tenant_id at request start
SLA monitoring    | Support and escalation    | Filter by tenant_id and latency threshold
Feature usage     | Product insights          | Check feature flags in tenant config
User segmentation | Per-user analytics        | Set user_id in addition to tenant_id
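
For the feature-usage pattern, a small lookup over the Step 4 configuration is usually all that is needed, for example:
def has_feature(tenant_id, feature):
    """Return True if the tenant's tier includes the given feature."""
    config = TENANT_CONFIGS.get(tenant_id)
    return config is not None and feature in config.features

print(has_feature("apex-legal", "custom_reports"))   # True
print(has_feature("techstart-inc", "action_items"))  # False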
