
Breaking Down AI Gateway Usage: Customer and User-Level Analytics


By Abhishek Choudhary

Published: March 16, 2026


As LLM usage scales across teams and features such as chat, embedding, rerank, and real-time inference, token-based billing introduces cost complexity. Yet many organizations lack visibility into core questions: who uses the most tokens, which features are the costliest, and how usage is distributed across teams or customers. Without detailed attribution, controlling spend or evaluating impact becomes difficult.

TrueFoundry changes the narrative by embedding metadata tagging directly into every LLM call. Whether you’re a multi-tenant SaaS provider tracking customer spend or an internal platform team monitoring feature consumption, TrueFoundry delivers a transparent view of usage data. Engineering, finance, and product stakeholders all gain instant access to detailed dashboards that map cost back to the right customer, team, or use case.

In this article, you’ll discover how granular tracking and cost attribution empower smarter decisions and unlock the full potential of your LLM investments.

How TrueFoundry Tracks LLM Usage and Costs


TrueFoundry provides detailed observability for every LLM request, enabling fine-grained cost attribution and usage analysis across teams, features, and customers. Each request is automatically logged with comprehensive metadata, including:

Model name
Timestamp
Input and output token counts
Temperature and max tokens
Latency and cost
Request type (e.g., chat, completion)
Custom metadata (e.g., tags)

Tracking LLM Usage Across Multiple Dimensions


When initializing the TrueFoundry client, developers can pass custom tags, such as customer_id, business_unit, or feature_name. These tags are stored alongside each request and are queryable via dashboards and APIs. This enables organizations to:

Attribute costs per tenant in a multi-tenant SaaS environment using customer_id
Track usage by business unit or department using organizational tags
Analyze token consumption by product feature, such as chatbots, recommendation engines, or analytics modules
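The attribution described above comes down to grouping request records by a tag and summing tokens and cost. The following is a minimal sketch of that grouping, assuming usage records have already been exported as dictionaries. The field names (input_tokens, output_tokens, cost_usd) and the sample values are illustrative assumptions, not TrueFoundry's actual export schema.

```python
from collections import defaultdict

# Hypothetical exported request records; field names and values are
# assumptions for illustration, not TrueFoundry's export schema.
REQUESTS = [
    {"customer_id": "acme", "feature_name": "chatbot",
     "input_tokens": 1200, "output_tokens": 300, "cost_usd": 0.0045},
    {"customer_id": "acme", "feature_name": "search",
     "input_tokens": 800, "output_tokens": 0, "cost_usd": 0.0008},
    {"customer_id": "globex", "feature_name": "chatbot",
     "input_tokens": 500, "output_tokens": 250, "cost_usd": 0.0020},
    # A request that was sent without a customer_id tag
    {"feature_name": "chatbot",
     "input_tokens": 100, "output_tokens": 50, "cost_usd": 0.0004},
]


def totals_by_tag(records, tag):
    """Sum tokens and cost for each distinct value of `tag`."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for rec in records:
        key = rec.get(tag, "untagged")  # keep untagged traffic visible
        totals[key]["tokens"] += rec["input_tokens"] + rec["output_tokens"]
        totals[key]["cost_usd"] += rec["cost_usd"]
    return dict(totals)


per_customer = totals_by_tag(REQUESTS, "customer_id")
per_feature = totals_by_tag(REQUESTS, "feature_name")
```

Reporting untagged requests under their own key, rather than dropping them, makes gaps in tagging coverage show up in the same view as attributed spend.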


TrueFoundry LLM Usage Analytics:

Feeling in the dark about where your LLM spending and usage are going? TrueFoundry’s usage analytics shines a spotlight on every token and dollar, transforming uncertainty into actionable insights.

TrueFoundry equips you with:

Custom metadata tagging: Automatically tag each LLM request with fields like customer_id, business_unit, or feature_name for precise attribution.
Multi-dimensional usage breakdown: View usage and cost by model, user, team, or custom tag to identify high-consumption workloads at a glance.
Interactive dashboards: Access real-time graphs for requests, input/output tokens, latencies, error rates, and cost trends across all models.
Granular cost attribution: Drill into token counts, cost per request, and total spend per customer or feature to optimize budgets and show ROI.
Queryable analytics API: Export and query raw usage data or integrate with external BI tools for custom reporting, alerts, and deeper analysis.


Real-Time Insights and Optimization


Tagged metadata supports flexible filtering and grouping, allowing cross-functional teams to break down usage by any custom dimension. For example:

A product team can monitor which features generate the most token usage and correlate that with user engagement.
Finance teams can allocate costs precisely to internal teams or clients using tagged usage data.
Engineering leads can track performance and optimize high-cost prompts or services based on token and latency trends.

Benefits of Granular Attribution

Transparent Chargebacks: Enables automated, usage-based internal or external billing to drive accountability across teams or clients.
Improved ROI Analysis: Helps product and analytics teams evaluate the return on AI investment by mapping token usage to business outcomes.
Predictable Budgeting: Supports precise forecasting and budget enforcement with spend monitoring and alerting based on tag-level trends.

By combining deep request-level visibility with custom tagging, TrueFoundry enables organizations to operationalize LLM observability, cost control, and performance optimization in a scalable, transparent manner.
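As an illustration of tag-level budget monitoring, the sketch below compares spend per tag against a budget and reports tags that are near or over it. The tag names, dollar amounts, and the 80% warning threshold are assumptions chosen for the example. This is application-side logic, not a TrueFoundry API.

```python
def budget_alerts(spend_by_tag, budgets, warn_fraction=0.8):
    """Return (tag, status) pairs for tags at or above the warning threshold."""
    alerts = []
    for tag, budget in budgets.items():
        spent = spend_by_tag.get(tag, 0.0)
        if spent >= budget:
            alerts.append((tag, "over_budget"))
        elif spent >= warn_fraction * budget:
            alerts.append((tag, "warning"))
    return alerts


# Hypothetical month-to-date spend and budgets in USD
spend = {"team:frontend": 950.0, "team:search": 400.0, "team:support": 1250.0}
budgets = {"team:frontend": 1000.0, "team:search": 1000.0, "team:support": 1200.0}

alerts = budget_alerts(spend, budgets)
```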

Key Metrics for Evaluating Gateway

Criteria	What should you evaluate?	Priority	TrueFoundry
Latency	Adds <10ms p95 overhead for time-to-first-token?	Must Have	✅ Supported
Data Residency	Keeps logs within your region (EU/US)?	Depends on use case	✅ Supported
Latency-Based Routing	Automatically reroutes based on real-time latency/failures?	Must Have	✅ Supported
Key Rotation & Revocation	Rotate or revoke keys without downtime?	Must Have	✅ Supported


Driving Strategic Actions with LLM Usage Analytics

TrueFoundry transforms detailed LLM usage data into actionable insights, enabling product, engineering, and finance teams to make informed decisions that optimize performance and control costs.

Strategic Decisions Enabled by Usage Breakdowns

Tiered Pricing Models

With comprehensive visibility into token consumption patterns, organizations can design pricing tiers that reflect actual usage. By analyzing historical data, teams can:

Set base plans aligned with average monthly token usage.
Offer discounted overage rates to customers who use tokens efficiently.
Introduce premium tiers for heavy users requiring larger quotas.

Example: A SaaS provider might establish a Standard tier capped at 200,000 tokens per month and a Professional tier at 1 million tokens. As customers' needs evolve, they can transition between tiers seamlessly, ensuring fair and predictable pricing.
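To make the tier example concrete, here is a minimal sketch that picks the smallest tier whose cap covers a customer's average monthly usage plus some headroom. The two caps come from the example above. The 20% headroom factor and the function name are assumptions for illustration.

```python
from statistics import mean

# Tier caps taken from the example above (tokens per month)
TIERS = [("Standard", 200_000), ("Professional", 1_000_000)]


def recommend_tier(monthly_tokens, headroom=1.2):
    """Pick the smallest tier whose cap covers average usage plus headroom."""
    target = mean(monthly_tokens) * headroom
    for name, cap in TIERS:
        if target <= cap:
            return name
    return None  # usage exceeds every listed tier


# Three months of hypothetical usage for one customer
tier = recommend_tier([120_000, 150_000, 180_000])
```

A return value of None signals that the customer has outgrown the listed tiers and needs a custom quota.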

User Quota Enforcement

TrueFoundry's AI Gateway has built-in support for usage quotas, implemented as rate-limiting rules that apply to users, teams, and virtual accounts. Enforcing limits at each of these levels prevents cost overruns and enables safe experimentation.

Quotas can be applied to:

Individual users. Example: restrict bob@email.com to 1,000 requests per day.
Teams. Example: limit the frontend team to 5,000 requests per day.
Virtual accounts. Example: cap the virtual account va-james at 1,500 requests per day.

These constraints are configured using a gateway-rate-limiting-config YAML file, where each rule defines the subject, threshold, and unit of measurement. Rules are evaluated in sequence, and the first applicable rule triggers enforcement.

Sample Configuration:

name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  - id: "rule-id"
    when:
      subjects: ["team:frontend"]  # or ["user:email"] or ["virtualaccount:name"]
    limit_to: 5000
    unit: requests_per_day
When a request exceeds a limit, the ID of the rule that was exceeded is returned to the user, making clear which quota was triggered.

This enforcement mechanism enables you to:

Prevent unexpected usage spikes by capping traffic at the user, team, or virtual account level.
Offer tiered plans with predefined limits for freemium or trial accounts.
Trigger alerts as thresholds approach, allowing stakeholders to take corrective action.

With quota enforcement configured at the gateway layer, TrueFoundry ensures fine-grained control without requiring changes to downstream models or infrastructure. This makes it ideal for running pilots, offering trials, and building scalable, cost-controlled multi-tenant AI services.
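For intuition about what a requests_per_day limit means, the sketch below implements a simple in-memory daily counter per subject. It is an illustration of the concept only, not how the TrueFoundry gateway is implemented. The class name and the subject strings are assumptions.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """In-memory per-subject daily request counter (illustrative only)."""

    def __init__(self, limits):
        self.limits = limits            # e.g. {"team:frontend": 5000}
        self.counts = defaultdict(int)  # (subject, day) -> requests seen

    def allow(self, subject, day=None):
        """Return True and count the request if the subject is under its limit."""
        day = day or date.today()
        limit = self.limits.get(subject)
        if limit is None:
            return True  # no limit configured for this subject
        if self.counts[(subject, day)] >= limit:
            return False
        self.counts[(subject, day)] += 1
        return True


quota = DailyQuota({"user:bob@email.com": 2})
today = date(2026, 3, 16)
results = [quota.allow("user:bob@email.com", today) for _ in range(3)]
```

Because the counter is keyed by calendar day, the quota resets when the date changes. A production gateway would also need shared state across replicas, which this sketch omits.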

Identifying Under-Optimized Customers or Features

By combining cost data with performance metrics, TrueFoundry helps identify inefficiencies. These insights also help teams tune an LLM router, so requests can be directed toward the model that best balances latency, cost, and output quality. Teams can:

Flag customer segments or features with high token spend but low engagement.
Analyze prompt templates and workflows that drive excessive consumption.
Prioritize optimization efforts or refactor code paths to improve ROI.

Example: If a translation feature incurs high token costs without generating additional revenue, teams can iterate on model prompts or switch to a more efficient model to balance performance and price.
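One way to surface such features is to compute cost per engagement and flag anything above a threshold. The sketch below assumes per-feature cost and engagement counts have already been joined from usage and product analytics data. The feature names, numbers, and threshold are hypothetical.

```python
def flag_low_roi(features, max_cost_per_engagement):
    """Return names of features over the cost threshold, costliest first."""
    flagged = []
    for name, stats in features.items():
        engagements = stats["engagements"]
        # Treat zero engagement as unbounded cost so it is always flagged
        ratio = stats["cost_usd"] / engagements if engagements else float("inf")
        if ratio > max_cost_per_engagement:
            flagged.append((ratio, name))
    return [name for ratio, name in sorted(flagged, reverse=True)]


# Hypothetical monthly figures per feature
features = {
    "translation": {"cost_usd": 900.0, "engagements": 1_500},
    "chatbot": {"cost_usd": 1_200.0, "engagements": 60_000},
    "summarizer": {"cost_usd": 50.0, "engagements": 0},
}

flagged = flag_low_roi(features, max_cost_per_engagement=0.10)
```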

Cross-Functional Impact

Go-to-Market Teams

Sales and marketing teams leverage TrueFoundry’s usage reports to align value propositions with customer outcomes. They can:

Justify premium pricing by demonstrating how token usage correlates with business results.
Craft targeted upsell campaigns for accounts trending toward higher consumption.
Provide customers with transparent usage reports, building trust and reducing churn.

Finance and Operations

Finance teams gain forecasting accuracy by analyzing tagged usage trends over time. With this data, they can:

Project AI spend based on month-over-month growth rates.
Implement internal chargeback models to align costs with revenue centers.
Plan infrastructure capacity to match demand, avoiding both over-provisioning and performance bottlenecks.
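A projection based on month-over-month growth can be sketched in a few lines. The version below averages the observed growth rates and compounds them forward. The spend history is hypothetical, and a simple average is only one of several reasonable choices of growth estimate.

```python
def project_spend(monthly_spend, months_ahead):
    """Project future spend using the average month-over-month growth rate."""
    growth_rates = [
        curr / prev - 1
        for prev, curr in zip(monthly_spend, monthly_spend[1:])
    ]
    avg_growth = sum(growth_rates) / len(growth_rates)
    projections, last = [], monthly_spend[-1]
    for _ in range(months_ahead):
        last *= 1 + avg_growth  # compound the average growth forward
        projections.append(round(last, 2))
    return projections


# Hypothetical spend in USD for the last three months (10% growth each month)
forecast = project_spend([1000.0, 1100.0, 1210.0], months_ahead=2)
```

The function needs at least two months of history, since a single data point yields no growth rate.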

By translating detailed usage breakdowns into clear, actionable insights, TrueFoundry empowers every team in an organization to optimize costs, improve feature performance, and scale AI initiatives with confidence.

Implementing Tagging and Usage Tracking in TrueFoundry

Implementing granular usage tracking with TrueFoundry involves three core steps: applying metadata tags on every call, integrating that data with your analytics or billing tools, and embedding best practices to align insights with business goals.

Implement Tagging and Usage Tracking

Tagging and metadata tracking in TrueFoundry enable granular observability into how LLM infrastructure is being used across environments, teams, features, and customers.

Add Metadata to LLM API Requests

TrueFoundry allows you to attach custom metadata to each LLM request using the X-TFY-METADATA header. This metadata is stored alongside each call and can be used for logging, filtering, and attribution.

Example:

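import json

# `client` is assumed to be an OpenAI-compatible client that is already
# configured to send requests through the TrueFoundry gateway.
metadata = {
    "tfy_log_request": "true",   # Enables request logging
    "environment": "staging",    # Tracks deployment environment
    "feature": "countdown-bot",  # Identifies the calling feature
}
client.chat.completions.create(
    # ... other parameters ...
    extra_headers={
        # Serialize the full metadata dict so every tag reaches the gateway
        "X-TFY-METADATA": json.dumps(metadata),
    },
)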

This ensures that each API call carries rich context for analytics, cost attribution, and debugging.

Apply Tags to ML Runs

If you're using TrueFoundry’s ML platform for training or experimentation, you can tag each run to organize experiments by framework, task, or business objective.

Example:

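import truefoundry.ml as tfm

client = tfm.get_client()
# Start a run in the given ML repo
run = client.create_run(ml_repo="my-classification-project")
# Tags are key-value pairs attached to the run
run.set_tags({"nlp.framework": "Spark NLP"})
# Mark the run as finished
run.end()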

These tags help you categorize runs in dashboards, search past experiments, and enforce governance policies.

Best Practices for Tagging

Use consistent formats, such as snake_case for tag keys and values
Validate tag inputs via CI or pre-commit hooks
Audit and retire outdated tags periodically to maintain clean logs
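The validation practice above can be sketched as a small check that a CI job or pre-commit hook could run over tag definitions. The allow-list of keys and the exact snake_case pattern are assumptions for illustration. Teams would substitute their own conventions.

```python
import re

# snake_case: lowercase letters and digits, words separated by single underscores
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Hypothetical allow-list of tag keys agreed on by the team
ALLOWED_KEYS = {"customer_id", "business_unit", "feature_name", "environment"}


def validate_tags(tags):
    """Return a list of problems; an empty list means the tags pass."""
    problems = []
    for key, value in tags.items():
        if key not in ALLOWED_KEYS:
            problems.append(f"unknown tag key: {key}")
        if not SNAKE_CASE.match(key):
            problems.append(f"key is not snake_case: {key}")
        if not isinstance(value, str) or not SNAKE_CASE.match(value):
            problems.append(f"value is not snake_case: {key}={value}")
    return problems


good = validate_tags({"customer_id": "acme_corp", "environment": "staging"})
bad = validate_tags({"Feature-Name": "Countdown Bot"})
```

Returning every problem at once, instead of failing on the first, lets a developer fix all tag issues in a single pass.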

Integrate with Billing Dashboards and Analytics Tools

Once tagging is enabled, TrueFoundry provides multiple ways to visualize and analyze LLM usage across your organization. The built-in analytics dashboard offers real-time insights into token consumption, latency percentiles (P50, P90, P99), error rates, and costs. These metrics are broken down by user, model, and request type, allowing teams to monitor API health and identify high-cost or high-latency patterns quickly.

For advanced analysis, TrueFoundry supports integration with tools like Tableau, Looker, and Grafana. You can connect your usage dataset to build dashboards that highlight tokens per customer, cost per feature, and usage trends over time.

Finance and operations teams can export usage data through the Usage API into centralized data warehouses such as Snowflake, BigQuery, or Redshift. This enables chargeback reporting, comparison of AI spend across departments, and financial forecasting.

TrueFoundry also integrates with observability platforms, including Datadog, Prometheus, CloudWatch, and New Relic. These integrations provide unified monitoring of both system performance and LLM usage metrics.

Grafana users can create real-time dashboards that visualize CPU, GPU, and network utilization at the job or deployment level. This ensures full visibility across both model behavior and underlying infrastructure.

Align Data with Business Objectives

Raw metrics only become valuable when linked to meaningful business goals. With TrueFoundry’s tagging and observability capabilities, teams can define performance indicators that reflect actual value. Collaborate with product, finance, and analytics stakeholders to establish KPIs such as cost per engagement, tokens per conversion, or revenue generated per thousand tokens.

These KPIs should be embedded into business reviews, product roadmaps, and financial planning sessions to ensure LLM spend is aligned with strategic outcomes. Usage data can guide investment decisions, identify underperforming features, and highlight opportunities for model optimization.
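The KPIs named above are simple ratios once usage data is joined with business metrics. The sketch below computes three of them. All input figures are hypothetical, and the function assumes engagement, conversion, and token counts are nonzero.

```python
def usage_kpis(cost_usd, total_tokens, engagements, conversions, revenue_usd):
    """Compute business KPIs from usage totals and business metrics."""
    return {
        "cost_per_engagement": cost_usd / engagements,
        "tokens_per_conversion": total_tokens / conversions,
        "revenue_per_1k_tokens": revenue_usd / (total_tokens / 1000),
    }


# Hypothetical monthly totals for one feature
kpis = usage_kpis(
    cost_usd=500.0,
    total_tokens=2_000_000,
    engagements=10_000,
    conversions=400,
    revenue_usd=6_000.0,
)
```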

Maintain a shared glossary of tags, features, and KPIs to help onboard new team members and avoid confusion across functions. Provide access to dashboards for teams beyond engineering, including sales, marketing, and support. This enables them to:

Monitor usage spikes or anomalies
Validate optimization efforts, such as prompt tuning that reduces token consumption
Propose and evaluate experiments, like switching to a smaller model for less critical use cases

When tied to clear goals, usage data becomes a strategic asset. By aligning tagging, tracking, and analysis with organizational priorities, TrueFoundry helps businesses scale LLM adoption responsibly while maximizing return on investment.

Conclusion

TrueFoundry transforms LLM usage from a hidden expense into a driver of innovation and growth. With every API call tagged by customer, team, or feature, your organization gains crystal-clear visibility into token spend and performance. Seamless integration with analytics and billing tools ensures finance and operations teams work with up-to-date data. By aligning usage metrics to business goals, product managers prioritize high-impact features, and engineering optimizes costly workflows. The result is smarter budgeting, clearer ROI, and faster decision-making across your entire organization. Adopt TrueFoundry’s granular usage breakdown today to unlock the full potential of your LLM investments.
