As LLM usage scales across teams and features such as chat, embedding, reranking, and real-time inference, token-based billing introduces cost complexity. Yet many organizations lack visibility into core questions: who uses the most tokens, which features are the costliest, and how usage is distributed across teams or customers. Without detailed attribution, controlling spend or evaluating impact becomes difficult.
TrueFoundry changes the narrative by embedding metadata tagging directly into every LLM call. Whether you're a multi-tenant SaaS provider tracking customer spend or an internal platform team monitoring feature consumption, TrueFoundry delivers a transparent view of usage data. Engineering, finance, and product stakeholders all gain instant access to detailed dashboards that map cost back to the right customer, team, or use case.
In this article, you'll discover how granular tracking and cost attribution empower smarter decisions and unlock the full potential of your LLM investments.
TrueFoundry provides detailed observability for every LLM request, enabling fine-grained cost attribution and usage analysis across teams, features, and customers. Each request is automatically logged with comprehensive metadata, including:
When initializing the TrueFoundry client, developers can pass custom tags, such as customer_id, business_unit, or feature_name. These tags are stored alongside each request and are queryable via dashboards and APIs. This enables organizations to:
Feeling in the dark about where your LLM spending and usage are going? TrueFoundry's usage analytics shines a spotlight on every token and dollar, transforming uncertainty into actionable insights.
TrueFoundry equips you with:
Tagged metadata supports flexible filtering and grouping, allowing cross-functional teams to break down usage by any custom dimension. For example:
By combining deep request-level visibility with custom tagging, TrueFoundry enables organizations to operationalize LLM observability, cost control, and performance optimization in a scalable, transparent manner.
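As a rough illustration of grouping by a custom dimension, the sketch below sums tokens and cost per tag value over a handful of exported request records. The record layout and field names (`total_tokens`, `cost_usd`) are assumptions made for this example, not TrueFoundry's export schema; only the tag names `customer_id` and `feature_name` come from the text above.

```python
from collections import defaultdict

# Hypothetical exported request records; the field names are illustrative only.
request_logs = [
    {"customer_id": "acme", "feature_name": "chat", "total_tokens": 1200, "cost_usd": 0.024},
    {"customer_id": "acme", "feature_name": "rerank", "total_tokens": 300, "cost_usd": 0.003},
    {"customer_id": "globex", "feature_name": "chat", "total_tokens": 800, "cost_usd": 0.016},
]


def usage_by_tag(logs, tag):
    """Sum tokens and cost for each distinct value of one tag."""
    totals = defaultdict(lambda: {"total_tokens": 0, "cost_usd": 0.0})
    for entry in logs:
        # Records that lack the tag are collected under "untagged".
        bucket = totals[entry.get(tag, "untagged")]
        bucket["total_tokens"] += entry["total_tokens"]
        bucket["cost_usd"] += entry["cost_usd"]
    return dict(totals)


by_customer = usage_by_tag(request_logs, "customer_id")
by_feature = usage_by_tag(request_logs, "feature_name")
```

The same function works for any tag, so a finance view (per customer) and a product view (per feature) can be derived from one dataset.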
Key Metrics for Evaluating a Gateway
| Criteria | What should you evaluate? | Priority | TrueFoundry |
|---|---|---|---|
| Latency | Adds <10ms p95 overhead for time-to-first-token? | Must Have | Supported |
| Data Residency | Keeps logs within your region (EU/US)? | Depends on use case | Supported |
| Latency-Based Routing | Automatically reroutes based on real-time latency/failures? | Must Have | Supported |
| Key Rotation & Revocation | Rotate or revoke keys without downtime? | Must Have | Supported |
TrueFoundry transforms detailed LLM usage data into actionable insights, enabling product, engineering, and finance teams to make informed decisions that optimize performance and control costs.
Strategic Decisions Enabled by Usage Breakdowns
Tiered Pricing Models
With comprehensive visibility into token consumption patterns, organizations can design pricing tiers that reflect actual usage. By analyzing historical data, teams can:
Example: A SaaS provider might establish a Standard tier capped at 200,000 tokens per month and a Professional tier at 1 million tokens. As customers' needs evolve, they can transition between tiers seamlessly, ensuring fair and predictable pricing.
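A minimal sketch of how the tiers in this example might be applied to observed usage. The caps are taken from the example above; the `Custom` fallback label and the function names are hypothetical.

```python
# Tier caps from the example above, ordered from smallest to largest.
TIERS = [
    ("Standard", 200_000),
    ("Professional", 1_000_000),
]


def recommend_tier(monthly_tokens: int) -> str:
    """Return the smallest tier whose monthly cap covers the observed usage."""
    for name, cap in TIERS:
        if monthly_tokens <= cap:
            return name
    return "Custom"  # hypothetical label for usage above every defined cap


def headroom(monthly_tokens: int) -> float:
    """Fraction of the recommended tier's cap still unused (0.0 above all caps)."""
    for _, cap in TIERS:
        if monthly_tokens <= cap:
            return 1 - monthly_tokens / cap
    return 0.0
```

Running `recommend_tier` over each customer's historical monthly totals gives a quick view of which accounts sit near a tier boundary.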
User Quota Enforcement
TrueFoundry offers built-in support for enforcing usage quotas through its AI Gateway. Rate-limiting rules in the gateway control consumption across users, teams, and virtual accounts, which prevents cost overruns and enables safe experimentation.
Quotas can be applied to:
These constraints are configured using a gateway-rate-limiting-config YAML file, where each rule defines the subject, threshold, and unit of measurement. Rules are evaluated in sequence, and the first applicable rule triggers enforcement.
Sample Configuration:
name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  - id: "rule-id"
    when:
      subjects: ["team:frontend"] # or ["user:email"] or ["virtualaccount:name"]
    limit_to: 5000
    unit: requests_per_day
All matching rules are taken into account, and if any are exceeded, the corresponding rule ID is returned to the user, providing clarity on which quota was triggered.
This enforcement mechanism enables you to:
With quota enforcement configured at the gateway layer, TrueFoundry ensures fine-grained control without requiring changes to downstream models or infrastructure. This makes it ideal for running pilots, offering trials, and building scalable, cost-controlled multi-tenant AI services.
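To make the rule semantics concrete, here is a small local simulation, not the gateway's implementation. It follows the description that every matching rule is checked and that the ID of any exceeded rule is reported. The `Rule` class, the `usage` counter map, and the rule IDs are hypothetical; the subject format and field names mirror the sample configuration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    id: str
    subjects: list  # e.g. ["team:frontend"], same format as the sample config
    limit_to: int
    unit: str       # e.g. "requests_per_day"


def exceeded_rules(rules, request_subjects, usage):
    """Return IDs of matching rules whose limit has already been reached.

    `usage` maps rule id -> requests already counted in the current window.
    """
    hits = []
    for rule in rules:
        # A rule matches when the request carries at least one of its subjects.
        matches = any(subject in request_subjects for subject in rule.subjects)
        if matches and usage.get(rule.id, 0) >= rule.limit_to:
            hits.append(rule.id)
    return hits


rules = [
    Rule("frontend-daily", ["team:frontend"], 5000, "requests_per_day"),
    Rule("alice-daily", ["user:alice@example.com"], 100, "requests_per_day"),
]
usage = {"frontend-daily": 5000, "alice-daily": 12}
blocked_by = exceeded_rules(rules, {"team:frontend", "user:alice@example.com"}, usage)
```

Here the request matches both rules, but only the team quota is exhausted, so only that rule ID is returned.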
Identifying Under-Optimized Customers or Features
By combining cost data with performance metrics, TrueFoundry helps identify inefficiencies. These insights also help teams tune an LLM router, so requests can be directed toward the model that best balances latency, cost, and output quality. Teams can:
Example: If a translation feature incurs high token costs without generating additional revenue, teams can iterate on model prompts or switch to a more efficient model to balance performance and price.
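One way to surface cases like this is to compare LLM cost against revenue per feature. The sketch below is an assumption-laden illustration: the input layout, the field names `llm_cost_usd` and `revenue_usd`, the sample figures, and the default threshold are all hypothetical.

```python
def flag_underperforming(features, min_revenue_per_dollar=1.0):
    """Return names of features earning less revenue per LLM dollar than the threshold."""
    flagged = []
    for name, stats in features.items():
        cost = stats["llm_cost_usd"]
        if cost == 0:
            continue  # no spend, so nothing to optimize
        if stats["revenue_usd"] / cost < min_revenue_per_dollar:
            flagged.append(name)
    return sorted(flagged)


# Illustrative numbers only.
features = {
    "translation": {"llm_cost_usd": 900.0, "revenue_usd": 0.0},
    "chat": {"llm_cost_usd": 400.0, "revenue_usd": 2600.0},
}
candidates = flag_underperforming(features)
```

Flagged features are candidates for prompt iteration or a switch to a cheaper model, as described above.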
Cross-Functional Impact
Go-to-Market Teams
Sales and marketing teams leverage TrueFoundry's usage reports to align value propositions with customer outcomes. They can:
Finance and Operations
Finance teams gain forecasting accuracy by analyzing tagged usage trends over time. With this data, they can:
By translating detailed usage breakdowns into clear, actionable insights, TrueFoundry empowers every team in an organization to optimize costs, improve feature performance, and scale AI initiatives with confidence.
Implementing granular usage tracking with TrueFoundry involves three core steps: applying metadata tags on every call, integrating that data with your analytics or billing tools, and embedding best practices to align insights with business goals.
Implement Tagging and Usage Tracking
Tagging and metadata tracking in TrueFoundry enable granular observability into how LLM infrastructure is being used across environments, teams, features, and customers.
Add Metadata to LLM API Requests
TrueFoundry allows you to attach custom metadata to each LLM request using the X-TFY-METADATA header. This metadata is stored alongside each call and can be used for logging, filtering, and attribution.
Example:
import json

metadata = {
    "tfy_log_request": "true",   # Enables request logging
    "environment": "staging",    # Tracks deployment environment
    "feature": "countdown-bot"   # Identifies the calling feature
}

# `client` is assumed to be an OpenAI-compatible client pointed at the gateway.
client.chat.completions.create(
    # ... other parameters ...
    extra_headers={
        # Send the whole metadata dictionary, serialized as a JSON string.
        "X-TFY-METADATA": json.dumps(metadata)
    }
)
This ensures that each API call carries rich context for analytics, cost attribution, and debugging.
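To keep tagging consistent across call sites, the header can be built by one small helper. This is a sketch: `build_metadata_header` is a hypothetical name, and coercing every value to a string is a conservative assumption rather than a documented requirement.

```python
import json


def build_metadata_header(**tags):
    """Serialize tags into the JSON string carried by the X-TFY-METADATA header."""
    # Assumption: values are sent as strings; unset (None) tags are dropped.
    payload = {key: str(value) for key, value in tags.items() if value is not None}
    return {"X-TFY-METADATA": json.dumps(payload, separators=(",", ":"))}


headers = build_metadata_header(
    tfy_log_request="true",
    environment="staging",
    feature="countdown-bot",
    customer_id=None,  # left out of the header because it is unset
)
```

The resulting dictionary can be passed as `extra_headers` on each request, so every caller sends the same tag keys.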
Apply Tags to ML Runs
If you're using TrueFoundry's ML platform for training or experimentation, you can tag each run to organize experiments by framework, task, or business objective.
Example:
import truefoundry.ml as tfm
client = tfm.get_client()
run = client.create_run(ml_repo="my-classification-project")
run.set_tags({"nlp.framework": "Spark NLP"})
run.end()
These tags help you categorize runs in dashboards, search past experiments, and enforce governance policies.
Best Practices for Tagging
Integrate with Billing Dashboards and Analytics Tools
Once tagging is enabled, TrueFoundry provides multiple ways to visualize and analyze LLM usage across your organization. The built-in analytics dashboard offers real-time insights into token consumption, latency percentiles (P50, P90, P99), error rates, and costs. These metrics are broken down by user, model, and request type, allowing teams to monitor API health and identify high-cost or high-latency patterns quickly.
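For readers who want to reproduce the latency percentiles from exported data, the sketch below computes P50, P90, and P99 with Python's standard library. The sample latencies are synthetic, and the choice of the `inclusive` interpolation method is an assumption; the dashboard may compute percentiles differently.

```python
from statistics import quantiles


def latency_percentiles(latencies_ms):
    """Return P50, P90, and P99 for a list of request latencies in milliseconds."""
    # n=100 yields 99 cut points; the cut point at index k-1 is the k-th percentile.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


# Synthetic sample: 1.0 ms through 101.0 ms, for illustration only.
sample = [float(ms) for ms in range(1, 102)]
stats = latency_percentiles(sample)
```

Computing the same percentiles per user, model, or request type only requires filtering the latency list before calling the function.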
For advanced analysis, TrueFoundry supports integration with tools like Tableau, Looker, and Grafana. You can connect your usage dataset to build dashboards that highlight tokens per customer, cost per feature, and usage trends over time.
Finance and operations teams can export usage data through the Usage API into centralized data warehouses such as Snowflake, BigQuery, or Redshift. This enables chargeback reporting, comparison of AI spend across departments, and financial forecasting.
TrueFoundry also integrates with observability platforms, including Datadog, Prometheus, CloudWatch, and New Relic. These integrations provide unified monitoring of both system performance and LLM usage metrics.
Grafana users can create real-time dashboards that visualize CPU, GPU, and network utilization at the job or deployment level. This ensures full visibility across both model behavior and underlying infrastructure.
Align Data with Business Objectives
Raw metrics only become valuable when linked to meaningful business goals. With TrueFoundry's tagging and observability capabilities, teams can define performance indicators that reflect actual value. Collaborate with product, finance, and analytics stakeholders to establish KPIs such as cost per engagement, tokens per conversion, or revenue generated per thousand tokens.
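The three KPIs named above reduce to simple ratios. The sketch below shows one possible calculation; the function name, parameter names, and input figures are hypothetical, and the inputs would come from your own usage and business data.

```python
def llm_kpis(cost_usd, total_tokens, engagements, conversions, revenue_usd):
    """Compute the three example KPIs; a ratio is None when its denominator is zero."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "cost_per_engagement": ratio(cost_usd, engagements),
        "tokens_per_conversion": ratio(total_tokens, conversions),
        "revenue_per_1k_tokens": ratio(revenue_usd, total_tokens / 1000),
    }


# Illustrative monthly figures for one feature.
kpis = llm_kpis(
    cost_usd=50.0,
    total_tokens=2_000_000,
    engagements=10_000,
    conversions=400,
    revenue_usd=3_000.0,
)
```

Returning `None` instead of raising on a zero denominator keeps reports usable for features that have usage but no conversions yet.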
These KPIs should be embedded into business reviews, product roadmaps, and financial planning sessions to ensure LLM spend is aligned with strategic outcomes. Usage data can guide investment decisions, identify underperforming features, and highlight opportunities for model optimization.
Maintain a shared glossary of tags, features, and KPIs to help onboard new team members and avoid confusion across functions. Provide access to dashboards for teams beyond engineering, including sales, marketing, and support. This enables them to:
When tied to clear goals, usage data becomes a strategic asset. By aligning tagging, tracking, and analysis with organizational priorities, TrueFoundry helps businesses scale LLM adoption responsibly while maximizing return on investment.
TrueFoundry transforms LLM usage from a hidden expense into a driver of innovation and growth. With every API call tagged by customer, team, or feature, your organization gains crystal-clear visibility into token spend and performance. Seamless integration with analytics and billing tools ensures finance and operations teams work with up-to-date data. By aligning usage metrics to business goals, product managers prioritize high-impact features, and engineering optimizes costly workflows. The result is smarter budgeting, clearer ROI, and faster decision-making across your entire organization. Adopt TrueFoundry's granular usage breakdown today to unlock the full potential of your LLM investments.
TrueFoundry AI Gateway delivers ~3-4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.