VOOZH about

URL: https://www.truefoundry.com/blog/truefoundry-llm-gateway-is-blazing-fast

⇱ Why the TrueFoundry LLM Gateway Is Blazing Fast


πŸ‘ Blank white background with no objects or features visible.

TrueFoundry recognized in Gartner Hype Cycle for Platform Engineering 2026. Read the full report β†’

Join our VAR & VAD ecosystem β€” deliver enterprise AI governance across LLMs, MCPs & Agents. Become a Partner β†’

πŸ‘ logo
Sign Up
Login
πŸ‘ Three horizontal black bars of varying lengths on a white background, menu or list icon symbol.

Benchmarking the TrueFoundry LLM Gateway: it's blazing fast ⚑

πŸ‘ Image
By Srihari Radhakrishna

Published: January 19, 2026

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

  • Handles 350+ RPS on just 1 vCPU β€” no tuning needed
  • Production-ready with full enterprise support
  • TrueFoundry LLM Gateway provides a unified OpenAI compatible interface to various LLM providers like Anthropic, OpenAI, Bedrock, Gemini and many others
  • TrueFoundry LLM Gateway scales seamlessly to 350 RPS on a single replica of 1 unit CPU while using 270 MB of memory. We compared with another gateway product, LiteLLM, on a similar setup and LiteLLM failed to scaled beyond 50 RPS
  • TrueFoundry LLM Gateway only adds an extra latency of 3-5 ms, while LiteLLM adds between 15-30 ms per request.

Why does your org need an LLM Gateway?

An LLM Gateway provides a unified interface to manage your organisation's LLM usage:

  • Unified API: Access multiple LLM providers through a single OpenAI compatible interface, no code changes needed
  • API Key Security: Secure, centralised credential management
  • Governance & Control: Set limits, access controls, and content filtering
  • Rate Limiting: Prevent abuse and ensure fair usage
  • Observability: Track usage, costs, latency and performance
  • Load Balancing: Route requests across providers automatically
  • Cost Management: Monitor spending and set budget alerts
  • Audit Trails: Log all LLM interactions for compliance

How fast is TrueFoundry LLM Gateway?

Load Test Setup

For our load testing experiment, we setup a deployed this fake OpenAI endpoint service using TrueFoundry. The service would simulate OpenAI request and response format without actually producing tokens.

We also deployed the TrueFoundry LLM Gateway and LiteLLM Proxy Server, both running of a single replica with 1 unit CPU and 1 GB memory.  

We added our fake OpenAI provider into both TrueFoundry and LiteLLM gateways. While load testing, we made requests to the fake OpenAI server in 3 different ways:

  • Setup 1: Directly without using any proxy or gateway
  • Setup 2: Through the TrueFoundry LLM Gateway deployed on 1 unit CPU and 1 GB memory
  • Setup 3: Through the LiteLLM Proxy Server  deployed on 1 unit CPU and 1 GB memory
RPS 10 RPS 50 RPS 200 RPS 300 RPS
OpenAI direct (Setup 1) 73 ms 73 ms 73 ms 73 ms
TrueFoundry LLM Gateway (Setup 2) 76 ms (+3 ms) 76 ms (+3 ms) 76 ms (+3 ms) 77 ms (+4 ms)
LiteLLM Proxy (Setup 3) 88 ms (+15 ms) 99 ms (+26 ms) Could not scale to 200 RPS Could not scale to 300 RPS

Observations

  1. TrueFoundry Gateway adds only extra 3 ms in latency upto 250 RPS and 4 ms at RPS > 300
  2. TrueFoundry LLM Gateway was able to scale without any degradation in performance until about 350 RPS (1 vCPU, 1 GB machine) before the CPU utilisation reached 100% and latencies started getting affected. With more CPU or more replicas, the LLM Gateway can scale to tens of thousands of requests per second.
  3. LiteLLM on the same machine was not able to scale beyond 40-50 RPS before reaching CPU limit

More metrics

Setup 1: Direct OpenAI endpoint calling

Stats @ 200 RPS
Stats @ 300 RPS
Response Time v/s RPS

‍Setup 2: TrueFoundry LLM Gateway

Stats @ 200 RPS
Stats @ 300 RPS
Response Time v/s RPS

Setup 3: LiteLLM

Stats @ ~58 RPS
Response times v/s RPS

Speed features of LLM Gateway

  • Near-Zero Overhead: Just 3-5 ms added latency
  • Optimised Backend: Built with performant Node.js framework
  • Config Caching: Config is stored in memory for quick look up
  • Smart Routing: Minimal processing overhead
  • Edge Ready: Deploy close to your apps
  • High Capacity: A t2.2xlarge AWS instance (43$ per month on spot) machine can scale upto ~3000 RPS with no issues.
Edge Deployment of TrueFoundry LLM Gateway

Supported Providers

Below is a comprehensive list of popular LLM providers that is supported by TrueFoundry LLM Gateway:

Provider Streaming Supported
GCP βœ…
AWS βœ…
Azure OpenAI βœ…
Self Hosted Models on TrueFoundry βœ…
OpenAI βœ…
Cohere βœ…
AI21 βœ…
Anthropic βœ…
Anyscale βœ…
Together AI βœ…
DeepInfra βœ…
Ollama βœ…
Palm βœ…
Perplexity AI βœ…
Mistral AI βœ…
Groq βœ…
Nomic βœ…

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

The fastest way to build, govern and scale your AI

Sign Up
Gartner Hype Cycle for Platform Engineering 2026
πŸ‘ Image

One Layer of Control for All AI

Route and govern model and tool traffic with a centralized AI Gateway
Table of Contents
πŸ‘ logo

One Gateway for Every LLM, Agent and MCP Server

Book a 30-min with our AI expert

Book a Demo

The fastest way to build, govern and scale your AI

Book Demo

Discover More

πŸ‘ Image
November 13, 2025
|
5 min read

GPT-5.1 vs GPT-5: 9 Major Improvements You Need to Know

πŸ‘ Image
August 27, 2025
|
5 min read

Mapping the On-Prem AI Market: From Chips to Control Planes

πŸ‘ Image
August 27, 2025
|
5 min read

AI Gateways: From Outage Panic to Enterprise Backbone

πŸ‘ Image
April 16, 2024
|
5 min read

Cognita: Building an Open Source, Modular, RAG applications for Production

πŸ‘ Image
June 19, 2026
|
5 min read

Governing Multi-Agent Systems: Agent Identity, A2A, and the Agent Gateway

No items found.
πŸ‘ Image
June 19, 2026
|
5 min read

TOKENMAXXING TRILOGY Β· PART 2 OF 3: The Architecture of Governed AI Usage

No items found.
πŸ‘ Image
June 19, 2026
|
5 min read

Grok 4.3 on Amazon Bedrock: We Routed Four Frontier Models Through One Gateway and Measured the Cost

LLM Tools
comparison
πŸ‘ Image
June 19, 2026
|
5 min read

Top 5 LiteLLM Alternatives for Enterprises in 2026

No items found.
No items found.

Recent Blogs

Governing Multi-Agent Systems: Agent Identity, A2A, and the Agent Gateway

June 19, 2026

Boyu Wang

Grok 4.3 on Amazon Bedrock: We Routed Four Frontier Models Through One Gateway and Measured the Cost

June 19, 2026

Amrutha Potluri

JIT Context: Why the Best Agents Load Late and Load Little

June 18, 2026

Boyu Wang

Best AI Cost Optimization Tools in 2026: Compared for Enterprise Teams

June 18, 2026

Ashish Dubey

AI Cost Optimization Strategies in 2026: A Practical Guide for Enterprise Teams

June 18, 2026

Ashish Dubey

Claude MCP Registry: A Complete Guide for Developers and Enterprise Teams

June 17, 2026

Ashish Dubey

AI Policy Enforcement: A Complete Guide for Enterprise Teams

June 17, 2026

Ashish Dubey

AI Utility: A Complete Guide to AI in Energy and Utilities for 2026

June 17, 2026

Ashish Dubey

10 Best Shadow AI Detection Tools for 2026: Compared for Enterprise Security Teams

June 18, 2026

Ashish Dubey

Field Notes: When AI Cost Control Becomes a Switch β€” and Why It Should Be a Gateway

June 17, 2026

Boyu Wang

What Is AI Orchestration? A Complete Guide

June 16, 2026

Ashish Dubey

Best Multi-Agent Orchestration Tools in 2026: Compared for Enterprise and Developer Teams

June 16, 2026

Ashish Dubey

Multi-agent Orchestration Frameworks in 2026: Compared for Enterprise Teams

June 16, 2026

Ashish Dubey

The Claude Fable 5 / Mythos 5 Ban and Why You Need a Multi-Provider AI Gateway

June 16, 2026

Ashish Dubey

What Is Multi-Model Orchestration? A Practical Guide for Enterprise Teams

June 16, 2026

Ashish Dubey

Take a quick product tour
Start Product Tour
Product Tour

Β© 2026 All rights reserved.

πŸ‘ Github icon
πŸ‘ LinkedIn Icon
πŸ‘ Blurry blue crisscross lines on white background forming an X shape with dotted lines.
πŸ‘ LinkedIn logo for social media link