AI Gateway

Connectivity and governance layer for modern AI-native applications built on top of Kong Gateway

Introducing AI Gateway

As AI adoption accelerates, applications are evolving beyond basic LLM calls into complex, multi-actor systems-including user apps, agents, orchestration layers, and context servers that interact with foundation models in real time.

To support this shift, developers are adopting protocols like Model Context Protocol (MCP) and Agent2Agent (A2A) to standardize how components exchange tools, data, and decisions.

But infrastructure often falls behind, with challenges around authentication, rate limiting, data security, observability, and constant provider changes.

AI Gateway addresses these challenges with a high-performance control plane that secures, governs, and observes AI-native systems end to end. Whether serving LLM traffic, exposing structured context via MCP, or coordinating agents through A2A, AI Gateway ensures scalable, secure, and reliable AI infrastructure.

👁 Overview of AI gateway

Quickstart

Or, launch a demo instance of AI Gateway running on-prem:

curl -Ls https://get.konghq.com/ai | bash

Copied!

Get started

Run the Kong Gateway quickstart and enable the AI Proxy plugin.

Video tutorials

Learn how to use AI plugins with video tutorials.

AI plugins

Learn about all the AI plugins.

Cookbooks

End-to-end recipes for building real-world AI scenarios.

AI Gateway providers

Kong AI Gateway routes AI requests to various providers through a provider-agnostic API. This normalized API layer provides multiple benefits: client applications stay decoupled from provider-specific APIs, credentials are managed centrally, and request routing can be dynamic to optimize for cost, latency, or availability.

👁 Image

OpenAI

👁 Image

Anthropic

👁 Image

Azure AI

👁 Image

More...

AI Gateway in Konnect

Konnect provides a unified control plane to create, manage, and monitor LLMs using the Konnect platform.

Key features include:

Routing and load balancing: Assign Gateway Services and define how traffic is distributed across models.
Streaming and authentication: Enable streaming responses and manage authentication through the AI Gateway.
Access control: Create and apply access tiers to control how clients interact with LLMs.
Usage analytics: Monitor request and token volumes, track error rates, and measure average latency with historical comparisons.
Visual traffic maps: Explore interactive maps that show how requests flow between clients and models in real time.

👁 {{site.ai_gateway}} Dashboard in Konnect

Deployment checklist

AI Gateway resource sizing guidelines: Review recommended resource allocation guidelines for AI Gateway.
Deployment topologies: Learn about the different ways to deploy Kong Gateway.
Hosting options: Decide where you want to host your Data Plane nodes, and whether you want Kong to host them or host them yourself.

Tools to manage AI Gateway

AI Gateway editor: GUI for managing all your AI Gateway resources in one place.
decK: Manage AI Gateway and Kong Gateway configuration through declarative state files.
Terraform: Manage infrastructure as code and automated deployments to streamline setup and configuration of Konnect and Kong Gateway.
KIC: Manage ingress traffic and routing rules for your services.
Kong Gateway Admin API: Manage on-prem Kong Gateway entities via an API.
Control Plane Config API: Manage Kong Gateway entities within Konnect Control Planes via an API.

AI Gateway capabilities

You can enable the AI Gateway features through a set of modern and specialized plugins, using the same model you use for any other Kong Gateway plugin. When deployed alongside existing Kong Gateway plugins, Kong Gateway users can quickly assemble a sophisticated AI management platform without custom code or deploying new and unfamiliar tools.

Universal API

Route client requests to various AI providers

👁 Image

Rate limiting

Manage traffic to your LLM API

👁 Image

Semantic caching

Semantically cache responses from LLMs

👁 Image

Semantic routing

Semantically distribute requests to different LLM models

MCP traffic gateway

Gain control and visibility over AI agent infrastructure with AI Gateway-driven MCP capabilities

👁 Image

A2A traffic gateway

Secure, govern, and observe agent-to-agent (A2A) traffic with AI Gateway’s A2A protocol support

👁 Image

Automated RAG injection

Automatically embed RAG logic into your workflows

Data governance

Use AI plugins to control AI data and usage

Guardrails

Inspect requests and configure content safety and moderation

Prompt engineering

Create prompt templates and manipulate client prompts

Load balancing

Learn about the load balancing algorithms available for AI Gateway

Audit log

Learn about AI Gateway logging capabilities

LLM metrics

Expose and visualize LLM metrics

Konnect Observability

Visualize LLM metrics in Konnect.

Metering & Billing

Meter LLM usage with Konnect.

Streaming

Stream user requests with AI Gateway

Secrets management

Use Konnect Config Store to store and reference your LLM provider API keys

LLM cost control

Reduce LLM usage costs by giving you control over how prompts are built and routed

👁 Image

Request transformations

Use AI to transform requests and responses

👁 Image

Canary release

Slowly roll out software changes to a subset of users.

Proxy AI CLI tools through AI Gateway

Configure AI Gateway to proxy requests from AI command-line tools to LLM providers

Universal API

Kong’s AI Gateway Universal API, delivered through the AI Proxy and AI Proxy Advanced plugins, simplifies AI model integration by providing a single, standardized interface for interacting with models across multiple providers.

Easy to use: Configure once and access any AI model with minimal integration effort.
Load balancing: Automatically distribute AI requests across multiple models or providers for optimal performance and cost efficiency.
Retry and fallback: Optimize AI requests based on model performance, cost, or other factors.
Cross-plugin integration: Leverage AI in non-AI API workflows through other Kong Gateway plugins.

👁 Overview of AI gateway

👁 Image

AI Proxy

The AI Proxy plugin lets you transform and proxy requests to a number of AI providers and models.

See plugin →

👁 Image

AI Proxy Advanced

The AI Proxy Advanced plugin lets you transform and proxy requests to multiple AI providers and models at the same time. This lets you set up load balancing between targets.

See plugin →

AI usage governance

As AI technologies see broader adoption, developers and organizations face new risks: the risk of sensitive data leaking to AI providers, which exposes businesses and their customers to potential breaches and security threats.

Managing how data flows to and from AI models has become critical not just for security, but also for compliance and reliability. Without the right controls in place, organizations risk losing visibility into how AI is used across their systems.

AI Gateway helps mitigate these challenges by offering a suite of plugins that extend beyond basic AI traffic management.

Data governance: Control how sensitive information is handled and shared with AI models.
Prompt engineering: Customize and optimize prompts to deliver consistent, high-quality AI outputs.
Guardrails and content safety: Enforce policies to prevent inappropriate, unsafe, or non-compliant responses.
Automated RAG injection: Seamlessly inject relevant, vetted data into AI prompts without manual RAG implementations.
Load balancing: Distribute AI traffic efficiently across multiple model endpoints to ensure performance and reliability.
LLM cost control: Use the AI Compressor, RAG Injector, and Prompt Decorator to compress and structure prompts efficiently. Combine with AI Proxy Advanced to route requests across OpenAI models by semantic similarity—optimizing for cost and performance.

Data governance

AI Gateway enforces governance on outgoing AI prompts through allow/deny lists, blocking unauthorized requests with 4xx responses. It also provides built-in PII sanitization, automatically detecting and redacting sensitive data across 20 categories and 9 languages. Running privately and self-hosted for full control and compliance, AI Gateway ensures consistent protection without burdening developers, which helps simplify AI adoption at scale.

For more information, see the full list of Data Governance capabilities.

👁 Image

AI Prompt Guard

Check text completion requests against a list of allowed or denied expressions

See plugin →

👁 Image

AI Semantic Prompt Guard

Semantically and intelligently create allow and deny lists of topics that can be requested across every LLM.

See plugin →

👁 Image

AI PII Sanitizer

Protect sensitive information in client request or response bodies before they reach upstream services or clients

See plugin →

Prompt engineering

AI systems are built around prompts, and manipulating those prompts is important for successful adoption of the technologies. Prompt engineering is the methodology of manipulating the linguistic inputs that guide the AI system. AI Gateway supports a set of plugins that allow you to create a simplified and enhanced experience by setting default prompts or manipulating prompts from clients as they pass through the gateway.

👁 Image

AI Prompt Template

Provide fill-in-the-blank AI prompts to users

See plugin →

👁 Image

AI Prompt Decorator

Prepend or append an array of llm/v1/chat messages to a user’s chat history

See plugin →

Guardrails and content safety

As a platform owner, you may need to moderate all user request content against reputable services to comply with specific sensitive categories when proxying Large Language Model (LLM) traffic. AI Gateway provides built-in capabilities to handle content moderation and ensure content safety, that help you enforce compliance and protect your users across AI-powered applications.

👁 Image

AI Azure Content Safety

Use Azure AI Content Safety to check and audit AI Proxy plugin messages before proxying them to an upstream LLM

See plugin →

👁 Image

AI AWS Guardrails

Use AWS Guardrails to validate requests and/or responses in the AI Proxy plugin before forwarding them between clients and upstream LLMs.

See plugin →

👁 Image

AI GCP Model Armor

Audit and validate LLM prompts with Google Cloud Model Armor before forwarding them to an upstream LLM.

See plugin →

👁 Image

AI Semantic Prompt Guard

Semantically and intelligently create allow and deny lists of topics that can be requested across every LLM.

See plugin →

👁 Image

AI Semantic Response Guard

Permit or block prompts based on semantic similarity to known LLM responses, preventing misuse of llm/v1/chat or llm/v1/completions requests

See plugin →

👁 Image

AI Lakera Guard

Inspect and enforce Lakera Guard safety policies on LLM requests and responses before they reach upstream models.

See plugin →

👁 Image

AI Custom Guardrail

Use a third-party guardrails service to validate requests and/or responses in the AI Proxy plugin before forwarding them between clients and upstream LLMs

See plugin →

Amazon Bedrock guardrails

Include your Amazon Bedrock guardrails configuration in AI Proxy requests

Request transformations

AI Gateway allows you to use AI technology to augment other API traffic. One example is routing API responses through an AI language translation prompt before returning it to the client. AI Gateway provides two plugins that can be used in conjunction with other upstream API services to weave AI capabilities into API request processing. These plugins can be configured independently of the AI Proxy plugin.

👁 Image

AI Request Transformer

Use an LLM service to transform a client request body prior to proxying the request to the upstream server

See plugin →

👁 Image

AI Response Transformer

Use an LLM service to transform the upstream HTTP(S) prior to forwarding it to the client

See plugin →

Automated RAG

LLMs are only as reliable as the data they can access. When faced with incomplete information, they often produce confident yet incorrect responses known as “hallucinations.” These hallucinations occur when LLMs lack the necessary domain knowledge. To address this, developers use the Retrieval-augmented Generation (RAG) approach, which enriches models with relevant data pulled from vector databases.

While standard RAG workflows are resource-heavy, as they require teams to generate embeddings and manually curate them in vector databases, Kong’s AI RAG Injector plugin automates this entire process. Instead of embedding RAG logic into every application individually, platform teams can inject vetted data into prompts directly at the gateway layer without any manual interventions.

👁 Image

AI RAG Injector

Create RAG pipelines by automatically injecting content from a vector database

See plugin →

Load balancing

AI Gateway’s load balancer routes requests across AI models to optimize for speed, cost, and reliability. It supports algorithms like consistent hashing, lowest-latency, usage-based, round-robin, and semantic matching, with built-in retries and fallback for resilience v3.10+.

The balancer dynamically selects models based on real-time performance and prompt relevance, and works across mixed environments including OpenAI, Mistral, and Llama models.

Load balancing

Learn about the load balancing algorithms available for AI Gateway.

Retry and fallback

Learn about how AI Gateway load balancers handle retry and fallback.

LLM cost control

The AI Gateway helps reduce LLM usage costs by giving you control over how prompts are built and routed. You can compress and structure prompts efficiently using the AI Compressor, RAG Injector, and AI Prompt Decorator plugins. For further savings, you can use AI Proxy Advanced to route requests across OpenAI models based on semantic similarity.

👁 Image

AI Prompt Compressor

Compress prompts before they are sent to LLMs to reduce costs, and improve latency

See plugin →

Meter, bill, and monetize the entire AI connectivity data path

Track LLM token usage across models and prompt types for accurate billing and cost control. Create pricing plans based on input, output, and system token consumption, then automate invoicing with Stripe or ERP integrations.

Save LLM usage costs with semantic load balancing

Use semantic load balancing to optimize LLM usage and reduce costs by intelligently routing chat requests across multiple OpenAI models based on semantic similarity.

Observability and metrics

AI Gateway provides multiple approaches to monitor LLM traffic and operations. Track token usage, latency, and costs through audit logs and metrics exporters. Instrument request flows with OpenTelemetry to trace prompts and responses across your infrastructure. Use Konnect Advanced Analytics for pre-built dashboards, or integrate with your existing observability stack.

Audit log

Learn about AI Gateway logging capabilities.

Konnect Observability

Visualize LLM metrics in Konnect.

LLM metrics

Expose and visualize LLM metrics.

Gen AI OTLP span attributes

Per-request OpenTelemetry span attributes for AI traffic.

Gen AI OTLP metrics

Aggregated OpenTelemetry metrics for AI, MCP, and A2A traffic.

How-to Guides

Frequently Asked Questions

Is AI Gateway available for all deployment modes?

Yes, AI plugins are supported in all deployment modes, including Konnect, self-hosted traditional, hybrid, and DB-less, and on Kubernetes via the Kong Ingress Controller.

Why should I use AI Gateway instead of adding the LLM’s API behind Kong Gateway?

If you just add an LLM’s API behind Kong Gateway, you can only interact at the API level with internal traffic. With AI plugins, Kong Gateway can understand the prompts that are being sent through the gateway. The plugins can inspect the body and provide more specific AI capabilities to your traffic.

URL: https://developer.konghq.com/ai-gateway/

AI Gateway

Get started

Video tutorials

AI plugins

Cookbooks

OpenAI

Anthropic

Azure AI

More...

Universal API

Rate limiting

Semantic caching

Semantic routing

MCP traffic gateway

A2A traffic gateway

Automated RAG injection

Data governance

Guardrails

Prompt engineering

Load balancing

Audit log

LLM metrics

Konnect Observability

Metering & Billing

Streaming

Secrets management

LLM cost control

Request transformations

Canary release

Proxy AI CLI tools through AI Gateway

AI Proxy

AI Proxy Advanced

AI Prompt Guard

AI Semantic Prompt Guard

AI PII Sanitizer

AI Prompt Template

AI Prompt Decorator

AI Azure Content Safety

AI AWS Guardrails

AI GCP Model Armor

AI Semantic Prompt Guard

AI Semantic Response Guard

AI Lakera Guard

AI Custom Guardrail

Amazon Bedrock guardrails

AI Request Transformer

AI Response Transformer

AI RAG Injector

Load balancing

Retry and fallback

AI Prompt Compressor

Meter, bill, and monetize the entire AI connectivity data path

Save LLM usage costs with semantic load balancing

Audit log

Konnect Observability

LLM metrics

Gen AI OTLP span attributes

Gen AI OTLP metrics

Help us make these docs great!

Still need help?