Rate limit how many HTTP requests can be made in a given time frame using multiple rate limits and window sizes, and applying sliding windows. This plugin is a more advanced version of the Rate Limiting plugin, which only allows one fixed rate limiting window.
If the underlying Gateway Service or Route has no authentication layer, the client IP address is used for identifying clients. Otherwise, the Consumer is used if an authentication plugin has been configured.
Advanced features of this plugin include:
- Sliding window support, which provides better performance than fixed rate limiting
- Multiple limits and window sizes
- Support for Redis Sentinel, Redis cluster, and Redis SSL
- Control over which requests contribute to incrementing the rate limiting counters via the config.disable_penalty parameter
Kong also provides multiple specialized rate limiting plugins, including rate limiting across LLMs and GraphQL queries. See Rate limiting in Kong Gateway to choose the plugin that is most useful in your use case.
Window types
The Rate Limiting Advanced plugin supports the following window types:
- Fixed window: Fixed windows consist of buckets that are statically assigned to a definitive time range. Each request is mapped to only one fixed window based on its timestamp and will affect only that window’s counters.
- Sliding window (default): A sliding window tracks the number of hits assigned to a specific key (such as an IP address, consumer, credential) within a given time window, taking into account previous hit rates to create a dynamically calculated rate. The default (and recommended) sliding window type ensures a resource is not consumed at a higher rate than what is configured.
Learn more about how the different window types work for rate limiting plugins.
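To make the difference between the two window types concrete, the following Python sketch contrasts a fixed-window count with a weighted sliding-window estimate. It assumes a common approximation in which the previous window's count is weighted by the fraction of that window that still overlaps the sliding window. The function names, the sample numbers, and the exact formula are illustrative assumptions, not the plugin's internal implementation.

```python
def fixed_window_count(current_hits: int) -> int:
    """Fixed window: only the hits in the current bucket count."""
    return current_hits


def sliding_window_estimate(current_hits: int, previous_hits: int,
                            elapsed: float, window_size: float) -> float:
    """Weighted estimate: current hits plus the share of the previous
    window that still overlaps the sliding window (assumed formula)."""
    if not 0 <= elapsed <= window_size:
        raise ValueError("elapsed must be within the current window")
    overlap = (window_size - elapsed) / window_size
    return current_hits + previous_hits * overlap


# Hypothetical numbers: limit of 10 per 60 s, 15 s into the current window.
limit = 10
estimate = sliding_window_estimate(current_hits=4, previous_hits=8,
                                   elapsed=15, window_size=60)
print(fixed_window_count(4) >= limit)  # fixed window: limit not reached yet
print(estimate >= limit)               # sliding estimate: limit reached
```

With these sample numbers, the fixed window sees only the 4 hits in the current bucket, while the sliding estimate also carries over three quarters of the previous window's 8 hits.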
Multiple limits and window sizes
An arbitrary number of limits or window sizes can be applied per plugin instance. This allows you to create multiple rate limiting windows (for example, rate limit per minute and per hour, and per any arbitrary window size). Because of limitations with Kong Gateway’s plugin configuration interface, each nth limit will apply to each nth window size. For example:
_format_version: "3.0"
plugins:
  - name: rate-limiting-advanced
    config:
      limit:
      - 10
      - 100
      window_size:
      - 60
      - 3600

curl -i -X POST http://localhost:8001/plugins/ \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data '
{
"name": "rate-limiting-advanced",
"config": {
"limit": [
10,
100
],
"window_size": [
60,
3600
]
}
}
'

Make the following request:
curl -X POST https://{region}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/plugins/ \
--header "accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $KONNECT_TOKEN" \
--data '
{
"name": "rate-limiting-advanced",
"config": {
"limit": [
10,
100
],
"window_size": [
60,
3600
]
}
}
'

echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name:
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
config:
  limit:
  - 10
  - 100
  window_size:
  - 60
  - 3600
plugin: rate-limiting-advanced
" | kubectl apply -f -

resource "konnect_gateway_plugin_rate_limiting_advanced" "my_rate_limiting_advanced" {
  enabled = true
  config = {
    limit = [10, 100]
    window_size = [60, 3600]
  }
  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
}

This example applies two rate limiting policies: one trips when 10 hits have been counted in 60 seconds, and the other when 100 hits have been counted in 3600 seconds.
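The positional pairing of limits and window sizes can be sketched in Python as follows. This is a minimal illustration, not the plugin's code: the function names and the sample counters are hypothetical, and a policy is treated as tripped once the counted hits reach its limit.

```python
def pair_limits(limits: list[int], window_sizes: list[int]) -> list[tuple[int, int]]:
    """Pair the nth limit with the nth window size (in seconds)."""
    if len(limits) != len(window_sizes):
        # Same wording as the error the plugin reports for mismatched counts.
        raise ValueError("You must provide the same number of windows and limits")
    return list(zip(limits, window_sizes))


def tripped_policies(policies: list[tuple[int, int]],
                     hits_per_window: dict[int, int]) -> list[tuple[int, int]]:
    """Return the (limit, window) pairs whose counted hits reached the limit."""
    return [(limit, window) for limit, window in policies
            if hits_per_window.get(window, 0) >= limit]


policies = pair_limits([10, 100], [60, 3600])
# Hypothetical counters: 10 hits in the last 60 s, 42 in the last 3600 s.
tripped = tripped_policies(policies, {60: 10, 3600: 42})
print(policies)
print(tripped)
```

Here only the 60-second policy trips, because the hourly counter is still well below its limit of 100.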
The number of configured window sizes and limits must be equal. Otherwise, you get the following error:
You must provide the same number of windows and limits

Namespace
The namespace field is auto-generated for the plugin instance. It’s optional when configuring the plugin through API commands or decK.
If you are managing Kong Gateway with decK or running Kong Gateway in DB-less mode, set the namespace explicitly in your declarative configuration. Otherwise the field will be regenerated automatically with every update.
Strategies
The Rate Limiting Advanced plugin supports three rate limiting strategies: local, cluster, and redis.
This is controlled by the config.strategy parameter.
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| local | Counters are stored in-memory on the node. | Minimal performance impact. | Less accurate. Unless there’s a consistent-hashing load balancer in front of Kong Gateway, it diverges when scaling the number of nodes. |
| cluster | Counters are stored in the Kong Gateway data store and shared across nodes. | Accurate [1], no extra components to support. | Each request forces a read and a write on the data store. Therefore, relatively, the biggest performance impact. Not supported in hybrid mode or Konnect deployments. |
| redis | Counters are stored on a Redis server and shared across nodes. | Accurate [1], less performance impact than a cluster policy. | Needs a Redis installation. Bigger performance impact than a local policy. |

[1]: Only when the config.sync_rate option is set to 0 (synchronous behavior).
Two common use cases for rate limiting are:
- Every transaction counts: The highest level of accuracy is needed. An example is a transaction with financial consequences.
- Backend protection: Accuracy is not as relevant. The requirement is only to protect backend services from overloading that’s caused either by specific users or by attacks.
Every transaction counts
In this scenario, because accuracy is important, the local policy is not an option.
Consider the support effort you might need for Redis, and then choose either cluster or redis.
You could start with the cluster policy, and move to redis if performance degrades drastically.
If using a very high sync frequency, use redis. Very high sync frequencies with cluster mode are not scalable and not recommended.
The sync frequency becomes higher when the sync_rate setting is a lower number - for example, a sync_rate of 0.1 is a much higher sync frequency (10 counter syncs per second) than a sync_rate of 1 (1 counter sync per second).
You can calculate what is considered a very high sync rate in your environment based on your topology, number of plugins, their sync rates, and tolerance for loose rate limits.
Together, the interaction between sync rate and window size affects how accurately the plugin can determine cluster-wide traffic. For example, the following table represents the worst-case scenario where a full sync interval’s worth of data hasn’t yet propagated across nodes:
| Property | Formula or config location | Value |
|---|---|---|
| Window size in seconds | Value set in config.window_size | 5 |
| Limit (in window) | Value set in config.limit | 1000 |
| Sync rate (interval) | Value set in config.sync_rate | 0.5 |
| Number of nodes (>1) | – | 10 |
| Estimated load balanced requests-per-second (RPS) to a node | Limit / Window size / Number of nodes | 1000 / 5 / 10 = 20 |
| Max potential lag in cluster count for a given node/s | Estimated load balanced RPS * Sync rate | 20 * 0.5 = 10 |
| Cluster-wide max potential overage/s | Max potential lag * Number of nodes | 10 * 10 = 100 |
| Cluster-wide max potential overage/s as a percentage | Cluster-wide max potential overage / Limit | 100 / 1000 = 10% |
| Effective worst case cluster-wide requests allowed at window size | Limit + Cluster-wide max potential overage | 1000 + 100 = 1100 |
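The arithmetic in the table can be reproduced with a short Python helper, which makes it easier to plug in your own topology. The function and key names are hypothetical. The formulas are the ones listed in the table, with the effective worst case computed as the limit plus the overage.

```python
def worst_case_overage(limit: int, window_size: float,
                       sync_rate: float, nodes: int) -> dict[str, float]:
    """Worst-case estimate when a full sync interval has not propagated."""
    rps_per_node = limit / window_size / nodes   # load balanced RPS per node
    lag_per_node = rps_per_node * sync_rate      # hits a node has not synced yet
    overage = lag_per_node * nodes               # cluster-wide overage
    return {
        "rps_per_node": rps_per_node,
        "lag_per_node": lag_per_node,
        "overage": overage,
        "overage_pct": overage / limit * 100,
        "effective_limit": limit + overage,
    }


# Values from the table: 1000 requests per 5 s, sync_rate 0.5, 10 nodes.
result = worst_case_overage(limit=1000, window_size=5, sync_rate=0.5, nodes=10)
for name, value in result.items():
    print(f"{name}: {value}")
```

Lowering sync_rate or the number of nodes in the call shows how quickly the potential overage shrinks.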
If you choose to switch strategies, note that you can’t port the existing usage metrics from the Kong Gateway data store to Redis. This might not be a problem with short-lived metrics (for example, seconds or minutes) but if you use metrics with a longer time frame (for example, months), plan your switch carefully.
Backend protection
If accuracy is less important, choose the local policy.
You might need to experiment a little before you get a setting that works for your scenario.
As the cluster scales to more nodes, more user requests are handled.
When the cluster scales down, the probability of false negatives increases.
Make sure to adjust your rate limits when scaling.
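For example, if a user can make 100 requests every second, and you have an equally balanced 5-node Kong Gateway cluster, each node receives about 20 of those requests per second, so you can set the local limit to 30 requests every second to leave some room for uneven balancing.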
If you see too many false negatives, increase the limit.
To minimize inaccuracies, consider using a consistent-hashing load balancer in front of Kong Gateway. The load balancer ensures that a user is always directed to the same Kong Gateway node, which reduces inaccuracies and prevents scaling problems.
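As a starting point for the experimentation described above, a per-node limit can be derived from the cluster-wide target. The sketch below is only an assumption-laden helper: the function name and the headroom factor of 1.5 are hypothetical, chosen so that 100 requests per second across 5 nodes yields the local limit of 30 used in the example.

```python
import math


def local_limit_per_node(total_limit: int, nodes: int, headroom: float = 1.5) -> int:
    """Split a cluster-wide limit across nodes, padded by a hypothetical
    headroom factor to absorb uneven load balancing."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return math.ceil(total_limit / nodes * headroom)


print(local_limit_per_node(100, 5))  # 5-node cluster
print(local_limit_per_node(100, 4))  # after scaling down to 4 nodes
```

Recomputing the value whenever the node count changes is one way to follow the advice to adjust limits when scaling.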
Using cloud authentication with Redis v3.13+
If your plugin uses a Redis datastore, you can authenticate to it with a cloud Redis provider. This allows you to seamlessly rotate credentials without relying on static passwords.
The following providers are supported:
- AWS ElastiCache
- Azure Managed Redis
- Google Cloud Memorystore (with or without Valkey)
Each provider supports both an instance configuration and a cluster configuration.
For an AWS ElastiCache instance, you need:
- A running Redis instance on an AWS ElastiCache instance for Valkey 7.2 or later, or ElastiCache for Redis OSS version 7.0 or later
- An ElastiCache user with “Authentication mode” set to “IAM”
- The following policy assigned to the IAM user or IAM role that is used to connect to ElastiCache:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticache:Connect"
      ],
      "Resource": [
        "arn:aws:elasticache:ARN_OF_THE_ELASTICACHE",
        "arn:aws:elasticache:ARN_OF_THE_ELASTICACHE_USER"
      ]
    }
  ]
}
To configure cloud authentication with Redis, add the following parameters to your plugin configuration:
config:
  strategy: redis
  redis:
    host: $INSTANCE_ADDRESS
    username: $INSTANCE_USERNAME
    port: 6379
    cloud_authentication:
      auth_provider: aws
      aws_cache_name: $AWS_CACHE_NAME
      aws_is_serverless: false
      aws_region: $AWS_REGION
      aws_access_key_id: $AWS_ACCESS_KEY_ID
      aws_secret_access_key: $AWS_ACCESS_SECRET_KEY

Replace the following with your actual values:
- $INSTANCE_ADDRESS: The ElastiCache instance address.
- $INSTANCE_USERNAME: The ElastiCache username with IAM Auth mode configured.
- $AWS_CACHE_NAME: Name of your AWS ElastiCache instance.
- $AWS_REGION: Your AWS ElastiCache instance region.
- $AWS_ACCESS_KEY_ID: (Optional) Your AWS access key ID.
- $AWS_ACCESS_SECRET_KEY: (Optional) Your AWS secret access key.
For an AWS ElastiCache cluster, you need:
- A running Redis instance on an AWS ElastiCache cluster for Valkey 7.2 or later, or ElastiCache for Redis OSS version 7.0 or later
- An ElastiCache user with “Authentication mode” set to “IAM”
- The following policy assigned to the IAM user or IAM role that is used to connect to ElastiCache:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticache:Connect"
      ],
      "Resource": [
        "arn:aws:elasticache:ARN_OF_THE_ELASTICACHE",
        "arn:aws:elasticache:ARN_OF_THE_ELASTICACHE_USER"
      ]
    }
  ]
}
To configure cloud authentication with Redis, add the following parameters to your plugin configuration:
config:
  strategy: redis
  redis:
    cluster_nodes:
    - ip: $CLUSTER_ADDRESS
      port: 6379
    username: $CLUSTER_USERNAME
    port: 6379
    cloud_authentication:
      auth_provider: aws
      aws_cache_name: $AWS_CACHE_NAME
      aws_is_serverless: false
      aws_region: $AWS_REGION
      aws_access_key_id: $AWS_ACCESS_KEY_ID
      aws_secret_access_key: $AWS_ACCESS_SECRET_KEY

Replace the following with your actual values:
- $CLUSTER_ADDRESS: The ElastiCache cluster address.
- $CLUSTER_USERNAME: The ElastiCache username with IAM Auth mode configured.
- $AWS_CACHE_NAME: Name of your AWS ElastiCache cluster.
- $AWS_REGION: Your AWS ElastiCache cluster region.
- $AWS_ACCESS_KEY_ID: (Optional) Your AWS access key ID.
- $AWS_ACCESS_SECRET_KEY: (Optional) Your AWS secret access key.
For an Azure Managed Redis instance, you need:
- A running Redis instance on an Azure Managed Redis instance with Entra authentication configured
- The user, service principal, or identity added to the “Microsoft Entra Authentication Redis user” list for the Azure Managed Redis instance
To configure cloud authentication with Redis, add the following parameters to your plugin configuration:
config:
  strategy: redis
  redis:
    host: $INSTANCE_ADDRESS
    username: $INSTANCE_USERNAME
    port: 10000
    cloud_authentication:
      auth_provider: azure
      azure_client_id: $AZURE_CLIENT_ID
      azure_client_secret: $AZURE_CLIENT_SECRET
      azure_tenant_id: $AZURE_TENANT_ID

Replace the following with your actual values:
- $INSTANCE_ADDRESS: The Azure Managed Redis instance address.
- $INSTANCE_USERNAME: The object (principal) ID of the Principal/Identity with essential access.
- $AZURE_CLIENT_ID: The client ID of the Principal/Identity.
- $AZURE_CLIENT_SECRET: (Optional) The client secret of the Principal/Identity.
- $AZURE_TENANT_ID: (Optional) The tenant ID of the Principal/Identity.
For an Azure Managed Redis cluster, you need:
- A running Redis instance on an Azure Managed Redis cluster with Entra authentication configured
- The user, service principal, or identity added to the “Microsoft Entra Authentication Redis user” list for the Azure Managed Redis instance
To configure cloud authentication with Redis, add the following parameters to your plugin configuration:
config:
  strategy: redis
  redis:
    cluster_nodes:
    - ip: $CLUSTER_ADDRESS
      port: 10000
    username: $CLUSTER_USERNAME
    port: 10000
    cloud_authentication:
      auth_provider: azure
      azure_client_id: $AZURE_CLIENT_ID
      azure_client_secret: $AZURE_CLIENT_SECRET
      azure_tenant_id: $AZURE_TENANT_ID

Replace the following with your actual values:
- $CLUSTER_ADDRESS: The Azure Managed Redis cluster address.
- $CLUSTER_USERNAME: The object (principal) ID of the Principal/Identity with essential access.
- $AZURE_CLIENT_ID: The client ID of the Principal/Identity.
- $AZURE_CLIENT_SECRET: (Optional) The client secret of the Principal/Identity.
- $AZURE_TENANT_ID: (Optional) The tenant ID of the Principal/Identity.
For a Google Cloud Memorystore instance, you need:
- A running Redis instance on a Google Cloud Memorystore instance
- The principal assigned to the corresponding role:
  - Cloud Memorystore Redis DB Connection User (roles/redis.dbConnectionUser) for Memorystore for Redis Cluster
  - Memorystore DB Connector User (roles/memorystore.dbConnectionUser) for Memorystore for Valkey
To configure cloud authentication with Redis, add the following parameters to your plugin configuration:
config:
  strategy: redis
  redis:
    host: $INSTANCE_ADDRESS
    port: 6379
    cloud_authentication:
      auth_provider: gcp
      gcp_service_account_json: $GCP_SERVICE_ACCOUNT

Replace the following with your actual values:
- $INSTANCE_ADDRESS: The Memorystore instance address.
- $GCP_SERVICE_ACCOUNT: (Optional) The GCP service account JSON.
For a Google Cloud Memorystore cluster, you need:
- A running Redis instance on a Google Cloud Memorystore cluster
- The principal assigned to the corresponding role:
  - Cloud Memorystore Redis DB Connection User (roles/redis.dbConnectionUser) for Memorystore for Redis Cluster
  - Memorystore DB Connector User (roles/memorystore.dbConnectionUser) for Memorystore for Valkey
To configure cloud authentication with Redis, add the following parameters to your plugin configuration:
config:
  strategy: redis
  redis:
    cluster_nodes:
    - ip: $CLUSTER_ADDRESS
      port: 6379
    port: 6379
    cloud_authentication:
      auth_provider: gcp
      gcp_service_account_json: $GCP_SERVICE_ACCOUNT

Replace the following with your actual values:
- $CLUSTER_ADDRESS: The Memorystore cluster address.
- $GCP_SERVICE_ACCOUNT: The GCP service account JSON.
Fallback from Redis
When the redis strategy is used and a Kong Gateway node is disconnected from Redis, the plugin will fall back to local rate limiting.
This can happen when the Redis server is down or the connection to Redis is broken.
Kong Gateway keeps the local counters for rate limiting and syncs with Redis once the connection is re-established.
Kong Gateway will still rate limit, but the Kong Gateway nodes can’t sync the counters. As a result, users will be able to perform more requests than the limit, but there will still be a limit per node.
Limit by IP address
If limiting by IP address, it’s important to understand how Kong Gateway determines the IP address of an incoming request.
The IP address is extracted from the request headers sent to Kong Gateway by downstream clients. Typically, these headers are named X-Real-IP or X-Forwarded-For.
By default, Kong Gateway uses the header name X-Real-IP to identify the client’s IP address. If your environment requires a different header, you can specify this by setting the real_ip_header Nginx property. Depending on your network setup, you may also need to configure the trusted_ips Nginx property to include the load balancer IP address. This ensures that Kong Gateway correctly interprets the client’s IP address, even when the request passes through multiple network layers.
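The effect of trusting a load balancer can be illustrated with a small Python sketch. It assumes the header in use is X-Forwarded-For and that the trusted ranges play the role of trusted_ips. The function name and the sample addresses are hypothetical, and the right-to-left walk is a common convention for proxy chains, not a copy of how Kong Gateway or Nginx resolves the address.

```python
import ipaddress


def resolve_client_ip(remote_addr: str, forwarded_for: str, trusted: list[str]) -> str:
    """Return the first address, walking X-Forwarded-For right to left,
    that does not belong to a trusted proxy range."""
    networks = [ipaddress.ip_network(cidr) for cidr in trusted]

    def is_trusted(addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in networks)

    # Only honor the header when the direct peer is a trusted proxy.
    if not is_trusted(remote_addr):
        return remote_addr
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0] if hops else remote_addr


# The load balancer (10.0.0.5) is trusted, so the original client is used.
print(resolve_client_ip("10.0.0.5", "203.0.113.7, 10.0.0.9", ["10.0.0.0/8"]))
# The direct peer is not trusted, so the header is ignored.
print(resolve_client_ip("198.51.100.2", "203.0.113.7", ["10.0.0.0/8"]))
```

Without the trusted range, every request in the first case would be attributed to the load balancer's address and share one rate limiting counter.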
Headers sent to the client
When this plugin is enabled, Kong Gateway sends some additional headers back to the client, indicating the state of the rate limiting policies in place:
| Header | Description |
|---|---|
| RateLimit-Limit | Allowed limit in the timeframe. |
| RateLimit-Remaining | Number of available requests remaining. |
| RateLimit-Reset | The time remaining, in seconds, until the rate limit quota is reset. |
| X-RateLimit-Limit-Second | The number of requests allowed per second. |
| X-RateLimit-Limit-Minute | The number of requests allowed per minute. |
| X-RateLimit-Limit-Day | The number of requests allowed per day. |
| X-RateLimit-Limit-Month | The number of requests allowed per month. |
| X-RateLimit-Limit-Year | The number of requests allowed per year. |
| X-RateLimit-Remaining-Second | The number of requests remaining in the current second. |
| X-RateLimit-Remaining-Minute | The number of requests remaining in the current minute. |
| X-RateLimit-Remaining-Day | The number of requests remaining in the current day. |
| X-RateLimit-Remaining-Month | The number of requests remaining in the current month. |
| X-RateLimit-Remaining-Year | The number of requests remaining in the current year. |
| Retry-After | This header appears on 429 errors, indicating how long the upstream service is expected to be unavailable to the client. When using window_type: sliding and RateLimit-Reset, Retry-After may increase due to the rate calculation for the sliding window. |
You can optionally hide the limit and remaining headers with the config.hide_client_headers option.
If more than one limit is set, the plugin returns multiple time limit headers. For example:
X-RateLimit-Limit-Second: 5
X-RateLimit-Remaining-Second: 4
X-RateLimit-Limit-Minute: 10
X-RateLimit-Remaining-Minute: 9

If any of the limits are reached, the plugin returns an HTTP/1.1 429 status code to the client with the following JSON body:
{ "message": "API rate limit exceeded" }

The headers RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset are based on the Internet-Draft RateLimit Header Fields for HTTP and may change in the future to respect specification updates.
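On the client side, these headers can drive a simple backoff decision. The Python sketch below is one possible approach, not part of the plugin: the function name is hypothetical, and the one-second fallback used when no header is present (for example, when headers are hidden) is an arbitrary assumption.

```python
def seconds_to_wait(status: int, headers: dict[str, str]) -> float:
    """Return how long a client should pause before its next request."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    if status == 429:
        # Prefer Retry-After, then RateLimit-Reset, then an assumed 1 s default.
        return float(h.get("retry-after", h.get("ratelimit-reset", 1)))
    if h.get("ratelimit-remaining") == "0":
        # Quota exhausted: wait for the reset before sending more requests.
        return float(h.get("ratelimit-reset", 0))
    return 0.0


print(seconds_to_wait(429, {"Retry-After": "7"}))
print(seconds_to_wait(200, {"RateLimit-Remaining": "0", "RateLimit-Reset": "12"}))
print(seconds_to_wait(200, {"RateLimit-Remaining": "3"}))
```

A client would call this after each response and sleep for the returned number of seconds before retrying or continuing.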
Rate limiting for Consumer Groups
You can use the Consumer Groups entity to manage custom rate limiting configurations for subsets of Consumers.
You can see an example of this in the guide on enforcing rate limiting tiers with the Rate Limiting Advanced plugin.
Throttle rate limits v3.12+
In Kong Gateway 3.12 or later, you can enable request throttling using the Rate Limiting Advanced plugin to improve clients’ experience and protect upstream origin servers from being overwhelmed by traffic spikes. With throttling, requests that exceed the rate limit threshold can be delayed and retried, rather than immediately rejected with a 429 status code.
We recommend setting disable_penalty to true when using throttle rate limits with the sliding window type. With a sliding window, if disable_penalty is set to false, all requests, including denied ones, still count toward the rate limit. This can lead to a situation where every subsequent window immediately reaches the limit, causing all requests to be denied. In this case, the throttling mechanism doesn’t take effect, because there are no accepted requests left to throttle.
Throttled rate limits work as follows:
- When a request hits the rate limit, it’s placed into a “waiting room” or queue. The client’s connection is held during this delay.
- This queue uses local, Redis, or cluster strategies to manage the queue of throttled requests using a counter-based approach.
- Requests in the queue are automatically retried after a configurable interval (config.throttling.interval).
  - There’s a limit to retries for individual requests (config.throttling.retry_times), and a cap to the total number of requests waiting (config.throttling.queue_limit).
  - All concurrent requests will retry at approximately the same time once the specified interval has elapsed.
- If a request exceeds its maximum retries or if the waiting room is full, it will ultimately be rejected with a 429 response.
For an example plugin configuration, see Throttle requests.
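The flow described above can be modeled for a single request in a few lines of Python. This is a simplified sketch under stated assumptions, not the plugin's implementation: is_allowed is a hypothetical callable standing in for the rate limit counter check, the keyword arguments mirror the config.throttling settings, and 200 stands in for a successfully proxied request.

```python
import time
from typing import Callable


def throttle(is_allowed: Callable[[], bool], queued: int, *, interval: float,
             retry_times: int, queue_limit: int,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Return the status a request ends with under throttling."""
    if is_allowed():
        return 200              # under the limit: proxied immediately
    if queued >= queue_limit:
        return 429              # waiting room is full
    for _ in range(retry_times):
        sleep(interval)         # connection is held open while waiting
        if is_allowed():
            return 200          # a retry succeeded
    return 429                  # retries exhausted


# Simulated counter: denied twice, then allowed. Waits are recorded, not slept.
answers = iter([False, False, True])
waits: list[float] = []
status = throttle(lambda: next(answers), queued=0, interval=1.0,
                  retry_times=3, queue_limit=5, sleep=waits.append)
print(status, waits)
```

Passing a recording function as sleep keeps the example instant while still showing how many intervals the request spent waiting.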
FAQs
What are the potential impacts and risks associated with enabling request throttling in Rate Limiting Advanced?
Enabling request throttling can lead to a degradation in the capacity of Kong Gateway data plane nodes. This is because client requests are held open for a longer duration during the throttling period compared to normal rejections. This extended occupation of resources (like memory and file descriptors) can reduce the data plane’s ability to handle other new requests, potentially leading to scale or stress issues during high traffic spikes. Configuring a large config.throttling.queue_limit can also consume significant memory on data plane nodes.
What happens to queued requests if a client drops its connection with Kong Gateway during the Rate Limiting Advanced throttling period?
If a client drops its connection with Kong while a request is being throttled (v3.12+), Kong Gateway automatically releases all associated resources for that specific request. This means the individual request will no longer be processed or retried. However, the counter that accounted for this request’s slot in the “waiting room” is automatically managed by the underlying counter mechanism (shared dictionary or Redis). These counters are typically recorded within specific time windows and are automatically evicted when their window expires, ensuring resource cleanup without manual intervention for each dropped connection.
How is memory usage impacted when I enable throttling with the Rate Limiting Advanced plugin?
In regular conditions, memory usage is minimally impacted. In extreme conditions where both NGINX’s header buffer and the kernel’s TCP buffer are fully used and you’re using the default configuration (NGINX accepts a maximum request header size of 32 KB, and the Linux kernel TCP buffer is approximately 200 KB), the average memory consumption of each open connection is around 220 KB for one Route with one Rate Limiting Advanced plugin configured with the following:
- config.limit: 30 seconds
- config.throttling.interval: 3,600 seconds
- config.throttling.retry_times: 3
- config.throttling.queue_limit: 100000
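For a back-of-the-envelope upper bound, the per-connection figure can be multiplied by the number of connections held open. The Python sketch below assumes that every held connection consumes the full 220 KB quoted for extreme conditions, which is a pessimistic simplification. The function name is hypothetical.

```python
def throttling_memory_gib(open_connections: int, kb_per_connection: float = 220) -> float:
    """Rough upper bound on memory held by throttled connections, in GiB."""
    return open_connections * kb_per_connection / (1024 * 1024)


# A full waiting room of 100,000 connections versus a smaller one of 10,000.
print(round(throttling_memory_gib(100_000), 1))
print(round(throttling_memory_gib(10_000), 1))
```

Under these assumptions, a completely full queue of 100,000 requests would hold roughly 21 GiB, which is why a large queue_limit deserves careful sizing.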
You can test your own throttling memory usage under extreme conditions by using a script like the following:
# Prepare four large header values to fill the request header buffers
H1=$(head -c 8092 < /dev/zero | tr '\0' 'A')
H2=$(head -c 8092 < /dev/zero | tr '\0' 'B')
H3=$(head -c 8092 < /dev/zero | tr '\0' 'C')
H4=$(head -c 8092 < /dev/zero | tr '\0' 'D')
# Create a 1 MB request body
head -c 1000000 /dev/zero > /tmp/1mb
# Open 10,000 connections in the background (hostname:7000 is a placeholder)
for i in {1..10000}; do
  curl -s http://hostname:7000/ \
    -H "X-Header-1: $H1" \
    -H "X-Header-2: $H2" \
    -H "X-Header-3: $H3" \
    -H "X-Header-4: $H4" \
    --data-binary @/tmp/1mb \
    -o /dev/null &
  echo "creating $i"
done
# Wait for all background requests to finish
wait