
If you run a public API without rate limiting, it's only a matter of time before a runaway client, a misconfigured retry loop, or a well-intentioned load test brings your service to its knees. .NET 7 shipped a first-class rate-limiting API — no third-party middleware required. This post walks through every knob you can turn.

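Prerequisite: the limiter primitives live in System.Threading.RateLimiting and the ASP.NET Core middleware in Microsoft.AspNetCore.RateLimiting. Both are part of the ASP.NET Core shared framework from .NET 7 onwards; outside a web project, reference the System.Threading.RateLimiting NuGet package.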

Why rate limiting matters

Rate limiting protects three things simultaneously: your infrastructure from overload, your downstream dependencies from fan-out abuse, and your legitimate users from a noisy neighbour hogging capacity. It also plugs a class of denial-of-service vectors that auth alone can't stop.

The four built-in algorithms

1. Fixed window

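Permits N requests per fixed time window (e.g. 100 requests per minute), with the counter resetting each time the window elapses. Simple and cheap on memory, but a client can send up to 2× the limit in a short span by exhausting one window just before it resets and the next one just after.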

using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0 // reject immediately when full
    });
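To see what "reject immediately when full" means in practice, here is a minimal console sketch that drives a limiter directly, outside ASP.NET Core. The limit of 3 and the five attempts are made-up values for illustration, and a non-web project is assumed to reference the System.Threading.RateLimiting package.

```csharp
using System;
using System.Threading.RateLimiting;

// Deliberately small limit (illustrative) so the rejection is easy to see.
using var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit = 3,
        Window = TimeSpan.FromMinutes(1),
        QueueLimit = 0 // no queueing: callers over the limit are rejected
    });

int accepted = 0, rejected = 0;

for (int i = 0; i < 5; i++)
{
    // AttemptAcquire never waits; the lease reports whether a permit was granted.
    using RateLimitLease lease = limiter.AttemptAcquire(permitCount: 1);
    if (lease.IsAcquired) accepted++; else rejected++;
}

Console.WriteLine($"accepted={accepted}, rejected={rejected}");
```

The first three attempts fit in the window and the remaining two are turned away, which is the same decision the middleware makes per request.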

2. Sliding window

Divides the window into segments and tracks usage per segment, so the limit is enforced over a rolling period rather than a fixed one. Smoother than fixed window: it largely removes the boundary burst, at the cost of slightly more memory.

var limiter = new SlidingWindowRateLimiter(
    new SlidingWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        SegmentsPerWindow = 6, // 10-second granularity
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0
    });

3. Token bucket

A bucket fills with tokens at a steady rate up to a maximum. Each request consumes one token. Allows short bursts up to the bucket capacity while enforcing a long-run average. Ideal for APIs where short spikes are acceptable.

var limiter = new TokenBucketRateLimiter(
    new TokenBucketRateLimiterOptions
    {
        TokenLimit = 50, // max burst
        ReplenishmentPeriod = TimeSpan.FromSeconds(10),
        TokensPerPeriod = 10, // ~1/s average
        AutoReplenishment = true,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0
    });
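The relationship between burst size and long-run average can be worked out from the options themselves. The sketch below reuses the values above, derives the sustained rate and the time for an empty bucket to refill, then drains a fresh bucket to confirm the burst size. The variable names and the cap of 60 attempts are illustrative only.

```csharp
using System;
using System.Threading.RateLimiting;

var options = new TokenBucketRateLimiterOptions
{
    TokenLimit = 50,
    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
    TokensPerPeriod = 10,
    AutoReplenishment = true,
    QueueLimit = 0
};

// Long-run average: tokens added per second (10 tokens / 10 s).
double sustainedPerSecond =
    options.TokensPerPeriod / options.ReplenishmentPeriod.TotalSeconds;

// Time for a completely empty bucket to fill back up to TokenLimit.
double secondsToRefill = options.TokenLimit / sustainedPerSecond;

using var limiter = new TokenBucketRateLimiter(options);

// The bucket starts full, so a burst of up to TokenLimit is admitted at once.
int burst = 0;
for (int i = 0; i < 60; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
    if (lease.IsAcquired) burst++;
}

Console.WriteLine(
    $"sustained={sustainedPerSecond}/s, refill={secondsToRefill}s, burst={burst}");
```

With these numbers a client can spend 50 requests in one go, but then has to live on roughly one request per second until the bucket recovers.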

4. Concurrency limiter

Limits simultaneous in-flight requests rather than request rate. Useful for protecting expensive operations like report generation or ML inference where time-in-system matters more than throughput.

var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit = 20,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 5
    });
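Unlike the time-based limiters, a concurrency permit comes back only when its lease is disposed. A small sketch of that lifecycle, using a limit of 2 purely for illustration:

```csharp
using System;
using System.Threading.RateLimiting;

using var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions { PermitLimit = 2, QueueLimit = 0 });

// Two operations are in flight.
RateLimitLease first = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();

// A third caller arrives while both permits are held.
RateLimitLease third = limiter.AttemptAcquire();
bool thirdAcquired = third.IsAcquired;

// The first operation finishes; disposing its lease returns the permit.
first.Dispose();

RateLimitLease fourth = limiter.AttemptAcquire();
bool fourthAcquired = fourth.IsAcquired;

Console.WriteLine($"third={thirdAcquired}, fourth={fourthAcquired}");

second.Dispose();
third.Dispose();
fourth.Dispose();
```

In the middleware the lease is released for you when the request completes; when you use the limiter by hand, wrap the lease in a using statement so a thrown exception cannot leak a permit.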

Wiring it up in ASP.NET Core

Register policies in Program.cs, then apply them with the [EnableRateLimiting] attribute or inline via RequireRateLimiting().

using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter(policyName: "fixed", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueLimit = 0;
    });

    options.AddTokenBucketLimiter(policyName: "burst", opt =>
    {
        opt.TokenLimit = 50;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod = 10;
        opt.AutoReplenishment = true;
    });
});

var app = builder.Build();
// Register before mapping endpoints; if you call UseRouting explicitly, put this after it.
app.UseRateLimiter();

Apply to a minimal API endpoint or controller action:

// Minimal API
app.MapGet("/products", GetProducts)
    .RequireRateLimiting("fixed");

// Controller
[EnableRateLimiting("burst")]
[HttpGet("search")]
public IActionResult Search(string query) { ... }

Per-user and per-endpoint policies

A single global policy rarely fits real-world needs. Use AddPolicy with a partition key derived from the request context:

options.AddPolicy("per-user", httpContext =>
    RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: httpContext.User.Identity?.Name
            ?? httpContext.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous",
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 200,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1),
            TokensPerPeriod = 200,
            AutoReplenishment = true
        }));

Tip: prefer authenticated user ID over IP address as the partition key — NAT and proxies can share a single IP across hundreds of users, leading to false positives at scale.
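Partitioning can be tried without a web host. The sketch below uses PartitionedRateLimiter.Create with a plain string standing in for the request (a hypothetical substitute for HttpContext), so each user ID gets its own fixed-window counter. The limit of 2 and the user names are invented for the example.

```csharp
using System;
using System.Threading.RateLimiting;

// The "resource" here is just the caller's ID; in ASP.NET Core it would be HttpContext.
using PartitionedRateLimiter<string> limiter =
    PartitionedRateLimiter.Create<string, string>(userId =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: userId,
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 2,
                Window = TimeSpan.FromMinutes(1),
                QueueLimit = 0
            }));

// Hypothetical helper: true when the caller is still within their own quota.
bool Allow(string userId)
{
    using RateLimitLease lease = limiter.AttemptAcquire(userId);
    return lease.IsAcquired;
}

bool alice1 = Allow("alice");
bool alice2 = Allow("alice");
bool alice3 = Allow("alice"); // alice has used both of her permits
bool bob1 = Allow("bob");     // bob has his own partition and is unaffected

Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
```

This is also why a shared IP makes a poor key: everyone behind the same NAT would land in one partition and drain it together.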

Custom rejection responses

By default, the middleware rejects with 503 Service Unavailable. The status defined for rate limiting (RFC 6585) is 429 Too Many Requests, ideally accompanied by a Retry-After header:

options.OnRejected = async (context, token) =>
{
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    if (context.Lease.TryGetMetadata(
            MetadataName.RetryAfter, out var retryAfter))
    {
        // Round up so a sub-second delay is not reported as 0.
        context.HttpContext.Response.Headers.Append(
            "Retry-After",
            ((int)Math.Ceiling(retryAfter.TotalSeconds)).ToString(
                System.Globalization.CultureInfo.InvariantCulture));
    }

    await context.HttpContext.Response.WriteAsync(
        "Rate limit exceeded. Please slow down.", token);
};
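The Retry-After value comes from metadata attached to the rejected lease, and you can inspect it without the middleware. A minimal sketch, assuming a fixed-window limiter with a 30-second window chosen for illustration; it rounds up to whole seconds because truncating a sub-second delay would tell the client to retry after 0 seconds.

```csharp
using System;
using System.Globalization;
using System.Threading.RateLimiting;

using var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit = 1,
        Window = TimeSpan.FromSeconds(30),
        QueueLimit = 0
    });

using RateLimitLease granted = limiter.AttemptAcquire();  // uses the only permit
using RateLimitLease rejected = limiter.AttemptAcquire(); // over the limit

string? retryAfterHeader = null;

if (!rejected.IsAcquired &&
    rejected.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
{
    // Header value is whole seconds; round up and never send less than 1.
    int seconds = Math.Max(1, (int)Math.Ceiling(retryAfter.TotalSeconds));
    retryAfterHeader = seconds.ToString(CultureInfo.InvariantCulture);
}

Console.WriteLine($"Retry-After: {retryAfterHeader ?? "(not provided)"}");
```

Not every limiter necessarily attaches this metadata, which is why the handler checks TryGetMetadata instead of assuming the value is there.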

Distributed scenarios & Redis

The built-in limiters are in-process only — each pod maintains its own counters. In a horizontally scaled deployment, use a Redis-backed limiter via the RedisRateLimiting community library, which wraps the same RateLimiter abstraction:

dotnet add package RedisRateLimiting

// Requires: using StackExchange.Redis;
// AddStackExchangeRedisCache registers IDistributedCache only, so create and
// register the shared IConnectionMultiplexer explicitly.
var redis = ConnectionMultiplexer.Connect(
    builder.Configuration["Redis:Connection"]!);
builder.Services.AddSingleton<IConnectionMultiplexer>(redis);

options.AddPolicy("distributed", httpContext =>
    RedisRateLimitPartition.GetSlidingWindowRateLimiter(
        partitionKey: httpContext.User.Identity?.Name ?? "anon",
        factory: _ => new RedisSlidingWindowRateLimiterOptions
        {
            // Reuse the singleton; do not capture the per-request service provider.
            ConnectionMultiplexerFactory = () => redis,
            PermitLimit = 500,
            Window = TimeSpan.FromMinutes(1)
        }));

Client-side resilience with Polly

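If your code consumes a rate-limited API, combine Polly's rate limiter strategy with a retry strategy to handle 429s gracefully. AddResilienceHandler and HttpRetryStrategyOptions come from Microsoft.Extensions.Http.Resilience, which builds on Polly v8 (the older Polly.Extensions.Http package targets the previous Polly API and does not provide them):

dotnet add package Microsoft.Extensions.Http.Resilience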

services.AddHttpClient<IProductsClient, ProductsClient>()
    .AddResilienceHandler("products-pipeline", pipeline =>
    {
        // Strategies run in the order added: the limiter wraps the retry,
        // so retried attempts reuse the permit of the original call.
        pipeline.AddRateLimiter(new SlidingWindowRateLimiter(
            new SlidingWindowRateLimiterOptions
            {
                PermitLimit = 50,
                Window = TimeSpan.FromSeconds(10),
                SegmentsPerWindow = 5
            }));

        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(2),
            BackoffType = DelayBackoffType.Exponential,
            ShouldHandle = args => ValueTask.FromResult(
                args.Outcome.Result?.StatusCode ==
                    HttpStatusCode.TooManyRequests)
        });
    });
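To get a feel for what those retry settings cost a caller, the sketch below computes the nominal exponential schedule for a 2-second base delay and 3 attempts. It assumes plain doubling per attempt with jitter ignored and no Retry-After header from the server; real delays will differ when either applies.

```csharp
using System;
using System.Linq;

TimeSpan baseDelay = TimeSpan.FromSeconds(2);
int maxRetryAttempts = 3;

// Nominal schedule (assumption: baseDelay * 2^attempt, attempt starting at 0, no jitter).
TimeSpan[] delays = Enumerable.Range(0, maxRetryAttempts)
    .Select(attempt =>
        TimeSpan.FromSeconds(baseDelay.TotalSeconds * Math.Pow(2, attempt)))
    .ToArray();

// Total time spent waiting if every attempt is answered with 429.
TimeSpan worstCaseWait = TimeSpan.FromSeconds(delays.Sum(d => d.TotalSeconds));

Console.WriteLine(
    $"delays: {string.Join(", ", delays.Select(d => d.TotalSeconds + "s"))}; " +
    $"total: {worstCaseWait.TotalSeconds}s");
```

Under those assumptions a request that keeps hitting 429 waits 2, 4, and 8 seconds before giving up, so budget about 14 seconds of added latency in the worst case.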

Choosing the right algorithm

Algorithm	Best for	Watch out for	Memory cost
Fixed window	Simple quotas, billing tiers	Boundary burst (2× spike)	Very low
Sliding window	Smooth public APIs	Segment count × partitions	Low–medium
Token bucket	Burst-tolerant consumer APIs	Tuning burst vs average	Low
Concurrency	Expensive ops (ML, reports)	Doesn't bound throughput	Very low

Distributed gotcha: in-process limiters per pod means a cluster of 4 replicas effectively multiplies your limit by 4. Always use a Redis-backed partitioned limiter for multi-replica deployments where correctness matters.

Wrapping up

.NET 7+ gives you production-grade rate limiting with zero external dependencies for single-node scenarios. The four algorithms cover the full spectrum from simple quotas to burst-tolerant consumer clients. Add Redis for distributed enforcement, Polly for client-side resilience, and always return 429 with a Retry-After header — your API consumers will thank you.

Questions or patterns I missed? Drop them in the comments.

URL: https://dev.to/printo_tom/rate-limiting-in-c-dont-let-your-api-get-hammered-4hjj

Rate Limiting in C# — Don't Let Your API Get Hammered - DEV Community