Web caching strategies in 2026 span five distinct layers — the browser, the CDN, the reverse proxy, the application cache, and the database query cache — and each one has a different invalidation model and a different way to fail. Treating caching as a single knob is the root cause of most stale-content bugs and most cascading outages. This reference maps all five layers in one place, from HTTP directives governed by RFC 9111 to Next.js 16’s explicit use cache lifetimes.
Most caching articles cover only one or two layers — usually HTTP headers or Redis, rarely both, and almost never the failure modes that connect them. That gap is why teams ship a clever Cache-Control policy and still get paged at 2 a.m. when a popular key expires and a thundering herd flattens the database. A mental model that maps where each piece of data should live, how it expires, and what breaks when it does is worth more than any single tuning trick.
Below: the canonical no-cache versus no-store misunderstanding, a decision matrix for every significant Cache-Control directive, why stale-while-revalidate is the most underused directive in HTTP, how CDN tag-based purging actually propagates, Redis eviction policies and cache-stampede mitigation, and how Next.js 16’s use cache plus cacheLife replace ISR-style fetching with composable lifetimes. It closes with a proprietary five-layer reference table you can keep next to your architecture diagram.
- 01 Caching is five layers, not one setting. Browser, CDN, reverse proxy, application (Redis/Memcached), and database query cache each have distinct invalidation semantics and failure modes. A response can be cached, or stale, at every one of them simultaneously.
- 02 no-cache does not mean do not cache. no-cache permits storage but forces revalidation with the origin before reuse. The directive that actually prevents storage is no-store. This is the single most common caching misunderstanding among engineers.
- 03 stale-while-revalidate hides latency at zero freshness cost. Specified in RFC 5861 and supported by Cloudflare, CloudFront, and Fastly, it lets a shared cache serve a stale response while it revalidates in the background, so users never wait on the origin round-trip.
- 04 Cache stampede is the canonical application-layer failure. When a hot key expires, concurrent requests all hit the database at once. Mitigate with distributed locking (Redis SETNX), request coalescing (singleflight), or probabilistic early expiration, ideally layered together.
- 05 Next.js 16 makes cache lifetimes explicit. The use cache directive with cacheLife profiles replaces ISR-style data fetching. It requires cacheComponents: true, and a revalidate of 0 or an expire under 5 minutes turns a cache into a request-time dynamic hole.
01. The Stack
Caching is five layers, each with its own failure mode.
A request for a single resource can pass through — and be cached at — five independent layers before it ever reaches your business logic. Each layer caches for a different reason, expires on a different clock, and fails in a different way. Understanding the stack as a whole is what lets you answer the only question that matters when you’re deciding where to put a piece of data: which layer should own its freshness?
Browser / HTTP cache
Lives in the user's browser. Governed by Cache-Control and validators. Its sharpest edge: with no explicit freshness header set, caches may heuristically treat a response as fresh for roughly 10% of the time since its Last-Modified date.
CDN / edge cache
A shared cache near the user. Invalidated by tag/surrogate-key purges or path purges. Fails through Vary fragmentation and uneven purge support across pricing plans.
Reverse proxy
An origin-adjacent shared cache. Often the same tech as the CDN. Soft-purge marks entries stale rather than deleting them, so stale-if-error fallbacks survive an outage.
Application cache
An in-memory key-value store. Invalidated by key delete, TTL, or eviction policy. Its signature failure is the cache stampede when a popular key expires under concurrent load.
Database query cache
Caches full query results, keyed by a hash of the query and its parameters. Any write to a referenced table invalidates the cached result — heavy invalidation overhead in write-heavy workloads.
Hashed static assets belong in the browser with a long immutable TTL. Per-tenant API responses belong in Redis with explicit invalidation. A live dashboard count belongs nowhere durable; serve it dynamically. Most stale-content bugs trace back to two layers both thinking they own the same value.
02. The Canonical Trap
no-cache does not mean “do not cache.”
This is the single most common caching misunderstanding, and it is worth correcting before anything else. The directive no-cache does not prevent a response from being stored. It means the cache may store the response, but it must revalidate with the origin before reusing it — typically via a conditional request that returns 304 Not Modified when the content is unchanged. The directive that actually prevents storage entirely is no-store.
The practical consequence: if you want a resource to never be cached at all (a response with sensitive data, say), no-cache is the wrong choice. It permits storage and merely demands a revalidation round-trip. Reach for no-store instead. Conversely, if you want a resource that is always served fresh but still want the savings of a 304 when nothing changed, no-cache is exactly right, and far more bandwidth-efficient than disabling caching outright.
no-store = never write it down. no-cache = write it down, but check with me before reusing it. Per MDN’s HTTP caching guide, Cache-Control directives are case-insensitive (lowercase is recommended) and comma-separated, so a single header can combine several at once.
One more directive deserves a mention here because engineers conflate it with both: must-revalidate tells a cache it may serve a stored response while it is still fresh, but once that response goes stale it must revalidate with the origin and may not serve the stale copy on its own initiative. That is a meaningfully different contract from no-cache, which revalidates on every use, fresh or not.
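The three contracts above can be sketched as a small decision function. This is a simplified illustration, not a full RFC 9111 implementation: the names `parseCacheControl` and `reuseDecision` are hypothetical, only `no-store`, `no-cache`, and `max-age` are considered, and any stale response is sent back for revalidation (which is what must-revalidate demands; serving stale content is deliberately left out).

```typescript
// Hypothetical helpers: a sketch of the store/reuse rules, not a complete cache.
type Directives = Map<string, string | true>;

// Parse a Cache-Control value; directive names are case-insensitive.
function parseCacheControl(header: string): Directives {
  const out: Directives = new Map();
  for (const part of header.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      out.set(trimmed.toLowerCase(), true);
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      const value = trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
      out.set(name, value);
    }
  }
  return out;
}

type Reuse = "do-not-store" | "revalidate-first" | "serve-from-cache";

function reuseDecision(header: string, ageSeconds: number): Reuse {
  const d = parseCacheControl(header);
  if (d.has("no-store")) return "do-not-store";     // never write it down
  if (d.has("no-cache")) return "revalidate-first"; // stored, but checked on every use
  const raw = d.get("max-age");
  const maxAge = typeof raw === "string" ? Number(raw) : NaN;
  const fresh = Number.isFinite(maxAge) && ageSeconds < maxAge;
  // Once stale, this sketch always revalidates, which is the must-revalidate contract.
  return fresh ? "serve-from-cache" : "revalidate-first";
}
```

Under these assumptions, `max-age=60, must-revalidate` is served straight from the cache at age 30 and revalidated at age 61, while `no-cache` is revalidated even at age 0.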
03. Decision Matrix
Every Cache-Control directive, and the mistake to avoid with each.
MDN documents these directives narratively. The matrix below is the compact decision form — for each significant directive, whether it applies to private caches, shared caches, or both; whether the response may be stored; what it is best for; and the specific mistake engineers most often make with it. RFC numbers are included so you can trace any cell back to the spec.
| Directive | Applies to | Stores? | Best for | Mistake to avoid | RFC |
|---|---|---|---|---|---|
| max-age=N | Both | Yes | Versioned or slow-changing assets | Measured from origin generation, not cache receipt | 9111 |
| s-maxage=N | Shared | Yes | Overriding max-age at the CDN only | Ignored by private/browser caches | 9111 |
| no-cache | Both | Yes | Always-fresh resources you still want cached | Does NOT prevent storage — it forces revalidation | 9111 |
| no-store | Both | No | Sensitive or never-cacheable responses | Confusing it with no-cache | 9111 |
| must-revalidate | Both | Yes | Resources that must never serve stale once expired | Pairing it with stale-while-revalidate by accident | 9111 |
| stale-while-revalidate=N | Both | Yes | Hiding revalidation latency from users | Underusing it — the most overlooked directive | 5861 |
| stale-if-error=N | Both | Yes | Serving last-good content during origin 5xx | Forgetting it leaves no fallback on outage | 5861 |
| immutable | Both | Yes | Hashed static assets that never change | Using it on URLs without content hashes | 8246 |
| private | Private | Yes | Per-user responses (browser cache only) | Letting a shared CDN cache personalised data | 9111 |
| public | Shared | Yes | Explicitly opting responses into shared caches | Marking authenticated responses public | 9111 |
Two cells warrant emphasis. First, max-age measures elapsed time since the response was generated on the origin server, not since it was received by an intermediate cache. Intermediate caches report the time already elapsed in the Age header, and that value counts against max-age, so a response with max-age=600 that spent 120 seconds reaching a CDN arrives with Age: 120, is already 2 minutes into its life, and has 480 seconds of freshness left. Second, immutable (RFC 8246) is only safe on URLs that carry a content hash, which is exactly why Next.js applies public, max-age=31536000, immutable to everything under /_next/static/: the hash in the filename is the cache buster.
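The max-age arithmetic can be written out directly. The sketch below is a simplification: RFC 9111 defines a more detailed age calculation (it also accounts for response delay and time spent resident in the cache), and the function name `remainingFreshness` is hypothetical.

```typescript
// Remaining freshness in seconds for a stored response.
// maxAgeSeconds comes from Cache-Control: max-age; ageSeconds from the Age
// header, treated as 0 when the header is absent.
function remainingFreshness(maxAgeSeconds: number, ageSeconds: number = 0): number {
  return Math.max(0, maxAgeSeconds - ageSeconds);
}

// The example from the text: max-age=600 arriving with Age: 120.
const secondsLeftAtCdn = remainingFreshness(600, 120); // 480
```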
04. The Underused Directive
stale-while-revalidate hides latency at zero freshness cost.
stale-while-revalidate (RFC 5861) is the most underused directive in the HTTP caching toolkit. It lets a cache serve a stale response for a defined window while it revalidates the resource in the background — so the user gets an instant response and never waits on the origin round-trip. The revalidated content is ready for the next request. A combined example: Cache-Control: max-age=2592000, stale-while-revalidate=86400 caches for 30 days with a one-day grace window during which stale content is served while a fresh copy is fetched.
“Caches may serve the response...after it becomes stale, up to the indicated number of seconds.”
MDN Web Docs, stale-while-revalidate, Cache-Control reference
Its sibling, stale-if-error (also RFC 5861), is your origin-outage safety net. It permits a cache to serve a stale response for a defined window when the origin returns a 5xx status — 500, 502, 503, or 504 — or is unreachable entirely. Stacked together, the three directives form a resilient policy: Cache-Control: max-age=3600, stale-while-revalidate=600, stale-if-error=86400 gives one hour of freshness, a ten-minute window to hide revalidation latency, and a full day of last-good fallback if the origin falls over.
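To make the three windows concrete, here is a minimal sketch that classifies a stored response by its age under the combined policy above. The type and function names are hypothetical, and both stale windows are measured from the moment the response becomes stale, following RFC 5861's wording.

```typescript
type CacheState =
  | "fresh"                      // serve directly
  | "serve-stale-and-revalidate" // serve stale now, refresh in the background
  | "stale-if-error-only"        // serve stale only if the origin errors or is unreachable
  | "expired";                   // must fetch from the origin

interface StalePolicy {
  maxAge: number;               // seconds
  staleWhileRevalidate: number; // seconds after becoming stale
  staleIfError: number;         // seconds after becoming stale
}

function classifyResponse(ageSeconds: number, p: StalePolicy): CacheState {
  if (ageSeconds < p.maxAge) return "fresh";
  const staleFor = ageSeconds - p.maxAge;
  if (staleFor <= p.staleWhileRevalidate) return "serve-stale-and-revalidate";
  if (staleFor <= p.staleIfError) return "stale-if-error-only";
  return "expired";
}

// Cache-Control: max-age=3600, stale-while-revalidate=600, stale-if-error=86400
const resilientPolicy: StalePolicy = { maxAge: 3600, staleWhileRevalidate: 600, staleIfError: 86400 };
```

With this policy, a response aged 3,900 seconds is served stale while a refresh runs in the background; at 7,200 seconds it is only eligible as an error fallback.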
A persistent myth is that stale-while-revalidate is a browser-only feature. It is not — it is specified for shared caches and is supported at the edge by Cloudflare, Amazon CloudFront, and Fastly. That makes it most valuable precisely where it is least used: at the CDN, where a single background revalidation can shield the origin from a traffic spike while every user still gets a sub-second response.
Combining max-age with a generous stale-while-revalidate window is the closest HTTP gets to ISR-style behaviour without any application code. The cache absorbs the traffic, the origin revalidates lazily, and freshness cost stays at zero because users never block on the round-trip.
05. Validators
ETags, conditional requests, and the 304 that saves bandwidth.
When a cached response goes stale, revalidation does not have to re-download the whole body. Validators let a cache ask the origin “has this changed?” and receive a tiny 304 Not Modified with no body when it hasn’t. There are two: ETag and Last-Modified.
Strong ETags guarantee byte-for-byte identity; weak ETags — prefixed W/ — guarantee only semantic equivalence. RFC 9110 recommends sending both an ETag and a Last-Modified header in responses, and during revalidation If-None-Match takes precedence over If-Modified-Since. A matching ETag returns 304 Not Modified with no response body, saving the bandwidth of re-sending an unchanged resource.
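A minimal sketch of the server-side decision for a GET with If-None-Match follows. The function names are hypothetical. The sketch uses weak comparison (the W/ prefix is ignored), which is what If-None-Match calls for, and it splits the header on commas, so it assumes entity tags that do not themselves contain commas.

```typescript
// Strip an optional weak indicator so that W/"abc" and "abc" compare equal.
function opaqueTag(etag: string): string {
  return etag.trim().replace(/^W\//, "");
}

// Decide between a full 200 response and a body-less 304 Not Modified.
function conditionalStatus(ifNoneMatch: string | undefined, currentEtag: string): 200 | 304 {
  if (ifNoneMatch === undefined) return 200;  // unconditional request
  if (ifNoneMatch.trim() === "*") return 304; // matches any current representation
  const candidates = ifNoneMatch.split(",").map(opaqueTag);
  return candidates.indexOf(opaqueTag(currentEtag)) !== -1 ? 304 : 200;
}
```

A client holding `"abc"` that sends `If-None-Match: "abc"` gets a 304 and reuses its stored body; once the origin's tag changes, the same request yields a 200 with the new representation.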
When no header is set
If a response carries Last-Modified but no Cache-Control, caches may heuristically treat it as fresh for roughly 10% of the time since last modification, per the RFC 9111 recommendation. Always set an explicit header.
public, max-age=31536000
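RFC 9204 (QPACK, the HTTP/3 header compression format) predefines several Cache-Control values in its static table: index 37 is max-age=2592000 (30 days), 38 is max-age=604800 (one week), and 41 is public, max-age=31536000. HTTP/3 implementations can send these as a compact table index instead of the literal header.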
Per request-header variation
Vary stores a separate response per unique value of a header. Vary: Accept-Language is fine; Vary: User-Agent should be avoided because the variation count explodes and shreds cache hit-rates.
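Two of the behaviours above are easy to see in code: the 10% heuristic, and how Vary multiplies cache entries. Both functions are hypothetical sketches. `varyCacheKey` assumes request header names have already been lowercased, and the key format is invented for illustration.

```typescript
// Heuristic freshness: roughly 10% of the interval between Last-Modified and
// the response Date. Inputs are epoch milliseconds; the result is in seconds.
function heuristicFreshnessSeconds(dateMs: number, lastModifiedMs: number): number {
  // Dividing by 10000 converts ms to seconds and takes 10% in one step.
  return Math.max(0, Math.floor((dateMs - lastModifiedMs) / 10000));
}

// Build a cache key from the URL plus every request header named in Vary.
// Each distinct header value produces a distinct stored response.
function varyCacheKey(url: string, vary: string, requestHeaders: Record<string, string>): string {
  const names = vary
    .split(",")
    .map((n) => n.trim().toLowerCase())
    .filter((n) => n !== "")
    .sort();
  const parts = names.map((n) => `${n}=${requestHeaders[n] ?? ""}`);
  return [url, ...parts].join("|");
}
```

A document last modified ten days before it was served would be treated as fresh for about one day. With `Vary: Accept-Language` the number of keys per URL is bounded by the languages you serve; with `Vary: User-Agent` it is bounded only by the number of distinct user-agent strings in your traffic.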
06. Edge Purging
CDN invalidation: tags, surrogate keys, and propagation time.
At the edge, the question shifts from “when does this expire?” to “how do I purge it the instant it changes?” The three major CDNs answer it differently, and the differences matter for both speed and cost. A well-architected web development stack layers caching across the browser, CDN, and application tiers — and wires purge into the same workflow that publishes the content.
CDN invalidation mechanisms · propagation & capability
Sources: Fastly purge docs; CDN invalidation guide; Cloudflare changelog (2026-03-24)
Fastly leads on tag-based purging. Its Surrogate-Key mechanism lets you tag any response and purge every object carrying that tag in roughly 150ms globally. Individual keys are limited to 1,024 bytes and the full Surrogate-Key header may not exceed 16,384 bytes; purges run via dashboard, API, CLI, and a Rust edge SDK, and soft purges mark content stale rather than deleting it outright.
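Because both size limits fail silently if ignored, it can help to validate tags where the header is built. The sketch below is a hypothetical origin-side helper, not a Fastly API. It relies on the Surrogate-Key header being a space-separated list, and as a simplifying assumption it accepts only printable ASCII keys so that string length equals byte length.

```typescript
const MAX_KEY_BYTES = 1024;     // per-key limit quoted in the text
const MAX_HEADER_BYTES = 16384; // whole-header limit quoted in the text

// Build a Surrogate-Key header value from a list of tags.
function buildSurrogateKeyHeader(tags: string[]): string {
  const unique = Array.from(new Set(tags)); // drop duplicates, keep order
  for (const tag of unique) {
    // Assumption of this sketch: printable ASCII only, no spaces.
    if (!/^[\x21-\x7e]+$/.test(tag)) {
      throw new Error(`invalid surrogate key: ${JSON.stringify(tag)}`);
    }
    if (tag.length > MAX_KEY_BYTES) {
      throw new Error(`surrogate key exceeds ${MAX_KEY_BYTES} bytes`);
    }
  }
  const value = unique.join(" ");
  if (value.length > MAX_HEADER_BYTES) {
    throw new Error(`Surrogate-Key header exceeds ${MAX_HEADER_BYTES} bytes`);
  }
  return value;
}
```

A response tagged with `post-42 author-7` can then be purged by either tag when the post or the author record changes.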
Cloudflare exposes a dedicated CDN-Cache-Control header that controls CDN behaviour without affecting upstream or downstream caches, accepting the same directives as Cache-Control. Its Origin Cache Control is enabled by default on Free, Pro, and Business plans, and the Cache Response Rules that shipped on March 24, 2026 let operators rewrite Cache-Control directives, manage cache tags, and strip headers like Set-Cookie from origin responses before they reach the cache — all without touching origin config.
Amazon CloudFront handles invalidation by path, with reported propagation in the 10-to-60-second range. It also supports stale-while-revalidate and stale-if-error directives at the edge.
07. Application Layer
Redis eviction policies and the thundering herd.
The application cache is where most teams spend their tuning time, and two decisions dominate: which eviction policy runs when memory fills, and how you prevent a cache stampede when a hot key expires.
Eviction policies
Redis offers a full set of eviction policies — noeviction, the allkeys-* family (lru, lfu, random), and the volatile-* family that only evicts keys carrying a TTL. For a pure cache, allkeys-lru or allkeys-lfu are recommended; for a mixed workload where Redis also holds non-cache data, use volatile-lru with TTLs set on the cached keys only. LFU support arrived in Redis 4.0. Eviction only triggers once the instance reaches maxmemory — below that ceiling, keys live until their TTL.
Two tuning knobs are worth knowing. LRU is approximate: Redis samples a handful of keys rather than scanning them all, and the default maxmemory-samples 5 can be raised to 10 for a closer approximation of true LRU at a marginal CPU cost. For LFU, the defaults lfu-log-factor 10 and lfu-decay-time 1 (minutes) control how quickly the frequency counter saturates and how fast access counts decay.
Cache stampede (thundering herd)
When a popular cache key expires, many concurrent requests can simultaneously query the database to regenerate it — potentially causing a cascading failure. There are three primary mitigations, and they compose well:
Distributed locking
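A Redis lock via SETNX with an expiry (in current Redis, typically SET key value NX PX milliseconds, which takes the lock and sets its TTL in one atomic command) ensures only one process across all instances regenerates the value; the rest wait briefly or serve stale. Simple and effective, but the lock holder becomes a single point of latency.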
Request coalescing
Singleflight collapses N concurrent identical requests into one origin call and fans the single result back out. Go's golang.org/x/sync/singleflight is the canonical implementation; equivalents exist in most ecosystems.
Probabilistic early expiration
Refresh a key probabilistically before its TTL actually hits, so regeneration is spread across time rather than synchronised on a single expiry instant. No coordination required, and it removes the expiry cliff entirely.
stale-while-revalidate at the app tier
Serving the stale value while one process regenerates — the application-tier analogue of the HTTP directive — combines naturally with locking or singleflight to keep latency flat during regeneration.
“A Redis-based lock ensures only one request across all instances fetches from the database.”
SWE Helper, cache stampede prevention
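Two of the mitigations above fit in a few lines each. The sketch below is illustrative rather than production code: `SingleFlight` is a hypothetical in-process equivalent of Go's singleflight (it coalesces within one process only, not across instances), and `shouldRefreshEarly` uses one published formulation of probabilistic early expiration, often called XFetch. That choice is an assumption, since the text does not name a specific formula.

```typescript
// Request coalescing: concurrent callers for the same key share one load.
class SingleFlight<T> {
  private inFlight = new Map<string, Promise<T>>();

  run(key: string, load: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing !== undefined) return existing; // join the load already running
    const pending = load().then(
      (value) => { this.inFlight.delete(key); return value; },
      (err) => { this.inFlight.delete(key); throw err; },
    );
    this.inFlight.set(key, pending);
    return pending;
  }
}

// Probabilistic early expiration (XFetch-style): each reader independently
// decides whether to refresh before the TTL, more eagerly as expiry nears
// and as the value gets more expensive to recompute.
function shouldRefreshEarly(
  nowMs: number,
  expiresAtMs: number,
  recomputeMs: number,               // how long the last regeneration took
  beta: number = 1,                  // above 1 refreshes earlier, below 1 later
  rand: () => number = Math.random,  // injectable for testing
): boolean {
  // Math.log(rand()) is <= 0, so the subtracted term adds a random head start.
  return nowMs - recomputeMs * beta * Math.log(rand()) >= expiresAtMs;
}
```

The two compose: let `shouldRefreshEarly` decide when a refresh is due, and route the refresh through `SingleFlight.run` so that even simultaneous early refreshes on one instance produce a single database query.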
One layer deeper sits the database query cache, where Redis can cut response times substantially for read-heavy workloads by keying results on a hash of the full query and its parameters. The catch is invalidation: any write to any table referenced in a cached query invalidates the entire cached result, which means query caching pays off most in read-heavy systems and least in write-heavy ones. This is the natural complement to reducing database query load through indexing — caching removes the query, indexing makes the unavoidable ones fast.
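The invalidation rule described above can be sketched with an in-memory stand-in for Redis. Everything here is hypothetical: `QueryCache` is not a library class, the caller is assumed to declare which tables each query reads, and the key is the JSON-serialised query and parameters rather than a hash, to keep the example dependency-free.

```typescript
class QueryCache<T> {
  private results = new Map<string, T>();
  private keysByTable = new Map<string, Set<string>>();

  // Key on the full query text and its parameters.
  static key(sql: string, params: unknown[]): string {
    return JSON.stringify([sql, params]);
  }

  set(sql: string, params: unknown[], tables: string[], value: T): void {
    const k = QueryCache.key(sql, params);
    this.results.set(k, value);
    for (const table of tables) {
      let keys = this.keysByTable.get(table);
      if (keys === undefined) {
        keys = new Set<string>();
        this.keysByTable.set(table, keys);
      }
      keys.add(k);
    }
  }

  get(sql: string, params: unknown[]): T | undefined {
    return this.results.get(QueryCache.key(sql, params));
  }

  // Call after any write to `table`: every result that read it is dropped.
  invalidateTable(table: string): number {
    const keys = this.keysByTable.get(table);
    if (keys === undefined) return 0;
    let removed = 0;
    keys.forEach((k) => { if (this.results.delete(k)) removed++; });
    this.keysByTable.delete(table);
    return removed;
  }
}
```

The cost the text warns about is visible in `invalidateTable`: a single write discards every cached result that touched the table, however small the change, so a write-heavy table keeps its cached queries permanently cold.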
08. Framework Caching
Next.js 16 use cache and explicit cacheLife profiles.
Next.js 16 (caching docs at version 16.2.9) replaces ISR-style data fetching with an explicit, composable model. The use cache directive caches the return value of an async function or component, and it is enabled by setting cacheComponents: true in next.config.ts. You can apply it at the data level — an individual fetching function — or at the UI level, on a full component or page. Arguments and closed-over values automatically become part of the cache key, so two calls with different inputs cache separately without any manual key management. This is the framework context for migrating to Next.js 16 Cache Components.
“It is recommended to specify an explicit cacheLife. With explicit lifetime values, you can inspect a cached function or component and immediately know its behavior without tracing through nested caches.”
Next.js documentation, cacheLife API reference
The companion cacheLife function controls how long a cached value lives, via three properties: stale (how long the client router cache may serve without checking the server), revalidate (how often the server refreshes in the background), and expire (the hard maximum age before the cache is treated as dynamic). Next.js ships seven built-in profiles; the table below maps each to its use case and its stale/revalidate/expire triple.
| Profile | Use case | stale | revalidate | expire |
|---|---|---|---|---|
| default | Standard content | 5 minutes | 15 minutes | never |
| seconds | Real-time data | 30 sec | 1 second | 1 minute |
| minutes | Frequently updated | 5 minutes | 1 minute | 1 hour |
| hours | Multiple daily updates | 5 minutes | 1 hour | 1 day |
| days | Daily updates | 5 minutes | 1 day | 1 week |
| weeks | Weekly updates | 5 minutes | 1 week | 30 days |
| max | Rarely changes | 5 minutes | 30 days | 1 year |
Three sharp edges are worth internalising. First, the stale minimum is 30 seconds, enforced by Next.js — and stale controls the client-side router cache via the x-nextjs-stale-time response header, not Cache-Control directly. Second, calling revalidateTag(), revalidatePath(), updateTag(), or refresh() from a Server Action immediately clears the entire client cache, bypassing the stale time. Third — and most likely to trip you up — a cache with revalidate=0 or expire under 5 minutes is automatically excluded from prerenders and becomes a request-time “dynamic hole.” That includes the built-in seconds profile, and a short-lived cache nested inside a longer use cache without an explicit cacheLife will throw a prerender error.
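The third rule is easy to check mechanically. The sketch below does not call any Next.js API: it restates the profile table above as plain data (in seconds, with "never" modelled as Infinity) and applies the dynamic-hole rule as described in the text. The names `builtInProfiles` and `isDynamicHole` are hypothetical.

```typescript
interface Lifetime {
  stale: number;      // seconds
  revalidate: number; // seconds
  expire: number;     // seconds; Infinity stands in for "never"
}

const MIN = 60;
const HOUR = 60 * MIN;
const DAY = 24 * HOUR;

// Values copied from the profile table in this article.
const builtInProfiles = {
  default: { stale: 5 * MIN, revalidate: 15 * MIN, expire: Infinity },
  seconds: { stale: 30, revalidate: 1, expire: 1 * MIN },
  minutes: { stale: 5 * MIN, revalidate: 1 * MIN, expire: 1 * HOUR },
  hours: { stale: 5 * MIN, revalidate: 1 * HOUR, expire: 1 * DAY },
  days: { stale: 5 * MIN, revalidate: 1 * DAY, expire: 7 * DAY },
  weeks: { stale: 5 * MIN, revalidate: 7 * DAY, expire: 30 * DAY },
  max: { stale: 5 * MIN, revalidate: 30 * DAY, expire: 365 * DAY },
};

// Rule from the text: revalidate=0 or expire under 5 minutes means the cache
// is excluded from prerenders and resolved at request time.
function isDynamicHole(lifetime: Lifetime): boolean {
  return lifetime.revalidate === 0 || lifetime.expire < 5 * MIN;
}
```

Running the rule over the table flags only the seconds profile, which matches the text; a custom lifetime with revalidate set to 0 is flagged regardless of its expire value.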
This connects to Partial Prerendering, the default rendering mode when Cache Components is enabled. Static content and use cache content become the static HTML shell, while <Suspense>-wrapped dynamic content streams in at request time. Looking forward, the Next.js team is moving toward pathname-based CDN cache keying — where a full-page RSC response is served from /my/page.rsc and segment RSC from a .segment.rsc path — so CDNs need no Vary support and no custom header parsing. It is an announced design direction worth designing toward, not yet a finished default.
09. The Reference
The five-layer cache reference, in one table.
Here is the table to keep next to your architecture diagram. For each of the five layers it names the invalidation mechanism, the characteristic failure mode, the recommended directives or patterns, the Next.js integration point, and a typical TTL range. No single existing reference maps all five layers with their failure modes together — this is the one to bookmark.
| Layer | Invalidation | Failure mode | Directives / patterns | Next.js point | Typical TTL |
|---|---|---|---|---|---|
| Client & edge tiers | |||||
| Browser / HTTP cache | Content-hash filenames; ETag 304 revalidation | Heuristic caching (~10% of last-modified age) when no header is set | max-age, immutable, no-cache | /_next/static at max-age=31536000, immutable | Seconds to 1 year |
| CDN / edge cache | Tag/surrogate-key purge; path or prefix purge | Vary fragmentation; cross-plan purge gaps | s-maxage, CDN-Cache-Control, stale-while-revalidate | s-maxage on static + ISR responses | Minutes to 1 year |
| Reverse proxy | Explicit purge API; soft-purge to stale | Stale config drift between origin and proxy | s-maxage, surrogate keys, stale-if-error | Sits in front of the Node/runtime server | Seconds to hours |
| Origin & data tiers | |||||
| Application cache (Redis / Memcached) | Key delete; TTL expiry; eviction policy | Cache stampede (thundering herd) on key expiry | allkeys-lru / allkeys-lfu; SETNX locking | Backing store for use cache / cacheLife | Seconds to days |
| Database query cache | Invalidate on any write to a referenced table | Heavy invalidation overhead in write-heavy loads | Hash-of-query keys; cache-aside / read-through | Wrapped behind a use cache data function | Seconds to minutes |
Read the table as a routing guide. Push immutable, hashed assets to the browser and forget about them for a year. Put cacheable, shared-but-purgeable content at the CDN and wire surrogate-key purges into your publish workflow. Reserve the application cache for the expensive, per-tenant computations that benefit most from being memoised — and protect those keys against stampede. Use the database query cache sparingly, and only where reads dominate writes, because its invalidation overhead grows with write volume. Each layer is a tool; the architecture is choosing which one owns each value. The same care that goes into rate-limiting strategies for your API layer and into idempotency in distributed systems applies here: the perimeter behaviour is only as reliable as the invalidation discipline behind it.
10. Conclusion
Caching is a layered discipline, not a single switch.
The hard part of caching was never storing — it's invalidating.
Every layer in the stack makes storing a value trivial. What separates a fast, correct system from a flaky one is invalidation: knowing which layer owns each value’s freshness, how it expires, and what happens when it does. Get that mapping right and the five layers reinforce each other; get it wrong and a single stale entry, or a single expired hot key, takes the whole request path down with it.
The throughline of 2026 is that the best primitives are converging on the same idea: serve something instantly, refresh lazily. RFC 5861’s stale-while-revalidate does it at the HTTP layer, singleflight and probabilistic early expiration do it at the application tier in front of Redis, and Next.js 16’s use cache with explicit cacheLife profiles does it in the framework. Designing for that pattern, rather than against it, is what keeps a system fast under load without sacrificing correctness.
Start from the canonical correction — no-cache means revalidate, not refuse — set explicit headers everywhere so heuristic caching never surprises you, and pick the layer that should own each value’s freshness deliberately. Keep the five-layer reference table close, verify CDN and framework behaviour against current docs before you depend on it, and measure your own latency rather than trusting a vendor benchmark. Caching rewards discipline far more than cleverness.
