This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
System instructions normally live in the top-level system field, ahead of every message in the conversation. That position is great for prompt caching: the system prompt is part of the stable prefix, so subsequent turns hit the cache. It is a poor position for instructions you only discover you need partway through a session, because editing the top-level system field changes the very beginning of the prompt and invalidates the cache for everything that follows.
Mid-conversation system messages close that gap. You append a {"role": "system"} message at the point in the conversation where the new instruction becomes relevant, instead of editing the top-level system field. The cached prefix stays the same, so the next request still reads it from cache, and the new instruction is still applied as a system instruction rather than as ordinary user text.
Mid-conversation system messages are available on the Claude API and Claude Platform on AWS. They are not available on Amazon Bedrock, Vertex AI, or Microsoft Foundry.
This feature is available on Claude Opus 4.8 only. No beta header is required.
Prompt caching hashes the request prefix in order: tools, then system, then messages. A cache hit requires the prefix to match a recent request exactly, byte for byte, up to the cache breakpoint.
That ordering means the top-level system field sits near the very start of the hashed prefix. Any change to it, even appending a sentence, produces a different hash, and the request misses the cache for the system prompt and every cached message after it.
Mid-conversation system messages let you add the instruction at the end of the message history instead. Everything before the new instruction is unchanged, so the existing cache entry still matches, and only the new message is processed as fresh input.
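The effect of the prefix ordering can be illustrated with a toy model. The sketch below is not the service's actual caching implementation: `prefix_hash` is a hypothetical helper that hashes tools, then system, then messages with SHA-256, only to show which kind of change alters the prefix that an earlier request cached.

```python
import hashlib
import json


def prefix_hash(tools, system, messages):
    """Hash a request prefix in the order tools -> system -> messages.

    Toy stand-in for a cache key; the real hashing scheme is an assumption.
    """
    payload = json.dumps([tools, system, messages], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


tools = []
system = "You are a code review assistant. Be concise."
history = [
    {"role": "user", "content": "Review process() in utils.py."},
    {"role": "assistant", "content": "Consider a generator for large inputs."},
    {"role": "user", "content": "Now review the calling code."},
]

# Key under which the previous request's prefix was cached.
cached_key = prefix_hash(tools, system, history)

# Option A: edit the top-level system field. The prefix changes near its start.
edited_system = system + " Always add type annotations."
key_after_edit = prefix_hash(tools, edited_system, history)

# Option B: append a system message. The first three messages are untouched,
# so the prefix that was cached earlier still hashes to the same key.
appended = history + [
    {"role": "system", "content": "Always add type annotations."}
]
key_of_old_prefix = prefix_hash(tools, system, appended[: len(history)])

print(key_after_edit == cached_key)     # False
print(key_of_old_prefix == cached_key)  # True
```

In this model only Option B leaves the previously cached prefix intact, which is the behavior the paragraphs above describe.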
A few situations where this matters:
system field would re-process the entire history.
In all of these cases you could put the instruction in a regular user message, and Claude does follow instructions that arrive in user turns. The difference is priority: a user message is treated as coming from the end user, while a system message is treated as coming from you, the application operator. When the two conflict, system instructions take precedence, so use the system role for operator-level facts and constraints that should hold even if the end user asks for something different. A mid-conversation system message keeps that operator-level priority without paying the cache-miss cost of editing the top-level system field.
Add a message with "role": "system" to the messages array. Use a plain string or content blocks for content, the same as a user or assistant turn. The instruction applies from that point in the conversation onward. When instructions conflict, later system messages take precedence over earlier ones, and mid-conversation system messages take precedence over the top-level system field for the turns that follow them.
You can still set the top-level system field for instructions that should apply to the entire conversation. Reserve mid-conversation system messages for instructions that only become relevant later, or that you want to add without invalidating the cached prefix.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    # Automatic prompt caching: each request caches the conversation so far,
    # and the next request reads the unchanged prefix from cache.
    cache_control={"type": "ephemeral"},
    system="You are a code review assistant. Be concise.",
    messages=[
        {
            "role": "user",
            "content": "Review process() in utils.py for performance issues.",
        },
        {
            "role": "assistant",
            "content": "The list comprehension is fine for small inputs. For large inputs, consider a generator to avoid materializing the full list.",
        },
        {
            "role": "user",
            "content": "Now review the calling code that invokes process().",
        },
        # The reviewer realizes mid-session that all suggestions must
        # also pass the team's strict typing policy. Appending the
        # instruction here keeps earlier turns byte-identical, so the
        # prefix cached by the previous request is still read from cache.
        {
            "role": "system",
            "content": "From now on, every suggestion must include explicit type annotations.",
        },
    ],
)
print(response.content[0].text)
This example enables automatic caching with the top-level cache_control field. Prompt caching is opt-in: if a request has no cache_control field (automatic or an explicit breakpoint), nothing is cached and every request pays the regular input token price for the full conversation. With caching enabled, appending the system message leaves the already-cached turns unchanged, so the request that carries the new instruction still reads them from cache instead of processing them again. Caching also requires the conversation to meet the minimum cacheable prompt length; an example as short as this one falls below it, so cache_creation_input_tokens and cache_read_input_tokens stay at 0 until the conversation grows.
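To check whether a request actually read its prefix from cache, you can inspect the usage counters mentioned above. The following is a minimal sketch: `summarize_cache_usage` is a hypothetical helper, it takes a plain dict rather than an SDK object, and it assumes that input_tokens counts only the uncached part of the prompt, so the three counters add up to the total input. The numbers are invented for illustration.

```python
def summarize_cache_usage(usage):
    """Return total input tokens and the share of them read from cache.

    `usage` is any mapping with the three input-token counters; missing or
    None values are treated as 0.
    """
    uncached = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total = uncached + written + read
    return {
        "total_input_tokens": total,
        "cache_read_share": read / total if total else 0.0,
    }


# Hypothetical numbers for a long conversation whose prefix was already
# cached; the short example above would report 0 for both cache counters.
example_usage = {
    "input_tokens": 40,
    "cache_creation_input_tokens": 60,
    "cache_read_input_tokens": 1900,
}
summary = summarize_cache_usage(example_usage)
print(summary["total_input_tokens"])
print(summary["cache_read_share"])
```

A cache_read_share close to 1.0 after appending a system message suggests the earlier turns were served from cache; a value of 0.0 means the whole conversation was processed as fresh input.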
A mid-conversation system message must immediately follow a user turn (or an assistant turn ending in a server tool use), and must either be the last entry in messages or be immediately followed by an assistant turn. A user message that carries tool_result blocks counts: in an agentic loop you can place the system message right after the tool results, before Claude's next turn. The one position that is not allowed is between an assistant tool_use block and the tool_result that answers it.
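The placement rules above can be checked on the client before a request is sent. This is a sketch under stated assumptions, not the API's own validation: `check_system_placement` and `ends_with_server_tool_use` are hypothetical helpers, and the sketch assumes that an assistant turn "ending in server tool use" is one whose last content block has type server_tool_use.

```python
def ends_with_server_tool_use(message):
    """Assumption: a server tool call appears as a final server_tool_use block."""
    content = message.get("content")
    return (
        isinstance(content, list)
        and bool(content)
        and content[-1].get("type") == "server_tool_use"
    )


def check_system_placement(messages):
    """Return a list of human-readable placement problems (empty if none)."""
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] != "system":
            continue
        if i == 0:
            problems.append(f"messages[{i}]: system message cannot be first")
            continue
        prev = messages[i - 1]
        follows_ok = prev["role"] == "user" or (
            prev["role"] == "assistant" and ends_with_server_tool_use(prev)
        )
        if not follows_ok:
            problems.append(
                f"messages[{i}]: must follow a user turn "
                "(or an assistant turn ending in server tool use)"
            )
        is_last = i == len(messages) - 1
        if not is_last and messages[i + 1]["role"] != "assistant":
            problems.append(
                f"messages[{i}]: must be last or be followed by an assistant turn"
            )
    return problems


ok = [
    {"role": "user", "content": "Run the tests."},
    {"role": "system", "content": "The remaining token budget is now 2000."},
]
# System message wedged between a tool_use block and its tool_result.
bad = [
    {"role": "user", "content": "Run the tests."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "run_tests", "input": {}}
    ]},
    {"role": "system", "content": "Also update the changelog."},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "ok"}
    ]},
]
print(check_system_placement(ok))        # []
print(len(check_system_placement(bad)))  # 2
```

The second case fails both checks: the system message follows an assistant tool_use turn, and it is followed by a user turn rather than an assistant turn.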
In an agentic loop, the system message goes after the user message that delivers the tool results. This is also where your application can relay input that the user typed while Claude was working, so the new context is absorbed without restarting the turn:
[
  { "role": "user", "content": "Run the test suite and fix any failures." },
  {
    "role": "assistant",
    "content": [{ "type": "tool_use", "id": "toolu_01", "name": "run_tests", "input": {} }]
  },
  {
    "role": "user",
    "content": [
      { "type": "tool_result", "tool_use_id": "toolu_01", "content": "12 passed, 0 failed" }
    ]
  },
  {
    "role": "system",
    "content": "The user sent the following message while you were working: also update the changelog before you finish."
  }
]
Phrase the system content as context rather than as a command that overrides the user. State the fact ("new input arrived from the user: X", "the remaining token budget is now Y") and let Claude act on it. Claude is trained to resist instructions that appear to work against the user, and that protection still applies to the system role, so language such as "ignore what the user said" is less effective than stating what changed.
This pattern is for relaying input from the conversation's own end user. Do not use it to pass tool output, retrieved documents, or other third-party content; keep that content in tool_result blocks (see Limitations).
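As a sketch of the relay pattern, the hypothetical helper below wraps late user input in a system message phrased as context. `relay_user_input` is an illustrative name, not part of any SDK, and the sketch assumes the history is a list of plain message dicts that currently ends in a user turn (for example, one carrying tool results).

```python
def relay_user_input(messages, user_text):
    """Return a new list with `user_text` relayed as a system message.

    Only appends when the history currently ends in a user turn, which is
    one of the positions the placement rules allow.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("relay only after a user turn (e.g. tool results)")
    note = {
        "role": "system",
        # State what happened instead of issuing an override.
        "content": (
            "The user sent the following message while you were working: "
            + user_text.strip()
        ),
    }
    # Build a new list; earlier entries are reused unchanged.
    return messages + [note]


loop_history = [
    {"role": "user", "content": "Run the test suite and fix any failures."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "run_tests", "input": {}}
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "12 passed, 0 failed"}
    ]},
]
next_request = relay_user_input(
    loop_history, " also update the changelog before you finish. "
)
print(next_request[-1]["content"])
```

Because the helper returns a new list and never rewrites earlier entries, the turns that were already sent stay identical between requests.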
Mid-conversation system messages and prompt caching are designed to be used together:
- Enable caching with cache_control, either the top-level automatic caching field or an explicit breakpoint on a content block. A mid-conversation system message does not create a cache entry on its own, and without caching enabled there are no savings to preserve.
- Place cache_control on the last block that stays the same across requests, whether that is the end of the top-level system field, the end of your tool definitions, or a stable point in the message history.
Avoid editing or removing a mid-conversation system message that has already been sent. Like any other change to earlier messages, that invalidates the cache from that point forward. If the instruction needs to evolve, append a new system message rather than rewriting the old one. Consecutive system messages are not allowed; merge instructions into one message or wait for the next user turn before appending.
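One way to respect both rules (append rather than edit, and no consecutive system messages) is to queue instructions in the application and emit them as a single merged message. The class below is a hypothetical sketch, not an SDK feature; `InstructionQueue` and its method names are assumptions made for illustration.

```python
class InstructionQueue:
    """Collect operator instructions and emit them as one system message."""

    def __init__(self):
        self._pending = []

    def add(self, text):
        self._pending.append(text.strip())

    def flush_into(self, messages):
        """Append one merged system message if the history ends in a user turn.

        Returns a new list and never modifies entries that were already sent.
        If nothing is pending, or the history does not end in a user turn,
        the instructions stay queued and the history is returned unchanged.
        """
        if not self._pending or not messages or messages[-1]["role"] != "user":
            return list(messages)
        merged = {"role": "system", "content": "\n".join(self._pending)}
        self._pending = []
        return list(messages) + [merged]


queue = InstructionQueue()
queue.add("Every suggestion must include explicit type annotations.")
queue.add("The remaining token budget is now 2000.")

turns = [{"role": "user", "content": "Review the calling code."}]
request_messages = queue.flush_into(turns)
print(len(request_messages))            # 2
print(request_messages[-1]["content"])

# A later instruction waits: the history now ends in a system message.
queue.add("Prefer small, focused diffs.")
print(len(queue.flush_into(request_messages)))  # 2
```

The third instruction is held until the next user turn arrives, at which point flush_into appends it as a new system message instead of rewriting the one already sent.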
Limitations:
- A system message cannot be the first entry in messages. Use the top-level system field for instructions that apply from the very start.
- A system message must immediately follow a user turn (including a user turn that carries tool_result blocks) or an assistant turn ending in server tool use, and must precede an assistant turn or end the array. It cannot sit between a tool_use block and its tool_result. Placing it elsewhere returns a 400 error.
- Keep tool output, retrieved documents, and other third-party content in tool_result blocks and continue to follow Mitigate jailbreaks and prompt injections.
How caching works, where to place breakpoints, and how to read cache usage fields.
Find out exactly where two requests diverged when a cache hit you expected does not happen.
Message structure, multi-turn conversations, and the system field.
Writing effective prompts and system instructions.
How tool_use and tool_result blocks are structured in the messages array.