Apple Improves Context Window Management for its Foundation Models
Mar 23, 2026 2 min read
iOS 26.4, now in Release Candidate, introduces improved context window management for Apple's Foundation Models, helping developers work within the 4096-token context window limit. The change encourages developers to treat the context window as a constrained resource and to manage it actively, much like memory in a low-resource system.
As with most large language models, the context window is a critical resource that holds system instructions, user prompts, and model responses. Because Apple's Foundation Models run on-device, they offer a relatively small context window, which can fill up quickly, especially in chat-like sessions where user prompts and the model's responses continuously accumulate.
In such cases, the framework throws an .exceededContextWindowSize error, and the LLM won't be able to respond within the same session. To recover from this error, developers need to start a new session and reinitialize its state so it can effectively carry on with the existing workflow without impairing user experience.
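The recovery flow described above can be sketched in a few lines. This is a minimal illustration, not the framework's API: `MockSession`, `SessionError`, `Chat`, and the whitespace-based token estimate are all hypothetical stand-ins, used so the example runs without the Foundation Models framework. The `carryOver` string stands for whatever condensed state the app chooses to seed the new session with.

```swift
// Hypothetical stand-ins for illustration; these are NOT Foundation Models types.
enum SessionError: Error {
    case exceededContextWindowSize
}

final class MockSession {
    let limit: Int
    private(set) var used: Int

    init(limit: Int, seed: String = "") {
        self.limit = limit
        self.used = MockSession.roughTokens(seed)
    }

    // Crude estimate: one token per whitespace-separated word (an assumption).
    static func roughTokens(_ text: String) -> Int {
        text.split(separator: " ").count
    }

    // Throws once the accumulated prompts no longer fit in the window.
    func respond(to prompt: String) throws -> String {
        let cost = MockSession.roughTokens(prompt)
        guard used + cost <= limit else {
            throw SessionError.exceededContextWindowSize
        }
        used += cost
        return "ok"
    }
}

struct Chat {
    private(set) var session: MockSession
    let limit: Int
    private(set) var restarts = 0

    init(limit: Int) {
        self.limit = limit
        self.session = MockSession(limit: limit)
    }

    // On overflow, start a fresh session seeded with carried-over state, then retry once.
    mutating func send(_ prompt: String, carryOver: String) throws -> String {
        do {
            return try session.respond(to: prompt)
        } catch SessionError.exceededContextWindowSize {
            session = MockSession(limit: limit, seed: carryOver)
            restarts += 1
            return try session.respond(to: prompt)
        }
    }
}
```

If the retry also overflows, the error propagates to the caller, which is a reasonable signal that the carried-over state itself needs to be shortened.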
In a previous technical note, Apple outlined practical strategies for developers to proactively deal with the context window limitation, such as splitting large tasks into multiple LLM sessions, asking the model to generate shorter answers, trimming prompts by summarizing them or retaining only the most relevant turns, and using tools efficiently.
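One of the strategies above, retaining only the most relevant turns, might look like the following sketch. It assumes that "most relevant" means "most recent", which is only one possible policy, and it takes the token cost as a caller-supplied closure because the real cost depends on the model's tokenizer. The function name `recentTurns` and its parameters are hypothetical.

```swift
// Keeps the newest turns whose combined cost fits within `budget`.
// `cost` is supplied by the caller; any per-turn token estimate can be plugged in.
func recentTurns(_ turns: [String], fitting budget: Int, cost: (String) -> Int) -> [String] {
    var kept: [String] = []
    var total = 0
    // Walk from newest to oldest and stop at the first turn that does not fit.
    for turn in turns.reversed() {
        let c = cost(turn)
        if total + c > budget { break }
        total += c
        kept.append(turn)
    }
    // Restore chronological order before returning.
    return Array(kept.reversed())
}

// Usage with a crude word-count estimate (an assumption, not a real tokenizer).
let history = ["a b c", "d e", "f"]
let trimmed = recentTurns(history, fitting: 3) { $0.split(separator: " ").count }
```

Stopping at the first turn that does not fit keeps the retained history contiguous, so the model never sees a conversation with a hole in the middle.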
To help developers track how the context window is being used, iOS 26.4 introduces a new contextSize property on SystemLanguageModel, which returns the available context capacity, along with a tokenCount(for:) method that measures how many tokens a given input consumes. While the current maximum is 4096 tokens, contextSize removes the need to hardcode that limit, and tokenCount(for:) provides the foundation for token bookkeeping, allowing apps to adapt dynamically.
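The kind of token bookkeeping these additions enable could be sketched as follows. The article does not show the exact signatures of contextSize or tokenCount(for:), so this example deliberately takes plain integers: in a real app the window size and per-input counts would presumably come from those two APIs. The `ContextBudget` type and the idea of reserving part of the window for the response are assumptions made for illustration.

```swift
// Hypothetical bookkeeping helper; not part of the Foundation Models framework.
struct ContextBudget {
    let contextSize: Int       // total window, e.g. the value reported by the model
    let responseReserve: Int   // tokens set aside for the model's answer (assumption)
    private(set) var used = 0

    init(contextSize: Int, responseReserve: Int) {
        self.contextSize = contextSize
        self.responseReserve = responseReserve
    }

    // Tokens still available for instructions, tool definitions, and prompts.
    var remaining: Int {
        max(0, contextSize - responseReserve - used)
    }

    // Records the tokens if they fit; returns false and changes nothing otherwise.
    @discardableResult
    mutating func record(_ tokens: Int) -> Bool {
        guard tokens <= remaining else { return false }
        used += tokens
        return true
    }
}

// Usage: a 4096-token window with 512 tokens reserved for the response.
var budget = ContextBudget(contextSize: 4096, responseReserve: 512)
let fitsFirst = budget.record(3000)
let fitsSecond = budget.record(600)
```

When `record` returns false, the app can react before sending anything, for example by trimming history or starting a new session, instead of waiting for the framework to throw.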
Knowing the context window size and being able to calculate token consumption are essential, but they don't solve all the problems for developers, since managing token consumption is not a trivial task. In a practical article, Artem Novichkov demonstrates an effective approach.
Artem points out that you must account for every component contributing to the context: not only the system prompt and user instructions, but also tool usage, whose effect on context consumption can be surprising:
When you use tools, their definitions (name, description, and argument schema) are serialized and sent alongside your instructions. This increases the token count significantly.
Note that Artem refers to the tokenUsage(for:) method in his article, which appears to have been renamed to tokenCount(for:) in the latest RC release. He also highlights that these new additions to the Foundation Models framework are marked with [@backDeployed](https://www.hackingwithswift.com/swift/5.8/function-back-deployment)(before: iOS 26.4, macOS 26.4, visionOS 26.4), making them available on all iOS versions that support the framework.
About the Author
Sergio De Simone
