URL: https://thenewstack.io/new-small-ai-model-lets-developers-experiment-on-ios/

New Small AI Model Lets Developers Experiment on iOS - The New Stack


2025-02-19 11:29:05
New Small AI Model Lets Developers Experiment on iOS
AI / AI Engineering / Large Language Models


Ai2 says it has created a model and app for securely using and developing AI applications on Apple's iOS platform.
Feb 19th, 2025 11:29am by Loraine Lawson
Featured image by Philip Oroni via Unsplash.

Developers have a new way to experiment with building AI apps for iOS devices after the release this week of a small open source model from The Allen Institute for AI (Ai2) and Contextual AI.

The new language model, called OLMoE, works on iPhone 15 Pro or newer (due to hardware restrictions, it won’t work on earlier versions), as well as M-series iPads and Macs. Smaller models will be available on desktops and other versions of Apple phones in the coming weeks, the company added.

The release can be used to test models locally and to integrate the OLMoE model into other iOS applications. Or, you can just play with it.

There are two parts to this release. First, there’s an open source app, built by Ai2 and GenUI, that is released under the Apache 2.0 license and available in the Apple App Store. Second, there’s the OLMoE language model that allows developers to experiment with AI on an iOS device.

“To build this application, we combined our best fully open recipes,” Ai2 wrote in a post announcing the OLMoE app. “The starting point is OLMoE, our most efficient, fully-open language model.”

A Private AI

OLMoE is a mixture-of-experts model (MoE), which is a machine learning technique that combines multiple specialized sub-models, called “experts,” to solve a complex problem.
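
To make the idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind sparse mixture-of-experts layers. It is an illustration only, not OLMoE's actual architecture: the scalar input, the four toy experts, the `router_weights` values and the `top_k=2` setting are all assumptions chosen for readability.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_weights, experts, top_k=2):
    """Route a scalar input to the top_k experts and mix their outputs."""
    # Router: one score per expert (here a simple linear function of x).
    scores = [w * x for w in router_weights]
    probs = softmax(scores)
    # Keep only the top_k experts; the others are never evaluated,
    # which is why only part of the model is active for each input.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Weighted average of the chosen experts' outputs.
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen

# Four toy "experts" (hypothetical); a real model would use small neural networks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_weights = [0.1, 0.9, 0.5, -0.3]  # hypothetical values for illustration

output, chosen = moe_forward(3.0, router_weights, experts, top_k=2)
print("experts used:", chosen)
print("mixed output:", output)
```

Only the selected experts run for a given input, so the compute per input depends on the number of active experts rather than the total.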

Interactions are private, because they never leave the device and each interaction is deleted once a new conversation begins, Ai2 added. The AI doesn’t track data or send it to the cloud unless the user allows data to be sent in for research purposes.

“Developers may choose to use fully local AI models that aren’t connected to the cloud for privacy, device context, and performance,” said Ai2 research scientist Luca Soldaini. “Privacy is a major factor when handling sensitive user data such as text messages, financial information, or other personal content — keeping everything on-device and never needing to be connected to the cloud ensures that this data is safe and secure.”

Creating a Small Model

OLMoE can process text at a speed of over 40 tokens per second. Ai2 explained how it created the new version of allenai/OLMoE-1B-7B-0125-Instruct by using the Dolmino mix introduced in OLMo 2 — OLMo is a family of fully-open language models — for mid-training, and the Tülu 3 post-training recipe.

It has 7 billion (B) parameters but uses only 1B per input token, according to a research paper on the model. It was trained on 5 trillion tokens and further adapted to create OLMoE-1B-7B-Instruct.

“OLMoE is a sparse MoE model with 1 billion active and 7 billion total parameters, allowing it to run easily on common edge devices (e.g., the latest iPhone) while achieving similar or better MMLU performance when compared with much larger models,” wrote AI researcher Niklas Muennighoff for Contextual AI.

MMLU stands for Massive Multitask Language Understanding, a benchmark that evaluates a model’s ability to perform multiple tasks across a variety of subjects.

The result is a 4-bit quantized version optimized for mobile performance. Quantization is a technique in machine learning to reduce the precision of the numbers used to represent the model’s weights (parameters). It’s done to make the model smaller and faster to run. Four-bit quantization means each number is now represented using only 4 bits, which drastically reduces the model’s size and computational needs.
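
As a rough illustration of the idea, the sketch below quantizes a handful of weights to 4-bit integer codes and estimates the storage saving. It is a simple min/max scheme chosen for clarity, not the quantization method Ai2 used. The sample weights, the function names and the 16-bit baseline used for comparison are assumptions, and the size estimate ignores the small overhead of storing scales and offsets.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [0, 15] using one scale and offset."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 2**4 = 16 levels, so there are 15 steps between lo and hi.
    scale = (hi - lo) / 15
    if scale == 0:
        scale = 1.0  # all weights identical; any scale works
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_4bit(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]

def model_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

weights = [-0.75, -0.2, 0.0, 0.31, 0.75]  # hypothetical sample weights
codes, scale, lo = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("codes:", codes)
print("largest rounding error:", max_error)
print("7B params at 16 bits:", model_size_gb(7e9, 16), "GB")
print("7B params at 4 bits:", model_size_gb(7e9, 4), "GB")
```

The rounding error per weight is at most half a quantization step, which is the precision traded away for the smaller footprint.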

Device as Context

Soldaini added that device context is another important consideration for the model.

“Some applications rely on data that is only available locally, such as a user’s photo album or personal files,” Soldaini said. “If a developer is building an app that uses retrieval augmented generation (RAG) on data stored on device, it wouldn’t be practical to send those GBs to the cloud for processing.”

Instead, developers can use the OLMoE app as a starting point to do it all where the data is already stored, they added.
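
The retrieval step of such an on-device RAG pipeline can be sketched in a few lines. This is a toy illustration under stated assumptions, not code from the OLMoE app: `embed` is a bag-of-words stand-in for a real embedding model, `local_notes` is made-up data, and the resulting prompt would still need to be passed to a locally running model.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k local documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question for a local model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical data that lives only on the device.
local_notes = [
    "dentist appointment on friday at 3pm",
    "grocery list: eggs, milk, coffee",
    "flight to denver leaves monday morning",
]

prompt = build_prompt("when is my dentist appointment", local_notes)
print(prompt)
```

Everything here runs against local data, so nothing needs to be uploaded before the model sees the relevant context.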

Latency and availability play a crucial role in user experience.

“On-device models that aren’t connected to the cloud can operate without delays caused by network communication and remain functional even in environments with poor or no connectivity,” they said. “For many simpler AI tasks, avoiding the round trip to the cloud can significantly improve responsiveness and reliability.”

The model can work offline, allowing developers to access AI at any time, the company explained. Users can choose to share data back with Ai2 for research purposes, but do not have to.

“As on-device intelligence systems become more widely adopted, researchers and developers can integrate OLMoE into other iOS applications, or it can be used to experience which real-world tasks state-of-the-art on-device models are capable of,” Ai2 stated. “It can also be used to improve efficient local AI models or test one’s own model locally using Ai2’s open source codebase.”

Loraine Lawson is a veteran technology reporter who has covered technology issues from data integration to security for 25 years. Before joining The New Stack, she served as the editor of the banking technology site Bank Automation News. She has...