URL: https://thenewstack.io/google-wants-developers-to-build-on-device-ai-applications/

Google Wants Developers to Build On-Device AI Applications
AI / Large Language Models / Software Development

Today's phones are equipped with hardware to run AI directly on the device, and Google is encouraging coders to take advantage of it.
May 20th, 2024 7:36am by Agam Shah

Today’s phones and PCs are equipped with new hardware to run AI directly on the device, and at Google I/O this year, Google encouraged coders to take advantage of it.

The idea is to run large language models on locally stored data, even without an internet connection. The data remains private, does not leave the device, and the approach saves money.

“As a developer, you reduce or eliminate the need to deal with server-side maintenance, capacity, constraints or cost for another entrance,” said Sachin Kotwani, group product manager, during a session at Google I/O.

The Way It Works

The ability to develop on-device AI applications is significant progress from the way AI processing is done today.

AI already exists on devices, if you haven’t noticed. It powers basic smartphone features, such as suggesting text messages, improving images, and analyzing power consumption to save battery.

Neural processors in new phones and PCs make on-device AI possible. But running LLMs with a billion or more parameters, such as TinyLlama or Phi-2, on PCs without any AI accelerators is painfully slow. You can run LLMs on CPUs alone with Jan.ai or GPT4All, but it will tax your computer.

LLMs run well on PCs with powerful GPUs. But the setup is a chore: you need to download the models, set up the neural network libraries (such as Nvidia’s cuDNN), install developer tools and compile the code.

As a result, most AI processing happens in the cloud on powerful GPUs. This can be as simple as wiring the GPT-4 API into a chatbot interface, which then offloads queries to GPUs in OpenAI’s server infrastructure. But these APIs aren’t free, and you must pay to use OpenAI’s infrastructure.

A new wave of accelerators and GPUs capable of matrix math on devices makes AI possible on mobile phones.

Google’s new Pixel 8A phone has an Edge TPU (Tensor Processing Unit) for AI, and Intel and AMD ship neural processing units in their PC chips. On-device AI can also be coupled with cloud-based AI resources.

Dev Tools

Development tools to run LLMs on devices are becoming available from chip makers that include AMD, Intel and Nvidia.

Most recently, Google talked about development kits, APIs, and other tools that leverage its own Gemini Nano LLM for mobile devices. This LLM is multimodal, which means developers can build speech, image, video or chatbot applications around it.

Google reps said that Gemini Nano is the most capable model for on-device AI, and it also integrates well into Android apps.

“Gemini Nano is Android’s recommended path to production,” said Thomas Ezan, senior developer relations engineer at Google at I/O.

For those who prefer not to get locked into Google’s proprietary AI development environment, Google will support open source LLMs of between two and three billion parameters.

“If you want to run generic inference on devices, open large language models have also grown in popularity in the past year, although they are not a good fit for production due to performance and memory challenges,” Ezan said.

The supported open models include Falcon 1B (1.3 billion parameters), Flan-T5 (2.7 billion parameters), StableLM 3B (2.8 billion parameters) and Llama 2B (2.5 billion parameters). Google will also support a 7-billion parameter model of its open source Gemma LLM.
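
To make the memory challenge concrete, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need for the models listed above. The parameter counts are the ones quoted in this article; the bytes-per-parameter figures are the standard sizes of each numeric format. Everything else (function and variable names, ignoring activations and caches) is an assumption made for illustration, not a figure from Google.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts are taken from the article; runtime overhead
# (activations, KV cache, the app itself) is deliberately ignored.

PARAMS_BILLIONS = {
    "Falcon 1B": 1.3,
    "Flan-T5": 2.7,
    "StableLM 3B": 2.8,
    "Llama 2B": 2.5,
    "Gemma (7B)": 7.0,
}

# Storage cost of one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billions: float, fmt: str) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[fmt]


if __name__ == "__main__":
    formats = list(BYTES_PER_PARAM)
    print("model".ljust(14) + "".join(f.rjust(8) for f in formats))
    for name, billions in PARAMS_BILLIONS.items():
        row = "".join(f"{weight_gb(billions, f):8.2f}" for f in formats)
        print(name.ljust(14) + row)
```

Under these assumptions, a model in the two-to-three-billion range needs several gigabytes at 16-bit precision, which is a large share of a phone's memory before any other app runs.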

Google’s Own Tools

Developers can integrate Gemini Nano into their apps via the Google AI Edge SDK. The SDK provides high-level APIs, pipelines, model inference and hardware hooks to run AI models efficiently.

Mobile devices are constrained in computing power, bandwidth, and memory. Developers can fine-tune models by accessing a system service called AICore, which is integrated in Android 14 running on eligible devices such as Pixel 8A and Samsung’s S24.

Developers can optimize models for mobile devices using quantization to reduce model size and processing requirements.
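
As an illustration of what quantization does, the following is a minimal sketch of symmetric 8-bit quantization in plain Python. It is a generic textbook scheme, not the method used by AICore or any Google tool, and the function names are hypothetical.

```python
# Symmetric per-tensor int8 quantization: store each weight as a small
# integer plus one shared floating-point scale.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127]; return (ints, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.02, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Each weight shrinks from four bytes (32-bit float) to one, at the cost of a rounding error of at most half the scale per weight.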

“The context window will also likely be smaller and the model will be less generalized… this means that fine-tuning is critical in order to get production quality,” said Terence Zhang, a developer relations engineer at Google.

AICore also includes a fine-tuning layer called low-rank adaptation (LoRA), which allows app developers to customize a model to perform specific tasks. LoRA is considered an important building block for fine-tuning AI to devices and applications.
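
The general idea behind LoRA can be sketched in a few lines: the large base weight matrix W stays frozen, and the app trains only two small matrices, B and A, whose product is added to W. The code below is a generic illustration of that arithmetic, not AICore's API; the function names and the 2048 x 2048 layer size are hypothetical and are not Gemini Nano's actual dimensions.

```python
# LoRA arithmetic: W_effective = W + (alpha / r) * (B @ A)
# W is d_out x d_in (frozen), B is d_out x r, A is r x d_in (trainable).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, b, a, alpha, r):
    """Return the base weights plus the scaled low-rank update."""
    delta = matmul(b, a)
    factor = alpha / r
    return [[w_ij + factor * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, LoRA count) for one layer."""
    return d_out * d_in, r * (d_out + d_in)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights
    b = [[1.0], [2.0]]             # d_out x r, with r = 1
    a = [[0.5, 0.0]]               # r x d_in
    print(lora_effective_weight(w, b, a, alpha=1, r=1))

    full, lora = trainable_params(2048, 2048, r=8)
    print(f"full: {full:,}  LoRA: {lora:,}  ratio: {full // lora}x")
```

For the hypothetical layer above, the rank-8 adapter has 128 times fewer trainable values than the full matrix, which is why adapters of this kind are small enough for individual apps to train and ship.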

“Apps can train their own specialized LoRA fine-tuning blocks to optimize the performance of the Gemini Nano model,” said Miao Wang, software engineer at Google.

Supports Open Source LLMs

MediaPipe is a critical API that allows developers to create on-device AI applications using multiple open source LLMs, which include Falcon and Gemma.

The MediaPipe API provides the pre-optimized models, and developers have to bring the weights to run on-device applications. It supports vision, text and audio applications. Some LLMs excel at specific tasks, and the API provides the flexibility for developers to select their models.

Developers will rely on the MediaPipe API to write AI web apps for Android and iOS devices. Chrome 126, which is in beta, integrates support for low-code APIs that connect web apps to Gemini Nano and open source LLMs.

“This is all running fully locally in the browser, and it’s fast. And that’s because it’s accelerated on the computer’s GPU through WebGPU. And that makes it fast enough to build pretty compelling, fully local web applications,” said Cormac Brick, principal software engineer of core machine learning, at Google I/O.

TensorFlow Lite

Google is also using the TensorFlow Lite development environment, which is a lightweight version of the TensorFlow machine learning framework. TFLite also includes a kit to convert TensorFlow models into more compact versions that can run on-device.

“You can find off-the-shelf models or train models in the framework of your choice,” Brick said. “It converts your models to TensorFlow Lite with a single step. And then you can run them all on runtime bundles with your app across Android, web and iOS.”

Chip maker Qualcomm last week said that developers will be able to port their LLMs to smartphones using its latest chips.

Challenges

App developers are in a gold rush to take advantage of every last ounce of processing that they can to make their apps more efficient.

Another challenge is matching apps to the right AI chips. New generations of devices will have more AI horsepower, which will make on-device models more capable.

Dell has introduced new PCs with Intel’s NPU, but on-device AI will really take off once developers discover relevant apps, said Zach Noskey, director of product management at Dell.

Developer participation in tools such as Intel’s OpenVINO is important to drive the industry. Vendors also need to work closely on application readiness with developers, who may not know where to start.

For example, OpenVINO provides an Intel NPU plugin for GIMP to support Stable Diffusion image-generation prompts.

“It is about continuing to enable that in the community — it’s kind of going a little bit slower, like in years past with CPU and GPU utilization of applications,” Noskey said.

Agam Shah has covered enterprise IT for more than a decade. Outside of machine learning, hardware and chips, he's also interested in martial arts and Russia.
TNS owner Insight Partners is an investor in: run.ai, OpenAI.