URL: https://thenewstack.io/recurrentgemma-an-open-language-model-for-smaller-devices/

RecurrentGemma: An Open Language Model For Smaller Devices


2024-05-01 12:30:18

New research shows that Google's smaller RecurrentGemma model has about the same level of performance as larger LLMs, including Google's.
May 1st, 2024 12:30pm by Kimberley Mok

Large language models (LLMs) have made a huge impact over the last couple of years, in particular with the emergence of tools like OpenAI’s ChatGPT. However, the mammoth size of LLMs, many of which now have billions (and sometimes trillions) of parameters, makes them too computationally heavy for devices like personal computers, smartphones and other smart devices.

These constraints might explain the growing interest in small language models (SLMs), as well as open LLMs like Google’s RecurrentGemma-2B, which was released a few weeks ago.

Based on Google’s novel Griffin architecture, RecurrentGemma is a more efficient, streamlined 2-billion-parameter addition to the company’s line of open Gemma AI models. Its efficiency makes it well suited to applications that require real-time processing, like translation or interactive AI use cases.

Building on Recurrent Neural Networks

Most significantly, RecurrentGemma’s underlying model architecture isn’t based on the transformer architecture, as most LLMs, like GPT-4 and BERT, are.

A transformer is a type of deep learning model that is designed to process sequential data contextually, in order to handle text-based tasks like translation and summarization.

Transformers have revolutionized the field of natural language processing (NLP), but arguably their biggest drawback is that their global attention mechanism compares each new token against every token that came before it. The memory needed to cache those earlier tokens grows with the length of the sequence, which translates to heavier requirements for memory and computational power. This means that transformer-based LLMs are typically too resource-hungry for small devices like smartphones.

In contrast, RecurrentGemma is built on what is known as linear recurrences, a vital component of recurrent neural networks (RNNs), as explained in the research team’s preprint paper.
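
To make the idea of a linear recurrence concrete, here is a minimal Python sketch. It is not Griffin's actual recurrent layer, which the preprint describes in detail; the function name `linear_recurrence` and the scalar coefficients `a` and `b` are assumptions chosen purely for illustration. The point is only that one running state is overwritten at each step, so the memory needed for the state does not depend on how many inputs have been seen.

```python
def linear_recurrence(inputs, a=0.5, b=1.0, h0=0.0):
    """Run h_t = a * h_{t-1} + b * x_t over a sequence of numbers.

    Illustrative only: `a`, `b` and `h0` are hypothetical scalars. A real
    recurrent layer would use learned, vector-valued parameters.
    """
    h = h0
    states = []
    for x in inputs:
        # The new state depends linearly on the previous state and the input.
        h = a * h + b * x
        states.append(h)
    return states


print(linear_recurrence([1.0, 2.0, 3.0]))  # [1.0, 2.5, 4.25]
```

The list of states is returned here only so the update can be inspected; at inference time a model of this kind needs to keep just the most recent state.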

In a recent blog post, Google explained that “RecurrentGemma is a technically distinct model that leverages recurrent neural networks and local attention to improve memory efficiency. While achieving similar benchmark score performance to the Gemma 2B model, RecurrentGemma’s unique architecture results in several advantages, [including] reduced memory usage, higher throughput, and research innovation.”

Hidden States and Local Attention

Before the emergence of transformers, RNNs were typically used to process sequential data by utilizing a “hidden state” that is continuously updated as data is processed. In RecurrentGemma, this hidden state is combined with a “local attention” mechanism that lets the model look back at a limited window of recent tokens, without having to attend to every earlier token in the sequence at each step (as a “global attention” mechanism would require).
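
The difference between local and global attention can be sketched in a few lines of Python. This is a simplified illustration, not code from RecurrentGemma: the helper `attended_positions` and its `window` parameter are hypothetical names, and the sketch only lists which positions are visible to a token rather than computing any attention scores.

```python
def attended_positions(t, window=None):
    """Return the positions visible to the token at position t.

    window=None models global (causal) attention: every position from 0 to t.
    window=k models local attention: only the k most recent positions.
    """
    if window is None:
        start = 0
    else:
        start = max(0, t - window + 1)
    return list(range(start, t + 1))


print(attended_positions(5))            # [0, 1, 2, 3, 4, 5]
print(attended_positions(5, window=3))  # [3, 4, 5]
```

With global attention the visible list keeps growing as `t` grows; with local attention it never exceeds `window` entries.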

“Although one can reduce the cache size by using local attention, this comes at the price of reduced performance,” noted the team in their paper. “In contrast, RecurrentGemma-2B compresses input sequences into a fixed-size state without sacrificing performance. This reduces memory use and enables efficient inference on long sequences.”

Because resource usage is fixed, RecurrentGemma is able to handle lengthier language processing tasks efficiently, even with the typical computational constraints of personal devices, and without having to rely on powerful GPUs or cloud-based computing.
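
As a rough illustration of why a bounded state matters on long sequences, the sketch below counts how many past tokens would have to be kept in an attention cache. The function `cached_entries` and the window size of 2,048 are assumptions made for this example, and the count is in tokens, not bytes; real memory use also depends on model dimensions that are not covered here.

```python
def cached_entries(seq_len, window=None):
    """Count the past tokens an attention cache would need to hold.

    window=None models global attention, where the cache grows with the
    sequence. A fixed window models local attention, where it is capped.
    """
    if window is None:
        return seq_len
    return min(seq_len, window)


for n in (1_000, 10_000, 100_000):
    # Columns: sequence length, global-attention cache, local-attention cache.
    print(n, cached_entries(n), cached_entries(n, window=2048))
```

Under these assumptions the global-attention column scales with the sequence length, while the local-attention column stops growing once the sequence is longer than the window.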

Despite the lack of a transformer-based architecture, the research team found that RecurrentGemma performed well in a variety of tests when compared to larger LLMs, including those of the Gemma family of models.

According to the team’s findings, RecurrentGemma-2B-IT (IT meaning an instruction-tuned model) achieved a 43.7% win rate against the larger Mistral 7B model in hundreds of prompts that included creative writing and coding tasks. This result was also only slightly below the 45% win rate achieved by Gemma-1.1-2B-IT in the same set of tasks.

Additionally, the researchers found that RecurrentGemma-2B-IT outperformed a Mistral 7B v0.2 Instruct model, with a 59.8% win rate on 400 prompts testing basic safety protocols.

Overall, the team found that both a pre-trained RecurrentGemma model with 2 billion non-embedding parameters and an instruction-tuned variant achieved comparable performance to Gemma-2B, even though Gemma-2B was trained on 50% more tokens.

In the end, the team notes that RecurrentGemma-2B is capable of achieving about the same level of performance as other larger Gemma models, by leveraging the advantages of RNNs and local attention mechanisms — making it much more efficient and suitable for deployment in situations where resources are constrained.

Ultimately, models like RecurrentGemma-2B could signal a shift to smaller and more agile AI models that can be run on less powerful devices.

Kimberley Mok is a tech and design reporter who covers artificial intelligence, robotics, quantum computing, tech culture and science stories for The New Stack. Trained as an architect, she is also an illustrator and multidisciplinary designer who has been passionate...
TNS owner Insight Partners is an investor in: OpenAI.