
URL: https://thenewstack.io/the-rise-of-small-language-models/

⇱ The Rise of Small Language Models (SLMs) - The New Stack


2024-02-16 03:00:53
AI / Large Language Models

The Rise of Small Language Models (SLMs)

As language models evolve to become more versatile and powerful, it seems that going small may be the best way to go.
Feb 16th, 2024 3:00am by Kimberley Mok
This article has been updated from when it was originally published on February 14, 2024.

The power of large language models (LLMs) has grown substantially during the last couple of years. These versatile AI-powered tools are deep learning artificial neural networks trained on massive datasets, with billions of parameters (the learned variables of the model) that they draw on to perform various natural language processing (NLP) tasks.

These tasks run the gamut from generating, analyzing and classifying text, to generating rather convincing images from a text prompt, to translating content into different languages, to powering chatbots that can hold human-like conversations. Well-known LLMs include proprietary models like OpenAI’s GPT-4, as well as a growing roster of open source contenders like Meta’s LLaMA.

But despite their considerable capabilities, LLMs can nevertheless present some significant disadvantages. Their sheer size often means that they require hefty computational resources and energy to run, which can preclude them from being used by smaller organizations that might not have the deep pockets to bankroll such operations. With larger models there is also the risk of algorithmic bias being introduced via datasets that are not sufficiently diverse, leading to skewed or inaccurate outputs, as well as the dreaded “hallucination,” the industry’s term for output that sounds plausible but is false.

What Are Small Language Models?

These issues are likely among the many reasons behind the recent rise of small language models, or SLMs.

Small language models are slimmed-down versions of their larger cousins, and for smaller enterprises with tighter budgets, SLMs are becoming a more attractive option, because they are generally easier to train, fine-tune and deploy, and also cheaper to run.

How Small Language Models Stack Up Next to LLMs

Small language models are essentially more streamlined versions of LLMs, with smaller neural networks and simpler architectures.

Compared to LLMs, SLMs have fewer parameters and don’t need as much data and time to be trained: think minutes or a few hours of training time, versus many hours or even days to train an LLM. Because of their smaller size, SLMs are generally more efficient and more straightforward to deploy on-site or on smaller devices.
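
To give a rough sense of why parameter count matters for deployment, the sketch below estimates the memory needed just to hold a model's weights. The Phi-2 and GPT-J parameter counts are the ones quoted in this article; the 175-billion-parameter model is a hypothetical stand-in for a large LLM, and the function name and bytes-per-parameter table are assumptions made for illustration. The estimate ignores activations and other runtime overhead, so real requirements would be higher.

```python
# Rough estimate of the memory needed just to store a model's weights.
# Assumption: weights dominate; activations, caches and runtime overhead are ignored.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Return the approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    models = {
        "Phi-2 (2.7B, from the article)": 2.7e9,
        "GPT-J (6B, from the article)": 6e9,
        "hypothetical 175B-parameter LLM": 175e9,
    }
    for name, params in models.items():
        print(f"{name}: {weight_memory_gb(params):.1f} GB in float16")
```

Under these assumptions, a model with a few billion parameters fits in the memory of a single consumer-grade device, while the hypothetical large model would need hundreds of gigabytes.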

How Small Language Models Work

Similar to their larger cousins, small language models utilize a type of deep learning neural network architecture known as the transformer model. Introduced by Google researchers back in 2017 via a paper titled Attention Is All You Need, transformers have revolutionized natural language processing (NLP) during the last few years, paving the way for the generative pre-trained transformers (GPTs) that underlie some of today’s most massive and powerful large language models.

Generally, these are the basic building blocks of the transformer model architecture:

  • Encoder: This component processes and transforms input tokens into a number-based representation that is called an embedding, which captures the context of each token relative to the entire sequence.
  • Self-attention mechanism: This part gives the model the ability to ‘focus’ its attention on the most important parts of a sequence. It allows the model to weigh the relative importance of different parts of an input sequence, and to dynamically alter their influence on the resulting output, depending on the context.
  • Decoder: This element uses the embeddings created by the encoder, together with the self-attention mechanism, to generate an output.
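
The self-attention step described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than how any production model is implemented: the function names and the three toy embeddings are made up, and as a simplification the queries, keys and values are the embeddings themselves, whereas a real transformer first applies learned projection matrices.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification (assumption): queries, keys and values are the embeddings
    themselves; a real transformer first applies learned projection matrices.
    Returns (outputs, weights), where weights[i][j] is how much token i
    attends to token j.
    """
    dim = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all token vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(row, embeddings))
            for d in range(dim)
        ])
    return outputs, weights


if __name__ == "__main__":
    # Three made-up 2-dimensional token embeddings.
    tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    _, attn = self_attention(tokens)
    for row in attn:
        print([round(w, 2) for w in row])
```

Each row of the printed matrix shows how strongly one token attends to every token in the sequence; tokens with similar embeddings receive more weight, which is the "weighing of relative importance" the list describes.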

How Small Language Models Are Created

Small language models are typically made from large language models using an approach called model compression, which results in smaller models that are more resource-efficient and performant, yet still relatively accurate.

Some techniques of model compression include:

  • Knowledge distillation: Think of this technique as having the LLM function as a “teacher” that condenses and transfers its learned knowledge into a smaller, “student” model. The result is a smaller language model that has much of the accuracy and reasoning capabilities of its larger “teacher”, but without the computational cost it would take to run a larger model.
  • Pruning: Like pruning a plant so that it grows optimally, this method trims back any redundant parameters that aren’t crucial to performance, thus reducing the model size. However, pruned models will likely need to be fine-tuned afterward in order to compensate for any lost accuracy.
  • Quantization: This technique aims to shrink a model by using fewer bits to store the model’s data, converting high-precision values into lower-precision ones. For example, numbers can be stored as 8-bit values, rather than 32-bit values. With this conversion, models can become smaller and run faster (especially on smaller devices), usually with only a modest loss of accuracy. Quantization can be done either during model training (quantization-aware training), or after training (post-training quantization).
  • Low-rank factorization: This method exploits redundancy in the parameters of a deep neural network by “decomposing” a larger matrix of weights into smaller ones, thus helping to simplify the model’s operational needs. This helps to reduce the size of the model so that it runs faster, but the process of low-rank factorization itself can require more computational resources to implement. Additionally, fine-tuning is often required to make up for any reduction in accuracy. Low-rank factorization can be done during training, which can help to reduce training time, or it can be done after training.
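
The four techniques above can be illustrated with toy versions in plain Python. This is a sketch under stated assumptions, not the method used to build any particular SLM: all function names and sample weights are hypothetical, the distillation loss shown is a temperature-softened KL divergence (one common choice), pruning is done by weight magnitude, quantization uses a simple symmetric 8-bit scheme, and the low-rank helper only counts parameters rather than performing the decomposition.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give softer distributions."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_drop = int(len(weights) * fraction)
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric post-training quantization of floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float values."""
    return [q * scale for q in quantized]


def low_rank_param_count(rows, cols, rank):
    """Parameters needed when a rows x cols matrix is replaced by two factors."""
    return rank * (rows + cols)


if __name__ == "__main__":
    weights = [0.02, -0.75, 0.31, -0.04, 0.9, 0.005]  # made-up values
    print("pruned:", magnitude_prune(weights, 0.5))
    q, scale = quantize_int8(weights)
    print("quantized:", q)
    print("restored:", [round(w, 3) for w in dequantize(q, scale)])
    print("distillation loss:", distillation_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3]))
    print("low-rank params:", low_rank_param_count(1024, 1024, 64), "vs", 1024 * 1024)
```

In this toy setting the restored weights differ slightly from the originals, which is the small accuracy cost that quantization trades for a smaller model, and the distillation loss falls to zero only when the student reproduces the teacher's output distribution.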

Benefits and Limitations of Small Language Models

  • Practical and easier to customize: Because SLMs can be tailored to narrower and more specific applications, they are more practical for companies that need a language model that is trained on more limited datasets and can be fine-tuned for a particular domain.
  • Enhanced security and privacy: Additionally, SLMs can be customized to meet an organization’s specific requirements for security and privacy. Their smaller size and relative simplicity may also reduce their vulnerability to malicious attacks by minimizing potential surfaces for security breaches.
  • Potential for reduced performance: On the flip side, the increased efficiency and agility of SLMs may translate to slightly reduced language processing abilities, depending on the benchmarks the model is being measured against.

Examples of Small Language Models

Nevertheless, despite some of these potential limitations, some SLMs, like Microsoft’s recently introduced 2.7 billion-parameter Phi-2, demonstrate state-of-the-art performance in mathematical reasoning, common sense, language understanding and logical reasoning that is remarkably comparable to, and in some cases exceeds, that of much heftier LLMs. According to Microsoft, the efficiency of the transformer-based Phi-2 makes it an ideal choice for researchers who want to improve safety, interpretability and ethical development of AI models.

Other SLMs of note include:

  • DistilBERT: a lighter and faster version of Google’s BERT (Bidirectional Encoder Representations from Transformers), the pioneering deep learning NLP AI model introduced back in 2018. There are also Tiny, Mini, Small and Medium versions of BERT, which are scaled-down and optimized for varying constraints, and range in size from roughly 4.4 million parameters in the Tiny version to roughly 41 million parameters in the Medium version. There is also MobileBERT, a version designed for mobile devices.
  • Orca 2: Developed by Microsoft by fine-tuning Meta’s LLaMA 2 on synthetic data, which is generated by a statistical model rather than collected from real life. This results in enhanced reasoning abilities, and performance in reasoning, reading comprehension, math problem solving and text summarization that can overtake that of models ten times larger.
  • GPT-Neo and GPT-J: With 125 million and 6 billion parameters respectively, these alternatives were designed by the open source AI research consortium EleutherAI to be smaller and open source versions of OpenAI’s GPT model. These SLMs can be run on cheaper cloud computing resources from CoreWeave and TensorFlow Research Cloud.

Use Cases for Small Language Models

Because of their smaller size, and reduced computational and operational cost, businesses and institutions can easily fine-tune and tailor small language models to a specific use.

For instance, SLMs could be used as chatbots to offer timely customer service, or to summarize content or create calendar events for users. These smaller models could also be used to translate foreign languages in real time, generate programming code, or monitor and support preventative maintenance on devices linked to the Internet of Things (IoT). Within automotive systems, SLMs can go a long way in offering real-time traffic updates for smarter road navigation, or improving voice commands or hands-free calling.

The Future Ahead for Small Language Models

Ultimately, the emergence of small language models signals a potential shift from expensive and resource-heavy LLMs to more streamlined and efficient language models, arguably making it easier for more businesses and organizations to adopt and tailor generative AI technology to their specific needs. As language models evolve to become more versatile and powerful, it seems that going small may be the best way to go.

Kimberley Mok is a tech and design reporter who covers artificial intelligence, robotics, quantum computing, tech culture and science stories for The New Stack. Trained as an architect, she is also an illustrator and multidisciplinary designer who has been passionate...
TNS owner Insight Partners is an investor in: OpenAI.