URL: https://thenewstack.io/metas-multiray-a-ml-platform-for-running-large-foundational-models/

Meta's MultiRay, a ML Platform for Running Large Foundational Models
AI / Software Development

Developed to make its AI systems more efficient, Meta's MultiRay uses large universal, foundational ML models trained to perform well across a diverse set of tasks.
Nov 18th, 2022 10:27am by Jessica Wachtel

Training an AI model is no small feat. Specialized teams need a huge amount of data to train a large model to do a very specific task, say reading millions of posts to learn how to identify harmful speech. Helpful, incredibly so. But this task is also expensive and limited in scope. With the costs associated with training each model, it’s easy to see how this can spiral out of control. These enormous costs might keep the most state-of-the-art AI models out of production-level code.

There’s no way to get around the expensive, data-heavy computations of understanding content with AI. The machine has to learn. But where and how the learning takes place could change. Social media conglomerate Meta has developed a new platform for running state-of-the-art AI models that does just that. MultiRay’s primary aim is to democratize access to large foundational models at Meta.

Developed as part of Meta’s push to make its AI systems more efficient, MultiRay uses large, universal, foundational ML models that are trained to perform well across a diverse set of tasks and domains, including similarity and classification. Multiple specialized, smaller models can now run off of the output (known as an embedding) of the universal model.

With the bulk of the computations more centralized, Meta was able to purchase more cutting-edge accelerators (specialized hardware) needed for the expensive computations. Software development is also benefiting as development teams can now quickly iterate and improve upon ML models.

Currently, MultiRay powers over 125 use cases across Meta. It supports up to 20 million queries per second (QPS) and serves 800 billion queries per day.
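
As a quick sanity check on those two figures, the daily total can be converted into an average rate. This is only back-of-the-envelope arithmetic; it assumes the 20 million QPS figure is a peak and that the daily total is spread over a full 24 hours, neither of which the article spells out.

```python
# Average load implied by the article's daily figure.
QUERIES_PER_DAY = 800e9          # 800 billion queries per day (from the article)
PEAK_QPS = 20e6                  # "up to 20 million" QPS (from the article)
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

average_qps = QUERIES_PER_DAY / SECONDS_PER_DAY
peak_to_average = PEAK_QPS / average_qps

print(f"average QPS: {average_qps:,.0f}")          # roughly 9.26 million
print(f"peak / average: {peak_to_average:.2f}")    # 2.16
```

Under those assumptions the average works out to roughly 9.3 million QPS, a little under half of the stated maximum.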

MultiRay’s Modalities

MultiRay’s first model (in production since 2020), TextRay, focuses on text-understanding applications and can perform tasks ranging from detecting inauthentic content to improving users’ search experiences.

Building on TextRay, the second model, PostRay, combines text and image understanding. To truly understand a post, which can include images, video, and text, a system needs to analyze each element individually and in the context of the others.

Before PostRay, this functionality required combining several different models and consumed too many compute and power resources to bring the ML models into production.

PostRay models are complex to train, deploy, and maintain because they incorporate advanced research in multiple fields, but they only need to be trained once. PostRay has several use cases across Meta, including topic classification, which is used for Reels.

How MultiRay Works

MultiRay centralizes execution on accelerators and uses a cache to save on recomputation costs.

MultiRay’s large foundational models return a point in a high-dimensional vector space that represents the input. The point is the “embedding,” and it is a more ML-friendly version of the original input. Rather than processing the raw input (the text and images), task-specific models can consume the embedding from MultiRay, which is simpler to handle.

The embeddings are large (many kilobytes each), much larger than the inputs themselves.
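
The division of labor can be sketched in a few lines of Python. Nothing here is MultiRay’s API: `toy_embed` is a hypothetical stand-in that hashes tokens into a tiny vector, the 16-dimension size is arbitrary, and the classifier weights are untrained placeholders. The point is only that the expensive step runs once per input and several small tasks reuse its output.

```python
import hashlib
import math

DIM = 16  # toy size; the article describes real embeddings as many kilobytes


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a foundational model: hashes tokens into buckets."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def similarity_task(a: list[float], b: list[float]) -> float:
    """Task 1: cosine similarity (inputs are already unit length)."""
    return sum(x * y for x, y in zip(a, b))


def classification_task(embedding: list[float], weights: list[float], bias: float) -> float:
    """Task 2: a single logistic unit acting as a small task-specific model."""
    score = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Embed each post once, then let both downstream tasks reuse the result.
post_a = toy_embed("new phone review with photos")
post_b = toy_embed("phone review and unboxing photos")

untrained_weights = [0.1] * DIM  # placeholder values, not a trained model
print("similarity:", round(similarity_task(post_a, post_b), 3))
print("class probability:", round(classification_task(post_a, untrained_weights, 0.0), 3))
```

In a real deployment the embedding step would be the costly, accelerator-bound call, while the two task functions would stay cheap enough for individual teams to own.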

Why Centralize?

Software perspective: Meta’s blog post describes the limits of the smaller, individual-team workflow as the burden of creating and maintaining individual models, along with the difficulty of applying sophisticated optimization techniques. The centralized workflow alleviates most of that, leaving teams free to focus on developing and iterating on task-specific models.

Hardware perspective: Large models and latency constraints are very demanding on graphics processing units (GPUs), which are the accelerators used for MultiRay. The centralized model allows top-shelf GPUs to be shared across teams rather than each team having its own.

MultiRay’s Cache

Each successive layer of the multilayered cache offers a higher hit rate at the cost of lower speed. The layers start from a fast but small per-host local cache in the RAM of every MultiRay server and end with a slower but much larger globally distributed cache in flash memory. Cache storage is finite, so results cannot be cached for long.
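
A two-layer lookup along these lines might look like the sketch below. It is an illustration under stated assumptions, not Meta’s implementation: the class names are hypothetical, the “global” layer is an in-process dictionary standing in for a distributed flash store, the local layer uses least-recently-used eviction, and only the global layer expires entries.

```python
import time
from collections import OrderedDict


class LocalLRU:
    """Small, fast per-host layer: evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry


class GlobalTTLCache:
    """Larger, slower layer (stand-in for a distributed store) with a time-to-live."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self.ttl)


class TieredEmbeddingCache:
    """Checks the local layer, then the global layer, then recomputes."""

    def __init__(self, local, remote, compute):
        self.local, self.remote, self.compute = local, remote, compute
        self.stats = {"local": 0, "global": 0, "computed": 0}

    def get(self, key):
        value = self.local.get(key)
        if value is not None:
            self.stats["local"] += 1
            return value
        value = self.remote.get(key)
        if value is not None:
            self.stats["global"] += 1
            self.local.put(key, value)  # promote to the faster layer
            return value
        value = self.compute(key)  # the expensive accelerator call
        self.stats["computed"] += 1
        self.remote.put(key, value)
        self.local.put(key, value)
        return value
```

The `stats` counters are included because measuring where each request is served is what makes tuning the layers possible.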

MultiRay measures request patterns across clients to determine the best cache settings (size, time-to-live, update policies) to reduce the cost of the service. For example, Meta uses the measured data to simulate the energy required for various cache lifetime settings, trading the cost of recomputing a request on accelerators against serving it from the cache. This feedback loop allowed Meta to improve the efficiency of MultiRay even while client behavior constantly changes.
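
The kind of simulation described here can be approximated by replaying a request trace against several time-to-live (TTL) values and comparing total cost. The sketch below is a simplified illustration: the trace, the cost units, and the `simulate_ttl` function are all made up for the example, and an entry’s lifetime is fixed at write time rather than refreshed on each hit.

```python
def simulate_ttl(trace, ttl, compute_cost, storage_cost):
    """Replay (timestamp_seconds, key) requests and total the cost for one TTL.

    compute_cost: hypothetical cost units per recomputation (cache miss).
    storage_cost: hypothetical cost units per entry per second held in cache.
    """
    expires_at = {}
    misses = 0
    entry_seconds = 0.0
    for timestamp, key in sorted(trace):
        expiry = expires_at.get(key)
        if expiry is None or timestamp >= expiry:
            misses += 1                      # recompute on the accelerator
            expires_at[key] = timestamp + ttl
            entry_seconds += ttl             # entry occupies the cache for ttl seconds
    cost = misses * compute_cost + entry_seconds * storage_cost
    return {"ttl": ttl, "misses": misses, "hits": len(trace) - misses, "cost": cost}


# Made-up trace: key "a" is requested in a burst, key "b" only rarely.
TRACE = [(0, "a"), (5, "b"), (10, "a"), (20, "a"), (30, "a"), (100, "a"), (200, "b")]

results = [simulate_ttl(TRACE, ttl, compute_cost=100.0, storage_cost=1.0)
           for ttl in (0, 15, 60, 300)]
best = min(results, key=lambda r: r["cost"])

for r in results:
    print(r)
print("cheapest TTL:", best["ttl"])
```

With these invented numbers a short TTL wins, because longer lifetimes save few recomputations relative to the storage they occupy. A different trace or cost ratio would shift the answer, which is why the settings have to be re-derived as client behavior changes.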

The Challenges of a Centralized Service

Some of the challenges already solved for large-scale systems (e.g., databases), such as client management, quotas, and cost attribution, had to be adapted for the AI domain. Query size and cache hit rate both affect the energy required to process queries, so quotas are more complex. Another challenge is that the expenses accrued while building these models only make sense if the models are used. This is a moving target that demands continuous innovation in new model architectures and training flows, along with heavy investment in model refresh.
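
To see why quotas get more complicated, consider a charging scheme in which the cost of a query depends on both its size and whether it was served from the cache. The sketch below is purely illustrative: the unit costs, the `ClientQuota` class, and the token-based size measure are assumptions, not details from Meta.

```python
from dataclasses import dataclass

COMPUTE_UNITS_PER_TOKEN = 1.0  # hypothetical cost of running one token through the model
CACHE_HIT_UNITS = 0.1          # hypothetical flat cost of serving a cached result


@dataclass
class ClientQuota:
    budget_units: float
    used_units: float = 0.0


def query_cost(num_tokens: int, cache_hit: bool) -> float:
    """A cache hit is cheap regardless of size; a miss scales with query size."""
    if cache_hit:
        return CACHE_HIT_UNITS
    return num_tokens * COMPUTE_UNITS_PER_TOKEN


def charge(quota: ClientQuota, num_tokens: int, cache_hit: bool) -> bool:
    """Deduct the query's cost if the client has budget left; otherwise reject it."""
    cost = query_cost(num_tokens, cache_hit)
    if quota.used_units + cost > quota.budget_units:
        return False
    quota.used_units += cost
    return True


team = ClientQuota(budget_units=100.0)
print(charge(team, num_tokens=60, cache_hit=False))  # True: 60 units used
print(charge(team, num_tokens=60, cache_hit=False))  # False: would exceed the budget
print(charge(team, num_tokens=60, cache_hit=True))   # True: a hit costs only 0.1 units
```

Two clients sending the same number of queries can therefore consume very different amounts of capacity, which a simple requests-per-second limit would not capture.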

Additional Learning

MultiRay has become a sandbox for Meta’s ML and systems specialists to contribute key optimizations that support the broader PyTorch and accelerator ecosystem. MultiRay was the first large use case to deploy PyTorch’s Better Transformer in production at Meta. This brought significant capacity savings with no impact on quality.

Research from Meta’s Fundamental AI Research (FAIR) team led to MultiRay’s development.

Jessica Wachtel is a developer marketing writer at InfluxData where she creates content that helps make the world of time series data more understandable and accessible. Jessica has a background in software development and technical journalism.