
URL: https://thenewstack.io/architecture-inversion-scale-by-moving-computation-not-data/

Architecture Inversion: Scale by Moving Computation, Not Data - The New Stack



Architecture Inversion: Scale by Moving Computation, Not Data

The biggest players’ scaling tricks are becoming increasingly relevant for the rest of us, which has led to the proliferation of architecture inversion.
Oct 21st, 2024 12:00pm by Jon Bratseth
Featured image from Sergey Novikov on Shutterstock.
Vespa.ai sponsored this post.

Have you ever wondered how the world’s largest internet and social-media companies can deliver algorithmic content to so many users so fast?

Consider what the likes of TikTok need to do to deliver an endless stream of personalized video clips to people’s phones. They have some model representing the user, and they need to use it to find the most suitable clips to show that particular user among billions of alternatives. And since they also have billions of users, they need to do this millions of times per second.

Traditional Solutions

The naive way to solve TikTok’s problem is to compare the user model to every video clip to determine how well each one fits that user. It is widely understood that this brute-force approach doesn’t scale — with a billion videos and a million requests per second, this becomes a quadrillion comparisons per second!

The obvious solution to this is indexing: maintain a data structure that makes it possible to find suitable video clips from the user model without having to consider every clip. For example, if the user model notes a preference for English-language videos, the videos can be indexed with a B-tree that points directly to English videos so the rest can be ignored. Or, if the user is represented as an interest vector embedding, a vector index such as Hierarchical Navigable Small World (HNSW) can be used to find videos with similar vectors without considering the rest.
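As a minimal sketch of the first kind of index, the snippet below groups a toy catalog by language so that a lookup touches only the matching videos. The catalog, the `build_language_index` helper and the language tags are hypothetical, and a plain hash map stands in for the B-tree mentioned above; the point is only that the non-matching videos are never visited.

```python
from collections import defaultdict

# Hypothetical toy catalog: video id -> language tag.
videos = {
    "v1": "en",
    "v2": "no",
    "v3": "en",
    "v4": "es",
}


def build_language_index(catalog):
    """Map each language to the ids of the videos in that language."""
    index = defaultdict(list)
    for video_id, language in catalog.items():
        index[language].append(video_id)
    return index


language_index = build_language_index(videos)

# One lookup returns the English videos; the others are never examined.
english_candidates = language_index["en"]
print(english_candidates)
```

A vector index such as HNSW plays the same role for embeddings, but it returns approximate nearest neighbors rather than exact matches.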

Real systems will use a combination of such indexes. However, the indexes only give a rough indication of which videos may be suitable for the user. To really surface the content users find most interesting or useful, you need to do a more accurate comparison between the user model and each candidate item, which these days is often done using neural nets. Here is where it gets interesting.

Scaling Without Compromising Quality

The common way to rescore is to pass the candidate items retrieved from the indexes to another component in your architecture, which does the detailed scoring of each one. How many should be rescored in this way? The answer is a certain fraction of all the candidates, rather than a fixed number.

To see this, consider that indexed retrieval plus rescoring is an approximation to brute-force scoring of all candidates, and what we need to consider is the quality loss from this optimization. This can be expressed in terms of the probability that a given video that would be shown to the user with brute-force evaluation is present in the set to be reranked.

This probability goes toward zero as the size of that set relative to the full set of candidates gets smaller. The quality loss will get larger as the fraction to be rescored decreases, and it also gets larger the better your full scoring algorithm becomes as there is more to lose.
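The effect can be illustrated with a small simulation. This is a sketch under made-up assumptions, not a measurement of any real system: each item gets a random "true" score (what full scoring would produce), the index is modeled as that score plus Gaussian noise, and the function reports what share of the brute-force top results survives in the set that gets rescored. The function name, the noise model and all parameter values are hypothetical.

```python
import random


def recall_of_candidate_set(num_items, fraction, top_k, noise, seed=0):
    """Share of the brute-force top_k that is present in the rescored set."""
    rng = random.Random(seed)
    # What full scoring of every item would produce.
    true_scores = [rng.random() for _ in range(num_items)]
    # What the cheap index-based retrieval sees: the true score plus noise.
    cheap_scores = [s + rng.gauss(0.0, noise) for s in true_scores]

    by_true = sorted(range(num_items), key=lambda i: true_scores[i], reverse=True)
    by_cheap = sorted(range(num_items), key=lambda i: cheap_scores[i], reverse=True)

    ideal = set(by_true[:top_k])
    rescored = set(by_cheap[:max(1, round(num_items * fraction))])
    return len(ideal & rescored) / top_k


fractions = (0.001, 0.01, 0.1, 1.0)
recalls = [recall_of_candidate_set(10_000, f, top_k=10, noise=0.2) for f in fractions]
for f, r in zip(fractions, recalls):
    print(f"rescored fraction {f:>5}: recall {r:.2f}")
```

Because the same seed is used for every fraction, the rescored sets are nested, so recall can only stay flat or fall as the fraction shrinks, and it is exactly 1.0 when everything is rescored.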

Let’s get concrete and say we want to rescore 1% of the candidates, and that each item contains 2 KB of data useful for final scoring (roughly one vector and a hundred properties). With a billion items, this becomes 10 million items to rescore per request, and with a million requests per second, that means we need to move 20 petabytes of data per second for reranking! Even this small fraction is clearly very far from being viable, so what are the big players doing?
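The arithmetic behind these figures can be checked in a few lines. The variable names are illustrative, and the sketch assumes decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes).

```python
total_items = 1_000_000_000        # a billion videos
requests_per_second = 1_000_000    # a million requests per second
rescored_percent = 1               # rescore 1% of the candidates
bytes_per_item = 2_000             # 2 KB of scoring data per item (decimal)

# Brute force: every request is compared against every item.
brute_force_comparisons = total_items * requests_per_second

# Rescoring: integer arithmetic avoids floating-point rounding.
items_per_request = total_items * rescored_percent // 100
bytes_per_request = items_per_request * bytes_per_item
bytes_per_second = bytes_per_request * requests_per_second

print(f"brute-force comparisons/s: {brute_force_comparisons:,}")
print(f"items rescored per request: {items_per_request:,}")
print(f"data moved per request: {bytes_per_request / 10**9:.0f} GB")
print(f"data moved per second: {bytes_per_second / 10**15:.0f} PB")
```

Each request alone needs 20 GB shipped across the network before any scoring can start, which is already impractical at a latency budget of milliseconds.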

The answer is that they are not moving the data to the scoring compute nodes, but instead are moving the scoring compute into the index to be done locally where the data resides, thus circumventing the entire problem.
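A rough sketch of the idea, assuming a hypothetical `Shard` class for a node that holds part of the data and a dot product as a stand-in for the full scoring model: each shard scores its own items locally and returns only its best few results, so only scores and ids cross the network, never the item data. This is an illustration of the pattern, not the API of any particular platform.

```python
import heapq


def score(user_vector, item_vector):
    """Stand-in for the full scoring model: a plain dot product."""
    return sum(u * x for u, x in zip(user_vector, item_vector))


class Shard:
    """Hypothetical node that stores a slice of the items and scores them in place."""

    def __init__(self, items):
        self.items = items  # item id -> feature vector, stored on this node

    def top_k_local(self, user_vector, k):
        # Scoring runs where the data lives; only (score, id) pairs leave the node.
        scored = ((score(user_vector, vec), item_id)
                  for item_id, vec in self.items.items())
        return heapq.nlargest(k, scored)


def query(shards, user_vector, k):
    """Send the small user model to every shard and merge the small partial results."""
    partial = []
    for shard in shards:
        partial.extend(shard.top_k_local(user_vector, k))
    return heapq.nlargest(k, partial)


shards = [
    Shard({"a": [1.0, 0.0], "b": [0.0, 1.0]}),
    Shard({"c": [0.9, 0.9], "d": [0.1, 0.2]}),
]
results = query(shards, user_vector=[1.0, 0.5], k=2)
print([item_id for _, item_id in results])
```

The data sent per request is now the user model going out and `k` results per shard coming back, regardless of how many items each shard scores.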

The Architecture Inversion Is Coming to the Rest of Us

Now, why should the rest of us care, blessed as we are with a lack of most of the billions of users that TikTok, Google and the like are burdened with? A number of factors are becoming relevant:

  • ML algorithms are improving and so is local compute capacity, meaning fully scoring items gives a larger boost in quality and ultimately profit than used to be the case.
  • With the advent of vector embeddings, the signals consumed by such algorithms have grown by one to two orders of magnitude, making the network bottleneck more severe.
  • Applying ever more data to solve problems is increasingly cost effective, which means more data needs to be rescored to maintain a constant quality loss.
  • As the consumers of data from such systems move from being mostly humans to mostly LLMs in RAG solutions, it becomes beneficial to deliver larger amounts of scored data faster in more applications than before. This will culminate in most applications being about delivering high-quality data to LLMs that reason in long chains to make high-quality business decisions at an inhumanly fast pace.

For these reasons, the scaling tricks of the very biggest players are becoming increasingly relevant for the rest of us. This has led to the current proliferation of architecture inversion: moving from traditional two-tier systems, where data is looked up from a search engine or database and sent to a stateless compute tier, to inserting that compute into the data itself.

Now, to really do this, you also need a platform that can actually manage your data, indexing and compute in this way. This has led to the increasing popularity of Vespa.ai, the platform that got its start as Yahoo’s solution for architecture inversion back when it was one of the big players. The technology has since been open sourced.

Vespa.ai allows you to store and index structured data, vectors/tensors and full text together over any number of machines and do any kind of tensor computation and machine-learned inference locally where the data is stored.

Vespa.ai is a platform for building AI-driven applications for search, recommendation, personalization, and RAG. It handles large data volumes and high query rates, offering efficient data, inference, and logic management. It is available as both a managed service and open source.
Jon Bratseth, founder and CEO of Vespa.ai, is the architect and one of the main contributors to Vespa. Jon has 20+ years of experience as an architect and programmer on large distributed systems.
TNS owner Insight Partners is an investor in: Real.