URL: https://thenewstack.io/pinecone-revamps-vector-database-architecture-for-ai-apps/

Pinecone Revamps Vector Database Architecture for AI Apps - The New Stack



Pinecone Revamps Vector Database Architecture for AI Apps

The company announced the next-generation version of its serverless architecture, which is designed to better support a wide variety of AI applications.
Feb 27th, 2025 11:00am by Loraine Lawson
Featured image: photo by Erika Fletcher on Unsplash.

Pinecone announced Tuesday the next-generation version of its serverless architecture, which the company says is designed to better support a wide variety of AI applications.

With the advent of AI, the cloud-based vector database provider has noticed a shift in how its databases are used, explained chief technology officer Ram Sriharsha. In a recent post announcing the architecture changes, Sriharsha said broader use of AI applications has led to a rise in demand for:

  • Recommender systems requiring 1000s of queries per second;
  • Semantic search across billions of documents; and
  • AI agentic systems that require millions of independent agents operating simultaneously.

In short, Pinecone is trying to serve diverse and sometimes opposing customer needs. Among the differences is that retrieval-augmented generation (RAG) and agentic AI workflows tend to be more sporadic than semantic search, the company noted.

“They look very different from semantic search use cases,” Sriharsha told The New Stack. “In these emerging use cases, you see that actual workloads are very spiky, so it’s the opposite of predictable workload.”

Also, the corpus of information might actually be quite small, from a few documents to a few hundred. Even larger loads are broken up into what Pinecone calls “namespaces” or “tenants.” Within each tenant, the number of documents might be small, he said.

Serving that kind of workload cost-effectively requires a very different sort of system, he added.
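
To make the namespace idea concrete, here is a minimal sketch of a multitenant store in which each tenant holds only a handful of vectors and a query scans a single tenant. The class and method names (`NamespacedStore`, `upsert`, `query`) are hypothetical and chosen for illustration; this is not Pinecone's API or implementation.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedStore:
    """Toy multitenant vector store (hypothetical): one small collection per namespace."""

    def __init__(self):
        # namespace -> {doc_id: vector}
        self._namespaces = defaultdict(dict)

    def upsert(self, namespace, doc_id, vector):
        self._namespaces[namespace][doc_id] = vector

    def query(self, namespace, vector, top_k=3):
        # Only the requested tenant's documents are scanned.
        docs = self._namespaces.get(namespace, {})
        scored = [(cosine(vector, v), doc_id) for doc_id, v in docs.items()]
        return [doc_id for _, doc_id in sorted(scored, reverse=True)[:top_k]]


store = NamespacedStore()
store.upsert("tenant-a", "a1", [1.0, 0.0])
store.upsert("tenant-a", "a2", [0.0, 1.0])
store.upsert("tenant-b", "b1", [1.0, 0.0])

result = store.query("tenant-a", [0.9, 0.1], top_k=1)
print(result)
```

Because each tenant is tiny, a brute-force scan within one namespace is cheap; the hard part at scale is hosting very many such namespaces, most of which sit idle at any given moment.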

A Pod-Based Architecture

About four years ago, Pinecone began to ship the public version of its vector database in a pod-based architecture.

A pod-based architecture is a way of organizing computing resources where a “pod” is a group of dedicated computers tightly linked together to function as a single unit. It’s often used for cloud computing, high-performance computing (HPC), and other scenarios where scalability and resource management are the primary concerns.

That worked because traditionally, recommender systems used a “build once and serve many” form of indexing, Sriharsha explained.

“Often, vector indexes for recommender workloads would be built in batch mode, taking hours,” he wrote in the blog. “This means such indexes will be hours stale, but it also allows for heavy optimization of the serving index since it can be treated as static.”
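
The “build once and serve many” trade-off can be illustrated with a small sketch. It assumes a toy in-memory index (the `StaticIndex` class is hypothetical, not Pinecone's code) that does its expensive work in a batch build and is then treated as read-only, so anything written afterward stays invisible until the next rebuild.

```python
import math
import time


class StaticIndex:
    """Toy 'build once, serve many' index (hypothetical): immutable after the batch build."""

    def __init__(self, vectors):
        # The expensive step happens once at build time: every vector is
        # normalized so that serving only needs a dot product.
        self.built_at = time.time()
        self._ids = list(vectors)
        self._unit = [self._normalize(vectors[i]) for i in self._ids]

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm else list(v)

    def query(self, vector, top_k=1):
        q = self._normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(q, u)), doc_id)
            for u, doc_id in zip(self._unit, self._ids)
        ]
        return [doc_id for _, doc_id in sorted(scored, reverse=True)[:top_k]]

    def staleness_seconds(self):
        # How long ago the serving copy was built.
        return time.time() - self.built_at


catalog = {"item-1": [1.0, 0.0], "item-2": [0.0, 2.0]}
index = StaticIndex(catalog)

# This item arrives after the build, so the serving index cannot see it.
catalog["item-3"] = [2.0, 1.0]
hits = index.query([2.0, 1.0], top_k=3)
print(hits)
```

The query returns only the two items present at build time, even though a perfect match now exists in the source catalog. That is the staleness the quote describes.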

Serverless Architecture

Semantic search workloads bring different requirements, he continued. They generally have a larger corpus and require predictable low latency — even though their throughput isn’t very high. They tend to heavily use metadata filters and their workloads care more about freshness, which is whether the database indexes reflect the most recent inserts and deletes.
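
As a rough illustration of metadata filtering, the sketch below narrows the candidate set by metadata before ranking by similarity. The record layout and the `filtered_search` helper are assumptions made for this example and do not reflect how Pinecone stores or filters metadata.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(records, query, metadata_filter, top_k=2):
    """Rank only the records whose metadata matches every key in the filter."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    ranked = sorted(candidates, key=lambda r: dot(query, r["vector"]), reverse=True)
    return [r["id"] for r in ranked[:top_k]]


records = [
    {"id": "doc-1", "vector": [0.9, 0.1], "metadata": {"lang": "en", "year": 2024}},
    {"id": "doc-2", "vector": [0.8, 0.2], "metadata": {"lang": "de", "year": 2024}},
    {"id": "doc-3", "vector": [0.2, 0.8], "metadata": {"lang": "en", "year": 2023}},
]

english = filtered_search(records, [1.0, 0.0], {"lang": "en"})
print(english)
```

Here `doc-2` is a close match by similarity but is excluded because its `lang` value does not satisfy the filter.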

Agentic workloads are different still, with small to moderately sized corpora of fewer than a million vectors but lots of namespaces or tenants.

He noted that customers running agentic workloads want:

  • Highly accurate vector search out of the box without becoming vector search experts;
  • Freshness, elasticity, and the ability to ingest data without hitting system limits, resharding, and resizing; and
  • Predictable, low latencies.

Supporting that requires a serverless architecture, Sriharsha said.

“That has been highly successful for these RAG and agentic use cases and so on, and it’s driven a lot of cost savings to customers, and it’s also allowed people to run things at large scale in a way that they couldn’t do before,” he said.

Convergence on One Approach

But now Pinecone was supporting two systems: the pod-based architecture and the serverless architecture. The company began to look at how it could converge the two in a way that offered customers the best of both.

“They still don’t want to have to deal with sizing all these systems and all of this complexity, so they can benefit from all the niceties of serverless, but they need something that allows them to do massive scale workloads,” Sriharsha said. “That meant we had to figure out how to converge pod architecture into serverless and have all the benefits of serverless, but at the same time do something that allows people to run these very different sort of workloads.”

Tuesday’s announcement was the culmination of months of work to create one architecture to serve all needs.

This next-generation approach allows Pinecone to support cost-effective scaling to 1000+ QPS through provisioned read capacity, high-performance sparse indexing for higher retrieval quality, and millions of namespaces per index to support massively multitenant use cases.

It involves the following key innovations in Pinecone’s vector database, according to Sriharsha’s post:

  • Log-structured indexing (LSI), a data storage technique that prioritizes write speed and efficiency, which Pinecone has adapted and applied to its vector database;
  • A new freshness approach that routes all reads through the memtable (an in-memory structure that holds the most recently written data);
  • Predictable caching, in which the index portion of the file (Pinecone calls these slabs) is always cached between local SSD and memory, which enables Pinecone “to serve queries immediately, without having to wait for a warm up period for cold queries”;
  • Cost-effectiveness at high QPS; and
  • Disk-based metadata filtering, which is another new feature in this update of Pinecone’s serverless architecture.
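
The first two items in the list can be sketched together. The toy below follows the generic log-structured pattern: writes land in a mutable in-memory memtable, the memtable is periodically frozen into immutable slabs, and every read consults the memtable last so the newest write wins. The class name, the flush threshold, and the tombstone handling are all assumptions for illustration; the article does not describe Pinecone's internals at this level of detail.

```python
class LogStructuredIndex:
    """Toy log-structured vector index (hypothetical): a memtable plus immutable slabs."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}  # newest writes: id -> vector, or None for a delete
        self.slabs = []     # flushed, read-only dicts, oldest first

    def upsert(self, doc_id, vector):
        self._write(doc_id, list(vector))

    def delete(self, doc_id):
        # A tombstone hides any older copy stored in a slab.
        self._write(doc_id, None)

    def _write(self, doc_id, value):
        self.memtable[doc_id] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable as a new slab; slabs are never edited.
            self.slabs.append(dict(self.memtable))
            self.memtable = {}

    def _live_view(self):
        # Apply slabs oldest to newest, then the memtable last, so the most
        # recent write for each id wins and reads reflect fresh data.
        view = {}
        for layer in self.slabs + [self.memtable]:
            view.update(layer)
        return {k: v for k, v in view.items() if v is not None}

    def query(self, vector, top_k=1):
        scored = [
            (sum(a * b for a, b in zip(vector, v)), doc_id)
            for doc_id, v in self._live_view().items()
        ]
        return [doc_id for _, doc_id in sorted(scored, reverse=True)[:top_k]]


index = LogStructuredIndex(memtable_limit=3)
index.upsert("a", [1.0, 0.0])
index.upsert("b", [0.0, 1.0])
index.upsert("c", [1.0, 0.0])  # third write triggers a flush into the first slab
index.upsert("a", [0.0, 1.0])  # newer copy of "a" lives only in the memtable
index.delete("b")              # tombstone for "b" also lives only in the memtable

fresh = index.query([0.0, 1.0], top_k=2)
print(fresh)
```

The query sees the updated `a` and no longer sees `b`, even though the slab still holds the old copies of both. A production system would also compact slabs and use an approximate index inside each one rather than a full scan, which this sketch omits.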
Loraine Lawson is a veteran technology reporter who has covered technology issues from data integration to security for 25 years. Before joining The New Stack, she served as the editor of the banking technology site Bank Automation News. She has...