Pinecone announced Tuesday the next-generation version of its serverless architecture, which the company says is designed to better support a wide variety of AI applications.
With the advent of AI, the cloud-based vector database provider has noticed a shift in how its databases are used, explained chief technology officer Ram Sriharsha. In a recent post announcing the architecture changes, Sriharsha said broader use of AI applications has led to a rise in demand for:
In short, Pinecone is trying to serve diverse and sometimes opposing customer needs. Among the differences is that retrieval-augmented generation (RAG) and agentic AI workflows tend to be more sporadic than semantic search, the company noted.
“They look very different from semantic search use cases,” Sriharsha told The New Stack. “In these emerging use cases, you see that actual workloads are very spiky, so it’s the opposite of predictable workload.”
Also, the corpus of information might actually be quite small, from a few documents to a few hundred documents. Even larger loads are broken up into what Pinecone calls “namespaces” or “tenants.” Within each tenant, the number of documents might be small, he said.
Serving that cost-effectively requires a very different sort of system, he added.
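To make the namespace idea concrete, here is a minimal sketch in plain Python. It is not Pinecone's implementation or client API; the class `ToyNamespacedIndex`, its methods, and the tenant names are hypothetical, chosen only to illustrate how a large corpus can be split into many small, isolated per-tenant collections that are each cheap to search.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyNamespacedIndex:
    """Hypothetical in-memory index: one small vector collection per namespace."""

    def __init__(self):
        # namespace -> {doc_id -> vector}
        self._namespaces = defaultdict(dict)

    def upsert(self, namespace, doc_id, vector):
        self._namespaces[namespace][doc_id] = list(vector)

    def query(self, namespace, vector, top_k=3):
        # Only the requested tenant's documents are scanned, so the cost
        # depends on the tenant's size, not on the total corpus size.
        docs = self._namespaces.get(namespace, {})
        scored = [(cosine(vector, v), doc_id) for doc_id, v in docs.items()]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


index = ToyNamespacedIndex()
index.upsert("tenant-a", "a1", [1.0, 0.0])
index.upsert("tenant-a", "a2", [0.0, 1.0])
index.upsert("tenant-b", "b1", [1.0, 0.0])

# A query against tenant-a never sees tenant-b's documents.
print(index.query("tenant-a", [0.9, 0.1], top_k=1))
```

With only a handful of vectors per namespace, a brute-force scan like this one is enough for the illustration; a production system would have to handle millions of such namespaces, most of them idle at any given moment.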
About four years ago, Pinecone began to ship the public version of its vector database in a pod-based architecture.
A pod-based architecture is a way of organizing computing resources where a “pod” is a group of dedicated computers tightly linked together to function as a single unit. It’s often used for cloud computing, high-performance computing (HPC), and other scenarios where scalability and resource management are the primary concerns.
That worked because traditionally, recommender systems used a “build once and serve many” form of indexing, Sriharsha explained.
“Often, vector indexes for recommender workloads would be built in batch mode, taking hours,” he wrote in the blog. “This means such indexes will be hours stale, but it also allows for heavy optimization of the serving index since it can be treated as static.”
Semantic search workloads bring different requirements, he continued. They generally have a larger corpus and require predictable low latency — even though their throughput isn’t very high. They tend to heavily use metadata filters and their workloads care more about freshness, which is whether the database indexes reflect the most recent inserts and deletes.
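The two requirements named here, metadata filtering and freshness, can be illustrated with a toy example. This is a sketch under stated assumptions, not how Pinecone works internally: `ToyFreshIndex`, its methods, and the `lang` metadata field are hypothetical, and scoring uses a plain dot product.

```python
class ToyFreshIndex:
    """Hypothetical in-memory index where writes are visible to the next query."""

    def __init__(self):
        # doc_id -> (vector, metadata)
        self._records = {}

    def upsert(self, doc_id, vector, metadata):
        self._records[doc_id] = (list(vector), dict(metadata))

    def delete(self, doc_id):
        self._records.pop(doc_id, None)

    def query(self, vector, metadata_filter, top_k=3):
        scored = []
        for doc_id, (vec, meta) in self._records.items():
            # Metadata filter: every requested key must match exactly.
            if all(meta.get(k) == v for k, v in metadata_filter.items()):
                score = sum(x * y for x, y in zip(vector, vec))
                scored.append((score, doc_id))
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


fresh = ToyFreshIndex()
fresh.upsert("d1", [1.0, 0.0], {"lang": "en"})
fresh.upsert("d2", [0.9, 0.1], {"lang": "de"})
fresh.upsert("d3", [0.8, 0.2], {"lang": "en"})

before = fresh.query([1.0, 0.0], {"lang": "en"})
fresh.delete("d1")
after = fresh.query([1.0, 0.0], {"lang": "en"})
print(before, after)
```

In this toy version, freshness comes for free because every query scans the live records. The contrast is with the batch-built indexes described earlier, which can be heavily optimized precisely because they are treated as static and may be hours stale.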
Agentic workloads are different still, with small to moderate-sized corpora of fewer than a million vectors, but lots of namespaces or tenants.
He noted that customers running agentic workloads want:
Supporting that requires a serverless architecture, Sriharsha said.
“That has been highly successful for these RAG and agentic use cases and so on, and it’s driven a lot of cost savings to customers, and it’s also allowed people to run things at large scale in a way that they couldn’t do before,” he said.
But now Pinecone was supporting two systems: the pod-based architecture and the serverless architecture. The cloud provider began to look at how it could converge the two in a way that offered customers the best of both.
“They still don’t want to have to deal with sizing all these systems and all of this complexity, so they can benefit from all the niceties of serverless, but they need something that allows them to do massive scale workloads,” Sriharsha said. “That meant we had to figure out how to converge pod architecture into serverless and have all the benefits of serverless, but at the same time do something that allows people to run these very different sort of workloads.”
Tuesday’s announcement was the culmination of months of work to create one architecture to serve all needs.
This next-generation approach allows Pinecone to support cost-effective scaling to 1,000+ queries per second (QPS) through provisioned read capacity, high-performance sparse indexing for higher retrieval quality, and millions of namespaces per index to support massively multitenant use cases.
Image via Ram Sriharsha’s blog post
It involves the following key innovations to Pinecone’s vector databases, according to Sriharsha’s post: