VOOZH about

URL: https://thenewstack.io/use-your-data-in-llms-with-the-vector-database-you-already-have/

⇱ Use Your Data in LLMs With the Vector Database You Already Have - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2024-07-12 07:50:35
Use Your Data in LLMs With the Vector Database You Already Have
sponsor-instaclustr,sponsored-post-contributed,
AI / Data / Large Language Models / Open Source

Use Your Data in LLMs With the Vector Database You Already Have

Open source vector databases are among the top options out there for AI development, including some you may already be familiar with or even have on hand.
Jul 12th, 2024 7:50am by Chris Carter
👁 Featued image for: Use Your Data in LLMs With the Vector Database You Already Have
Image from whiteMocca on Shutterstock
Instaclustr sponsored this post.

Vector databases allow you to enhance your LLM models with data from your internal data stores. Prompting the LLM with local, factual knowledge can allow you to get responses tailored to what your organization already knows about the situation. This reduces “AI hallucination” and improves relevance.

You can even ask the LLM to add references to the original data it used in its answer so you can check yourself. No doubt vendors have reached out with proprietary vector database solutions, advertised as a “magic wand” enabling you to assuage any AI hallucination concerns.

But, ready for some good news?

If you’re already using Apache Cassandra 5.0, OpenSearch or PostgreSQL, your vector database success is already primed. That’s right: There’s no need for costly proprietary vector database offerings. If you’re not (yet) using these free and fully open source database technologies, your generative AI aspirations are a good time to migrate — they are all enterprise-ready and avoid the pitfalls of proprietary systems.

For many enterprises, these open source vector databases are the most direct route to implementing LLMs — and possibly leveraging retrieval augmented generation (RAG) — that deliver tailored and factual AI experiences.

Vector databases store embedding vectors, which are lists of numbers representing spatial coordinates corresponding to pieces of data. Related data will have closer coordinates, allowing LLMs to make sense of complex and unstructured datasets for features such as generative AI responses and search capabilities.

RAG, a process skyrocketing in popularity, involves using a vector database to translate the words in an enterprise’s documents into embeddings to provide highly efficient and accurate querying of that documentation via LLMs.

Let’s look closer at what each open source technology brings to the vector database discussion:

Apache Cassandra 5.0 Offers Native Vector Indexing

With its latest version (currently in preview), Apache Cassandra has added to its reputation as an especially highly available and scalable open source database by including everything that enterprises developing AI applications require.

Cassandra 5.0 adds native vector indexing and vector search, as well as a new vector data type for embedding vector storage and retrieval. The new version has also added specific Cassandra Query Language (CQL) functions that enable enterprises to easily use Cassandra as a vector database. These additions make Cassandra 5.0 a smart open source choice for supporting AI workloads and executing enterprise strategies around managing intelligent data.

OpenSearch Provides a Combination of Benefits

Like Cassandra, OpenSearch is another highly popular open source solution, one that many folks on the lookout for a vector database happen to already be using. OpenSearch offers a one-stop shop for search, analytics and vector database capabilities, while also providing exceptional nearest-neighbor search capabilities that support vector, lexical, and hybrid search and analytics.

With OpenSearch, teams can put the pedal down on developing AI applications, counting on the database to deliver the stability, high availability and minimal latency it’s known for, along with the scalability to account for vectors into the tens of billions. Whether developing a recommendation engine, generative AI agent or any other solution where the accuracy of results is crucial, those using OpenSearch to leverage vector embeddings and stamp out hallucinations won’t be disappointed.

The pgvector Extension Makes Postgres a Powerful Vector Store

Enterprises are no strangers to Postgres, which ranks among the most used databases in the world. Given that the database only needs the pgvector extension to become a particularly performant vector database, countless organizations are just a simple deployment away from harnessing an ideal infrastructure for handling their intelligent data.

pgvector is especially well-suited to provide exact nearest-neighbor search, approximate nearest-neighbor search and distance-based embedding search, and at using cosine distance (as recommended by OpenAI), L2 distance and inner product to recognize semantic similarities. Efficiency with those capabilities makes pgvector a powerful and proven open source option for training accurate LLMs and RAG implementations, while positioning teams to deliver trustworthy AI applications they can be proud of.

Was the Answer to Your AI Challenges in Front of You All Along?

The solution to tailored LLM responses isn’t investing in some expensive proprietary vector database and then trying to dodge the very real risks of vendor lock-in or a bad fit. At least it doesn’t have to be. Recognizing that available open source vector databases are among the top options out there for AI development — including some you may already be familiar with or even have on hand — should be a very welcome revelation.

Instaclustr delivers reliability at scale through an integrated data platform of open source technologies such as Apache Cassandra®, Apache Kafka®, Apache SparkTM, ElasticsearchTM, RedisTM, Apache ZooKeeperTM, and PostgreSQL®.
Learn More
The latest from Instaclustr
TRENDING STORIES
Chris Carter is a product manager at Instaclustr by NetApp, which provides a managed platform around open source data technologies. Chris joined Instaclustr in 2020 as an engineer after working in data science and engineering for Mars.
Read more from Chris Carter
Instaclustr sponsored this post.
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: OpenAI.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.