URL: https://thenewstack.io/vector-primer-understand-the-lingua-franca-of-generative-ai/

⇱ Vector Primer: Understand the Lingua Franca of Generative AI - The New Stack



Vector Primer: Understand the Lingua Franca of Generative AI

Let’s dig into vectors, vector search and the kinds of databases that can store and query vectors.
Jun 30th, 2023 6:26am by Charna Parkey
DataStax sponsored this post.

We’re fond of saying that there’s no artificial intelligence without data. But it can’t be just any kind of data. Take large language models, or LLMs: deep learning models, such as OpenAI’s GPT-4, that can generate text that’s quite similar to what a human would write.

For LLMs to “understand” words, the words need to be stored as “vectors,” a way of using numbers to capture the meanings and usage patterns of words. Vectors are, you might say, the lingua franca of AI.

Vectors have been around for a while, but the popularity and accessibility of the generative AI interface ChatGPT have made them a hot topic. That’s particularly true because the most popular apps that organizations build with these technologies will bring their own private data to LLMs by composing their own vectors.

But how do they work, how are they stored, how do applications search for them and how do they help make AI possible? Let’s dig into vectors, vector search and the kinds of databases that can store and query vectors.

DataStax, an IBM company, provides the real-time vector data tools that Gen AI apps need, with seamless integration with developers’ stacks of choice.

Vectors

A vector refers to a numeric representation of the attributes of a piece of data. Each data point is represented as a vector with many numerical values, where each value corresponds to a specific feature or attribute of the data.

When you transform data like an image or text into a vector representation, the process is known as “embedding.” The choice of image embeddings for vector search, for example, depends on factors such as the specific use case, the available resources and the characteristics of the image dataset. In e-commerce or product image search applications, it can be beneficial to use embeddings specifically trained on product images. So-called instance retrieval, on the other hand, involves searching for instances of objects within a larger scene or image.

Storing data as vector representations enables you to perform various operations and calculations on the data, most importantly search. Selecting the vector attributes is important for the types of questions you’d like to be able to ask later. For example, if you only store information about the colors in an image with plants, you can’t then ask about the care requirements. You’ll only be able to find visually similar plants.
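To make the point about attribute selection concrete, here is a minimal Python sketch. It is a toy stand-in for an embedding, not how a trained model produces vectors: the names `VOCABULARY` and `embed` are hypothetical, and each value in the vector simply counts one word from a fixed list.

```python
# Toy stand-in for an embedding: one count per vocabulary word.
# VOCABULARY and embed() are hypothetical names chosen for this sketch;
# real embeddings come from trained models, not word counts.
VOCABULARY = ["green", "red", "leaf", "flower"]


def embed(text: str) -> list[int]:
    """Map text to a vector with one value per vocabulary word."""
    words = text.lower().split()
    return [words.count(term) for term in VOCABULARY]


# A description of appearance is captured by the chosen attributes.
print(embed("Green leaf with a red flower"))  # [1, 1, 1, 1]

# Care instructions are invisible to a vector that only tracks appearance.
print(embed("Water weekly in bright light"))  # [0, 0, 0, 0]
```

Because the vocabulary only covers appearance, the second vector is all zeros. Nothing about care requirements was ever stored, so no search over these vectors could retrieve it.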

Vector Search

By representing data as vectors, you can leverage mathematical techniques to efficiently search and compare very large datasets without requiring an exact match. Millions of customer profiles, images or articles that are represented as vectors (lists of numbers that capture each item’s key characteristics) can be combed through very quickly with vector similarity search (or “nearest neighbor search”).
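As a rough illustration of nearest neighbor search, the sketch below ranks a few stored vectors by their Euclidean distance to a query. The function name `nearest_neighbors`, the article names and the three-dimensional vectors are all made up for this example. It is also a brute-force approach that checks every item, which is fine for a handful of vectors but not for millions.

```python
import math


def nearest_neighbors(query, items, k=2):
    """Return the names of the k items whose vectors are closest to the query.

    Brute force: measures the Euclidean distance to every stored vector.
    """
    ranked = sorted(items, key=lambda name: math.dist(query, items[name]))
    return ranked[:k]


# Hypothetical vectors for illustration only.
articles = {
    "intro_to_cassandra": [0.9, 0.1, 0.0],
    "scaling_databases": [0.8, 0.2, 0.1],
    "sourdough_recipes": [0.0, 0.1, 0.9],
}

# No stored vector equals the query exactly, yet the closest ones are found.
print(nearest_neighbors([1.0, 0.0, 0.0], articles))
# ['intro_to_cassandra', 'scaling_databases']
```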

Unlike traditional keyword-based search, which matches documents based on the occurrence of specific terms, vector search focuses on how similar a query is to the stored items; for instance, are their semantic meanings similar?

This capability enables finding similar items based on their vector representations. Similarity search algorithms can measure the “distance” or similarity between vectors to determine how closely related they are.
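One widely used similarity measure is cosine similarity, which compares the direction of two vectors and ignores their length. The sketch below is a plain Python version for illustration. The article does not say which measure any particular product uses, so treat the choice of measure here as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b.

    1.0 means same direction, 0.0 means unrelated (orthogonal),
    and -1.0 means opposite direction.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0 (orthogonal)
```

Euclidean distance is another common choice. Which one fits best depends on how the vectors were produced.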

In recommendation systems, vector search can be used to find the most similar and dissimilar items or users based on their preferences. In image processing, it enables tasks like object recognition and image retrieval. For instance, Google, the world’s largest search engine, relies on vector search to power the backend of Google Image Search, YouTube and other information retrieval services.

Vectors and Databases

There are stand-alone vector search technologies, including the likes of Elasticsearch. But vectors need to be stored in and retrieved from scalable and fast databases to deliver the responsiveness and scale demanded by AI applications. There are a handful of databases today that offer vector search as a feature.

The main advantage of a database that enables vector search is speed. Without a vector index, a traditional database has to compare a query to every item in the database. In contrast, integrated vector search builds an index and includes search algorithms that vastly speed up the process, making it possible to search massive amounts of data in a fraction of the time it would take a standard database.
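To give a feel for how an index avoids comparing a query against every item, here is a toy sketch of one indexing idea, locality-sensitive hashing with random hyperplanes. The class `HyperplaneIndex` and its parameters are hypothetical and are not taken from any database named in this article. Production systems use more elaborate index structures, and this approach is approximate: it can miss true nearest neighbors that land in a different bucket.

```python
import random
from collections import defaultdict


class HyperplaneIndex:
    """Toy locality-sensitive hashing index (hypothetical, for illustration)."""

    def __init__(self, dims, num_planes=4, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane splits the vector space into two halves.
        self.planes = [
            [rng.gauss(0, 1) for _ in range(dims)] for _ in range(num_planes)
        ]
        self.buckets = defaultdict(list)

    def _key(self, vector):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vector)) >= 0
            for plane in self.planes
        )

    def add(self, name, vector):
        """Store the item in the bucket that matches its side pattern."""
        self.buckets[self._key(vector)].append((name, vector))

    def candidates(self, query):
        """Return only the items that share the query's bucket."""
        return self.buckets.get(self._key(query), [])


index = HyperplaneIndex(dims=3)
index.add("a", [1.0, 2.0, 3.0])
index.add("b", [-1.0, -2.0, -3.0])

# A query pointing the same way as "a" lands in the bucket that holds "a",
# so "b" (pointing the opposite way) is never compared against it.
names = [name for name, _ in index.candidates([2.0, 4.0, 6.0])]
print(names)
```

Only the vectors in the matching bucket need an exact distance calculation afterward, which is where the time savings come from.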

In a business context, this is extremely valuable when using AI applications to recommend products that are similar to past purchases, or identify fraudulent transactions that resemble known patterns, or anomalies that look dissimilar to the norm.

One example of a database that offers vector search is DataStax’s Astra DB, which is built on the highly scalable, high-throughput, open source Apache Cassandra. Cassandra has already been proven at scale by the likes of Netflix, Uber and Apple for AI applications. The addition of vector search makes Astra DB a one-stop shop for high-scale database operations.

Integrating vector search with a scalable data store like Astra DB enables calculations and ranking directly within the database, eliminating the need to transfer large amounts of data to external systems. This reduces latency and improves overall query performance. Vector search can be combined with other indexes within Astra DB for even more powerful queries.
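The idea of combining vector search with other indexes can be sketched in plain Python: filter on a regular attribute first, then rank what remains by vector distance. This is not Astra DB's API. The function `search`, the row layout and the product data are hypothetical and only show the shape of such a query.

```python
import math


def search(rows, query, category, k=2):
    """Filter rows by a regular attribute, then rank the rest by distance."""
    matching = [row for row in rows if row["category"] == category]
    matching.sort(key=lambda row: math.dist(query, row["vector"]))
    return [row["name"] for row in matching[:k]]


# Hypothetical rows: an ordinary column plus a vector column.
products = [
    {"name": "trail_shoe", "category": "shoes", "vector": [0.9, 0.1]},
    {"name": "road_shoe", "category": "shoes", "vector": [0.6, 0.4]},
    {"name": "rain_jacket", "category": "jackets", "vector": [0.95, 0.05]},
]

# The jacket is closest to the query overall but is excluded by the filter.
print(search(products, [1.0, 0.0], "shoes"))
# ['trail_shoe', 'road_shoe']
```

When both steps run inside the database, only the final ranked names need to leave it, which is the latency benefit described above.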

The Growing Importance of Vector Search

Vectors and the databases that store them play a big role in enabling efficient search, similarity calculations and data exploration in the field of AI. As organizations scale their generative AI efforts and look to customize the end-user experience with their data, vector representations and the ability to work with scalable, fast databases that are vector-search enabled will become increasingly critical.

Learn more about vector search at Agent X: Architecture for Generative AI, a free virtual event on July 11.

Charna Parkey is a real-time AI product and strategy leader at DataStax through the acquisition of Kaskada where she was most recently vice president of product. She is an experienced serial startup tech executive, product builder, engineer, speaker, writer, mentor...
TNS owner Insight Partners is an investor in: Pragma, OpenAI.