URL: https://thenewstack.io/the-building-blocks-of-llms-vectors-tokens-and-embeddings/

The Building Blocks of LLMs: Vectors, Tokens and Embeddings

Understanding vectors, tokens and embeddings is fundamental to grokking how large language models process language.
Feb 8th, 2024 7:32am by Janakiram MSV
Featured image: photo by La-Rel Easter on Unsplash.

When working with large language models (LLMs), you often come across the terms “vectors,” “tokens” and “embeddings.” It’s important to understand these concepts thoroughly before building chatbots and AI assistants. With multimodal approaches gaining ground, the same concepts also apply beyond text, to models that interpret images and video.

The objective of this tutorial is to introduce you to these core concepts through simple, straightforward examples and code snippets.

Vectors: The Language of Machines

Vectors play a crucial role in the functioning of LLMs and generative AI. To understand their significance, it’s essential to grasp what vectors are and how they are generated and utilized in LLMs.

In mathematics and physics, a vector is an object that has both magnitude and direction. It can be represented geometrically as a directed line segment, where the length of the line indicates the magnitude and the arrow points in the direction of the vector. Vectors are fundamental in representing quantities that can’t be fully described by a single number, such as force, velocity or displacement.

In the realm of LLMs, vectors are used to represent text or data in a numerical form that the model can understand and process. This representation is known as an embedding. Embeddings are high-dimensional vectors that capture the semantic meaning of words, sentences or even entire documents. The process of converting text into embeddings allows LLMs to perform various natural language processing tasks, such as text generation, sentiment analysis and more.

Simply put, in code a vector is a one-dimensional array of numbers.

Since machines only work with numbers, data such as text and images is converted into vectors. Neural networks, including transformer architectures, take numeric vectors as input and operate on them at every layer.

Operations on vectors, such as the dot product, help us measure how similar two vectors are. At a high level, this forms the basis for performing similarity search on vectors stored in memory or in specialized vector databases.
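To make the similarity idea concrete, here is a minimal sketch that compares vectors with the dot product and with cosine similarity (the dot product divided by the product of the vector lengths). The helper name `cosine_similarity` and the sample vectors are illustrative assumptions, not part of any library discussed here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, twice the length
c = np.array([-3.0, 0.0, 1.0])  # perpendicular to a (dot product is zero)

print("dot(a, b):", np.dot(a, b))
print("cosine(a, b):", cosine_similarity(a, b))  # close to 1: same direction
print("cosine(a, c):", cosine_similarity(a, c))  # close to 0: unrelated direction
```

Cosine similarity ignores vector length, which is why it is a common choice when only the direction of a vector is meaningful.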

The code snippet below introduces the basic idea of a vector. As you can see, it is a simple one-dimensional array:

import numpy as np

# Creating a vector from a list
vector = np.array([1, 2, 3])
print("Vector:", vector)

# Vector addition
vector2 = np.array([4, 5, 6])
sum_vector = vector + vector2
print("Vector addition:", sum_vector)

# Scalar multiplication
scalar = 2
scaled_vector = vector * scalar
print("Scalar multiplication:", scaled_vector)

While the vector shown above has no association with text, it does convey the idea. Tokens, which we explore in the next section, are the mechanism to represent text in vectors.

How Vectors Are Used by an LLM

When an LLM processes input, it works entirely with vectors at every stage. The input tokens are first converted to vectors, which then flow through multiple layers of the neural network. Each layer performs mathematical operations like matrix multiplication and dot products on these vectors to transform and refine the information. The attention mechanism, which is core to transformer architectures, calculates relationships between different parts of the input by comparing vectors representing different tokens. As vectors move through the network layers, they accumulate contextual information and semantic understanding. Finally, the output vectors are converted back to tokens and then to human-readable text. This vector-based processing allows LLMs to perform complex language understanding tasks by treating language as high-dimensional mathematical objects that can be manipulated using linear algebra operations.
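As a rough illustration of how attention compares token vectors, the sketch below implements a simplified single-head, scaled dot-product attention in NumPy. It is an assumption-laden toy: real transformers first project the input through learned query, key and value weight matrices, which are omitted here, and the random input stands in for actual token vectors.

```python
import numpy as np

def softmax(x):
    # Subtract the row maximum before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x):
    """Simplified attention: queries, keys and values are all x (no learned weights)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)   # pairwise dot products between token vectors
    weights = softmax(scores)       # each row sums to 1
    return weights @ x, weights     # each output row is a weighted mix of all rows

rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(4, 8))  # 4 hypothetical tokens, 8 dimensions each

output, weights = self_attention(token_vectors)
print(output.shape)          # one output vector per input token
print(weights.sum(axis=1))   # attention weights per token sum to 1
```

Each output vector blends information from every token in the sequence, which is how context accumulates as vectors move through the layers.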

Tokens: The Building Blocks of LLMs

Tokens are the basic units of data processed by LLMs. In the context of text, a token can be a word, part of a word (subword), or even a character — depending on the tokenization process.
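The three granularities can be sketched in plain Python. The subword vocabulary and the greedy longest-match splitter below are hypothetical and only illustrate the idea; production tokenizers learn their vocabularies from data with algorithms such as byte-pair encoding.

```python
text = "Apple is a fruit"

# Word-level: split on whitespace
word_tokens = text.split()

# Character-level: every character, including spaces, is a token
char_tokens = list(text)

# Hypothetical subword vocabulary, for illustration only
subword_vocab = ["App", "le", "is", "a", "fru", "it"]

def greedy_subword_split(word, vocab):
    """Split a word by repeatedly taking the longest vocabulary piece that matches."""
    pieces = []
    i = 0
    while i < len(word):
        match = next(
            (p for p in sorted(vocab, key=len, reverse=True) if word.startswith(p, i)),
            word[i],  # fall back to a single character if nothing matches
        )
        pieces.append(match)
        i += len(match)
    return pieces

subword_tokens = [p for w in word_tokens for p in greedy_subword_split(w, subword_vocab)]

print(word_tokens)
print(subword_tokens)
print(len(char_tokens))
```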

When text is passed through a tokenizer, it is encoded according to a specific scheme into a sequence of integer token IDs, a one-dimensional array that the LLM can consume. The encoding scheme is specific to the model family, so the same text yields different IDs under different tokenizers. Depending on the scheme, the tokenizer may map a whole word or only part of a word to a single ID. Passing the IDs through the tokenizer's decode step translates them back into text.

It’s common to refer to the context length of LLMs as one of the key differentiating factors. Technically, it maps to the ability of the LLM to accept a specific number of tokens as input and generate another set of tokens as output. The tokenizer is responsible for encoding the prompt (input) into tokens and the response (output) back into text.
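One practical consequence of a fixed context length is that prompts sometimes have to be trimmed. The sketch below assumes the prompt and the generated tokens share a single window, which is typical of decoder-only models but not stated in this article. The function name `fit_to_context` and the numbers are hypothetical.

```python
def fit_to_context(prompt_tokens, context_length, max_new_tokens):
    """Keep the most recent prompt tokens so that prompt + output fits in the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return prompt_tokens[-budget:]

prompt = list(range(10))  # stand-in for 10 token IDs
trimmed = fit_to_context(prompt, context_length=8, max_new_tokens=3)
print(trimmed)  # the last 5 token IDs
```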

Tokens are the representation of text as a vector of integer IDs.

The code snippets below show how text is converted into tokens for an open model like Llama 2 and a commercial model such as GPT-4. They are based on the transformers module from Hugging Face and Tiktoken from OpenAI.

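import os

from transformers import AutoTokenizer

# Llama 2 is a gated model: request access on Hugging Face, then expose
# your access token through the HF_TOKEN environment variable.
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model, token=os.environ["HF_TOKEN"])

text = "Apple is a fruit"

# Encode the text into a list of integer token IDs
token_ids = tokenizer.encode(text)
print(token_ids)

# Decode the IDs back into text
decoded_text = tokenizer.decode(token_ids)
print(decoded_text)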

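import tiktoken

# Look up the encoding that GPT-4 uses
tokenizer = tiktoken.encoding_for_model("gpt-4")

text = "Apple is a fruit"

# Encode the text into a list of integer token IDs
token_ids = tokenizer.encode(text)
print(token_ids)

# Decode the IDs back into text
decoded_text = tokenizer.decode(token_ids)
print(decoded_text)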

So, the key takeaway is that tokens are integer IDs produced by a specific tokenizer, and the same text maps to different IDs under different tokenizers.

How Do LLMs Use Tokens

During inference, LLMs process tokens sequentially to generate responses. When you input “The weather today is,” the tokenizer converts this to tokens like [1014, 9282, 3854, 374]. The LLM processes these input tokens through its neural network layers, building an understanding of the context and meaning. Based on this processing, it predicts the most likely next token – perhaps token 8369 representing “sunny.” This token is then fed back into the model along with the original input tokens, creating a new sequence [1014, 9282, 3854, 374, 8369]. The model repeats this process, predicting one token at a time until it generates a complete response. This autoregressive generation allows LLMs to produce coherent, contextually appropriate text by leveraging the patterns learned during training to predict the most probable continuation of the token sequence.
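The generation loop described above can be sketched with a toy stand-in for the model. Everything here is hypothetical: `NEXT_TOKEN` is a hand-written lookup table rather than a trained network, it works on words rather than token IDs, and it looks only at the last token, whereas a real LLM conditions on the whole sequence.

```python
# Hypothetical next-token table standing in for a trained model
NEXT_TOKEN = {
    "The": "weather",
    "weather": "today",
    "today": "is",
    "is": "sunny",
    "sunny": "<eos>",
}

def predict_next(tokens):
    """Toy 'model': picks the next token from the last token only."""
    return NEXT_TOKEN.get(tokens[-1], "<eos>")

def generate(prompt_tokens, max_new_tokens=10):
    """Append one predicted token at a time until end-of-sequence or the limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == "<eos>":
            break
        tokens.append(next_token)  # feed the prediction back in as input
    return tokens

result = generate(["The", "weather", "today", "is"])
print(result)  # ['The', 'weather', 'today', 'is', 'sunny']
```

The important part is the loop shape: predict, append, repeat. Swapping `predict_next` for a call to a real model would leave the structure unchanged.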

Embeddings: The Semantic Space

If tokens are vector representations of text, embeddings are tokens with semantic context. They represent the meaning and context of the text. If tokens are encoded or decoded by a tokenizer, an embeddings model is responsible for generating text embeddings in the form of a vector. Embeddings are what allow LLMs to understand the context, nuance and subtle meanings of words and phrases. They are the result of the model learning from vast amounts of text data, and encode not just the identity of a token but its relationships with other tokens.

Embeddings are the foundational aspect of LLMs.

Through embeddings, LLMs achieve a deep understanding of language, enabling tasks like sentiment analysis, text summarization and question answering with nuanced comprehension and generation capabilities. They are the entry point to the LLM, but they are also used outside of the LLM to convert text into vectors while retaining the semantic context. When text is passed through an embedding model, the model produces one embedding vector per input text. Below are examples from an open source embedding model, sentence-transformers/all-MiniLM-L6-v2, as well as OpenAI’s model, text-embedding-3-small.

from sentence_transformers import SentenceTransformer

sentences = ["Apple is a fruit", "Car is a vehicle"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)

print(len(embeddings[0]))

print(embeddings)

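import os

from openai import OpenAI

# Read the API key from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

model = "text-embedding-3-small"
sentences = ["Apple is a fruit", "Car is a vehicle"]

# The response holds one embedding per input sentence
response = client.embeddings.create(input=sentences, model=model)
embeddings = [item.embedding for item in response.data]

print(len(embeddings[0]))

print(embeddings)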

How Embeddings Are Critical for Implementing RAG

Retrieval-Augmented Generation (RAG) fundamentally depends on embeddings to find relevant information from knowledge bases. When implementing RAG, documents are first converted into embeddings using models like sentence-transformers and stored in vector databases. When a user asks “What are the side effects of medication X?”, the query is also converted to an embedding. The system then performs similarity search using dot product or cosine similarity to find document embeddings that are closest to the query embedding in the vector space. For example, documents about “medication X adverse reactions” would have embeddings close to the query embedding, while unrelated documents would be distant. The most similar documents are retrieved and provided as context to the LLM, which then generates an answer grounded in the retrieved information. Without embeddings, a RAG system would have to fall back on keyword matching, which misses documents that express the same meaning in different words.
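The retrieval step can be sketched with NumPy alone. The three-dimensional vectors below are hand-made stand-ins for real embeddings, chosen so that the two medication documents point in roughly the same direction as the query. In practice the vectors would come from an embedding model and live in a vector database, and the helper name `top_k` is an assumption for illustration.

```python
import numpy as np

documents = [
    "medication X adverse reactions",
    "medication X dosage schedule",
    "history of the bicycle",
]

# Hand-made vectors standing in for real document embeddings
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.6, 0.7, 0.0],
    [0.0, 0.1, 0.9],
])

# Stand-in embedding for "What are the side effects of medication X?"
query_embedding = np.array([0.8, 0.2, 0.1])

def top_k(query, docs, k=2):
    """Return (index, cosine similarity) for the k documents closest to the query."""
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = docs_n @ query_n                 # cosine similarity per document
    order = np.argsort(scores)[::-1][:k]      # highest scores first
    return [(int(i), float(scores[i])) for i in order]

hits = top_k(query_embedding, doc_embeddings)
for index, score in hits:
    print(f"{score:.3f}  {documents[index]}")
```

The retrieved document texts would then be placed into the prompt as context for the LLM.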

Comparison and Interaction

Tokens vs. Vectors: Tokens are the linguistic units, while vectors are the mathematical representations of these units. Every token is mapped to a vector in the LLM’s processing pipeline.

Vectors vs. Embeddings: All embeddings are vectors, but not all vectors are embeddings. Embeddings are vectors that have been specifically trained to capture deep semantic relationships.

Tokens and Embeddings: The transition from tokens to embeddings represents the movement from a discrete representation of language to a nuanced, continuous and contextually aware semantic space.
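The step from discrete tokens to continuous vectors is, at its simplest, a table lookup: each token ID selects one row of an embedding matrix. The sketch below uses a small random matrix as a stand-in for learned weights, and the vocabulary size, dimension and token IDs are hypothetical. Note that this lookup is context-free; contextual awareness comes from the layers that process these vectors afterward.

```python
import numpy as np

vocab_size, embedding_dim = 10, 4
rng = np.random.default_rng(42)

# Random stand-in for a learned embedding matrix: one row per token ID
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

token_ids = [3, 7, 3]  # hypothetical output of a tokenizer

# Indexing with a list of IDs returns one row (vector) per token
token_embeddings = embedding_table[token_ids]

print(token_embeddings.shape)
print(np.array_equal(token_embeddings[0], token_embeddings[2]))  # same ID, same vector
```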

Understanding vectors, tokens and embeddings is fundamental to grasping how LLMs process language. Tokens serve as the basic data units, vectors provide a mathematical framework for machine processing, and embeddings bring depth and understanding, enabling LLMs to perform tasks with human-like versatility and accuracy. Together, these components form the backbone of LLM technology, enabling the sophisticated language models that power today’s AI applications.

Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...
TNS owner Insight Partners is an investor in: Simply, OpenAI.