When you are dealing with large language models (LLMs), you often come across the terms “vectors,” “tokens” and “embeddings.” It’s important to thoroughly understand these concepts before delving into building chatbots and AI assistants. With multimodal approaches gaining ground, these terms now apply beyond text-only LLMs to models that also interpret images and videos.
The objective of this tutorial is to introduce you to these core concepts through simple, straightforward examples and code snippets.
Vectors play a crucial role in the functioning of LLMs and generative AI. To understand their significance, it’s essential to grasp what vectors are and how they are generated and utilized in LLMs.
In mathematics and physics, a vector is an object that has both magnitude and direction. It can be represented geometrically as a directed line segment, where the length of the line indicates the magnitude, and the arrow points in the direction of the vector. Vectors are fundamental in representing quantities that can’t be fully described by a single number — such as force, velocity or displacement — and which have both magnitude and direction.
In the realm of LLMs, vectors are used to represent text or data in a numerical form that the model can understand and process. This representation is known as an embedding. Embeddings are high-dimensional vectors that capture the semantic meaning of words, sentences or even entire documents. The process of converting text into embeddings allows LLMs to perform various natural language processing tasks, such as text generation, sentiment analysis and more.
Simply put, a vector is a single-dimensional array.
Since machines only understand numbers, data such as text and images is converted into vectors. Neural networks, including transformer architectures, operate on these numerical arrays rather than on raw text or pixels.
Operations on vectors, such as the dot product, help us measure how similar two vectors are. For vectors of comparable length, a larger dot product indicates that the vectors point in more similar directions. At a high level, this forms the basis for performing similarity search on vectors stored in memory or in specialized vector databases.
The code snippet below introduces the basic idea of a vector. As you can see, it is a simple one-dimensional array:
import numpy as np
# Creating a vector from a list
vector = np.array([1, 2, 3])
print("Vector:", vector)
# Vector addition
vector2 = np.array([4, 5, 6])
sum_vector = vector + vector2
print("Vector addition:", sum_vector)
# Scalar multiplication
scalar = 2
scaled_vector = vector * scalar
print("Scalar multiplication:", scaled_vector)
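The dot product mentioned earlier can be sketched with the same two vectors. This is a minimal illustration rather than how any particular vector database implements search; the `cosine_similarity` helper and the `parallel` and `orthogonal` vectors are names introduced here for the example. Cosine similarity is the dot product divided by the product of the two vector lengths, so it compares direction only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vector = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
parallel = np.array([2, 4, 6])     # same direction as `vector`, twice the length
orthogonal = np.array([3, 0, -1])  # 1*3 + 2*0 + 3*(-1) = 0

# Dot product: 1*4 + 2*5 + 3*6 = 32
dot = int(np.dot(vector, vector2))
print("Dot product:", dot)

# Vectors pointing the same way score close to 1, perpendicular ones score 0
print("Same direction:", cosine_similarity(vector, parallel))
print("Perpendicular:", cosine_similarity(vector, orthogonal))
```

Dividing by the lengths is what makes the score independent of magnitude, which is why `parallel` scores about 1 even though it is twice as long as `vector`.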
While the vector shown above has no association with text, it does convey the idea. Tokens, which we explore in the next section, are the mechanism to represent text in vectors.
When an LLM processes input, it works entirely with vectors at every stage. The input tokens are first converted to vectors, which then flow through multiple layers of the neural network. Each layer performs mathematical operations like matrix multiplication and dot products on these vectors to transform and refine the information. The attention mechanism, which is core to transformer architectures, calculates relationships between different parts of the input by comparing vectors representing different tokens. As vectors move through the network layers, they accumulate contextual information and semantic understanding. Finally, the output vectors are converted back to tokens and then to human-readable text. This vector-based processing allows LLMs to perform complex language understanding tasks by treating language as high-dimensional mathematical objects that can be manipulated using linear algebra operations.
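To make the attention step a little more concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is a simplification under stated assumptions: the projection matrices `w_q`, `w_k` and `w_v` are random stand-ins for learned weights, the sizes (4 tokens, 8 dimensions) are arbitrary, and there is no causal mask or multi-head logic as a production model would have.

```python
import numpy as np

def softmax(x):
    # Subtract the row maximum before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token vectors x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Compare every token's query with every token's key via dot products
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores)  # each row sums to 1
    # Each output vector is a weighted mix of all value vectors
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 token vectors, 8 dimensions each (arbitrary sizes)
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))  # random, not trained

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)            # one output vector per input token
print(weights.sum(axis=1))  # attention weights for each token sum to 1
```

Each row of `weights` says how much one token draws on every other token, which is the "comparing vectors representing different tokens" step described above.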
Tokens are the basic units of data processed by LLMs. In the context of text, a token can be a word, part of a word (subword), or even a character — depending on the tokenization process.
When text is passed through a tokenizer, it encodes the input based on a specific scheme and emits a sequence of integer token IDs, a vector that the LLM can understand. The encoding scheme is specific to the LLM. Depending on that scheme, the tokenizer may map a whole word, or only part of a word, to a single token. When the tokens are passed back through the tokenizer's decoder, they are translated into text again.
It’s common to refer to the context length of LLMs as one of the key differentiating factors. Technically, it maps to the ability of the LLM to accept a specific number of tokens as input and generate another set of tokens as output. The tokenizer is responsible for encoding the prompt (input) into tokens and the response (output) back into text.
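The encode and decode round trip, and the idea of a context length, can be sketched with a toy tokenizer. Everything here is hypothetical: the five-entry `VOCAB`, the whitespace splitting and the `CONTEXT_LENGTH` of 4 are invented for illustration, whereas real tokenizers use learned subword vocabularies with tens of thousands of entries.

```python
# Hypothetical toy vocabulary; real tokenizers learn subword vocabularies from data
VOCAB = {"<unk>": 0, "apple": 1, "is": 2, "a": 3, "fruit": 4}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}
CONTEXT_LENGTH = 4  # hypothetical limit on the number of input tokens

def encode(text):
    """Map each lowercase, whitespace-separated word to its ID (0 if unknown)."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def decode(ids):
    """Map token IDs back to text."""
    return " ".join(ID_TO_TOKEN[i] for i in ids)

def fits_context(ids, limit=CONTEXT_LENGTH):
    """Check whether a token sequence fits within the context length."""
    return len(ids) <= limit

ids = encode("Apple is a fruit")
print(ids)                # [1, 2, 3, 4]
print(decode(ids))        # apple is a fruit
print(fits_context(ids))  # True

too_long = encode("Apple is a red fruit")  # "red" is not in the vocabulary
print(too_long)                # [1, 2, 3, 0, 4]
print(fits_context(too_long))  # False
```

Note that the limit is counted in tokens, not characters or words, which is why the same prompt can fit one model's context window and overflow another's.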
Tokens are the representations of text in the form of a vector.
The code snippets below show how text is converted into tokens for an open model like Llama 2 and a commercial model such as GPT-4. They are based on the transformers library from Hugging Face and tiktoken from OpenAI.
from transformers import AutoTokenizer
model = "meta-llama/Llama-2-7b-chat-hf"
# Replace HF_TOKEN with your Hugging Face access token (the Llama 2 repository is gated)
tokenizer = AutoTokenizer.from_pretrained(model, token="HF_TOKEN")
text = "Apple is a fruit"
# Encode the text into a list of token IDs
token = tokenizer.encode(text)
print(token)
# Decode the token IDs back into text
decoded_text = tokenizer.decode(token)
print(decoded_text)
import tiktoken
tokenizer = tiktoken.encoding_for_model("gpt-4")
text = "Apple is a fruit"
token = tokenizer.encode(text)
print(token)
decoded_text = tokenizer.decode(token)
print(decoded_text)
So, the key takeaway is that tokens are vectors of integer IDs produced by a specific tokenizer. The same text yields different tokens under different tokenizers.
During inference, LLMs generate responses one token at a time. When you input “The weather today is,” the tokenizer converts this to tokens such as [1014, 9282, 3854, 374] (the IDs in this example are illustrative). The LLM processes these input tokens through its neural network layers, building a representation of the context and meaning. Based on this processing, it predicts the most likely next token, perhaps token 8369 representing “sunny.” This token is then appended to the original input tokens, creating a new sequence [1014, 9282, 3854, 374, 8369], which is fed back into the model. The model repeats this process, predicting one token at a time until it generates a complete response. This autoregressive generation allows LLMs to produce coherent, contextually appropriate text by leveraging the patterns learned during training to predict the most probable continuation of the token sequence.
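The generation loop described above can be sketched in a few lines. This is a toy under stated assumptions: the `NEXT_TOKEN` table is a hypothetical stand-in for the neural network, it looks only at the last token (a real LLM conditions on the whole sequence), and it uses word strings instead of integer IDs to keep the output readable.

```python
# Hypothetical lookup table standing in for the model's next-token prediction
NEXT_TOKEN = {
    "is": "sunny",
    "sunny": "and",
    "and": "warm",
    "warm": "<eos>",  # end-of-sequence marker stops generation
}

def predict_next(tokens):
    """Toy 'model': picks the next token from the last token only."""
    return NEXT_TOKEN.get(tokens[-1], "<eos>")

def generate(prompt_tokens, max_new_tokens=10):
    """Autoregressive loop: predict a token, append it, feed the sequence back in."""
    tokens = list(prompt_tokens)  # copy so the prompt is not modified
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

prompt = ["The", "weather", "today", "is"]
print(" ".join(generate(prompt)))  # The weather today is sunny and warm
```

The `max_new_tokens` cap mirrors the output limit that real models enforce, since generation otherwise continues until an end-of-sequence token is predicted.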
If tokens are vector representations of text, embeddings are tokens with semantic context. They represent the meaning and context of the text. Whereas tokens are encoded and decoded by a tokenizer, an embedding model is responsible for generating text embeddings in the form of a vector. Embeddings are what allow LLMs to capture the context, nuance and subtle meanings of words and phrases. They are the result of the model learning from vast amounts of text data, and they encode not just the identity of a token but also its relationships with other tokens.
Embeddings are the foundational aspect of LLMs.
Through embeddings, LLMs achieve a deep understanding of language, enabling tasks like sentiment analysis, text summarization and question answering with nuanced comprehension and generation capabilities. They are the entry point to the LLM, but they are also used outside of the LLM to convert text into vectors while retaining the semantic context. When text is passed through an embedding model, a vector is produced that contains the embeddings. Below are examples from an open source embedding model, sentence-transformers/all-MiniLM-L6-v2, as well as OpenAI’s model, text-embedding-3-small.
from sentence_transformers import SentenceTransformer
sentences = ["Apple is a fruit", "Car is a vehicle"]
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(len(embeddings[0]))
print(embeddings)
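from openai import OpenAI
# Replace OPENAI_API_KEY with your own API key
client = OpenAI(api_key="OPENAI_API_KEY")
model = "text-embedding-3-small"
sentences = ["Apple is a fruit", "Car is a vehicle"]
response = client.embeddings.create(input=sentences, model=model)
# Collect one embedding per input sentence, in the same order
embeddings = [item.embedding for item in response.data]
print(len(embeddings[0]))
print(embeddings)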
Retrieval-Augmented Generation (RAG) fundamentally depends on embeddings to find relevant information from knowledge bases. When implementing RAG, documents are first converted into embeddings using models like sentence-transformers and stored in vector databases. When a user asks “What are the side effects of medication X?”, the query is also converted to an embedding. The system then performs similarity search using dot product or cosine similarity to find document embeddings that are closest to the query embedding in the vector space. For example, documents about “medication X adverse reactions” would have embeddings close to the query embedding, while unrelated documents would be distant. The most similar documents are retrieved and provided as context to the LLM, which can then generate an answer grounded in the retrieved information. Without embeddings, RAG systems would have to fall back on keyword matching, which misses documents that express the same meaning in different words.
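The retrieval step can be sketched as a cosine-similarity ranking. The three-dimensional vectors below are hand-written, hypothetical stand-ins for real embeddings (which typically have hundreds of dimensions and come from an embedding model), and the `top_k` helper is a name introduced here, not part of any library.

```python
import numpy as np

documents = [
    "Medication X adverse reactions",
    "Medication X dosage schedule",
    "History of the bicycle",
]
# Hypothetical hand-written embeddings, one row per document
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.6, 0.6, 0.1],
    [0.0, 0.1, 0.9],
])
# Hypothetical embedding of "What are the side effects of medication X?"
query_embedding = np.array([1.0, 0.2, 0.0])

def top_k(query, matrix, k=2):
    """Return (index, cosine similarity) for the k rows most similar to query."""
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = matrix_unit @ query_unit     # cosine similarity per document
    order = np.argsort(scores)[::-1][:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in order]

results = top_k(query_embedding, doc_embeddings)
for index, score in results:
    print(f"{score:.3f}  {documents[index]}")
```

In a full RAG pipeline, the text of the top-ranked documents would be added to the prompt before it is sent to the LLM. A vector database performs the same ranking, usually with approximate indexes so it scales beyond a brute-force comparison.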
Tokens vs. Vectors: Tokens are the linguistic units, while vectors are the mathematical representations of these units. Every token is mapped to a vector in the LLM’s processing pipeline.
Vectors vs. Embeddings: All embeddings are vectors, but not all vectors are embeddings. Embeddings are vectors that have been specifically trained to capture deep semantic relationships.
Tokens and Embeddings: The transition from tokens to embeddings represents the movement from a discrete representation of language to a nuanced, continuous and contextually aware semantic space.
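The mapping from discrete tokens to continuous vectors summarized above amounts to a table lookup. In this minimal sketch, the vocabulary size, the embedding dimension and the random values in `embedding_matrix` are all assumptions made for illustration. In a real model the matrix is learned during training, and that training is what gives the vectors their semantic meaning.

```python
import numpy as np

VOCAB_SIZE = 5     # hypothetical: number of distinct tokens
EMBEDDING_DIM = 4  # hypothetical: real models use hundreds or thousands

# One row per token ID; random here, learned during training in a real model
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(VOCAB_SIZE, EMBEDDING_DIM))

# Discrete token IDs, as a tokenizer would emit them
token_ids = np.array([1, 2, 3, 4])

# Indexing the matrix with the IDs returns one continuous vector per token
token_vectors = embedding_matrix[token_ids]
print(token_vectors.shape)  # (4, 4)
print(token_vectors[0])     # the vector for token ID 1
```

The same token ID always retrieves the same row at this stage. Context-dependent meaning is added later, as these vectors pass through the model's layers.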
Understanding vectors, tokens and embeddings is fundamental to grasping how LLMs process language. Tokens serve as the basic data units, vectors provide a mathematical framework for machine processing, and embeddings bring depth and understanding, enabling LLMs to perform tasks with human-like versatility and accuracy. Together, these components form the backbone of LLM technology, enabling the sophisticated language models that power today’s AI applications.