Last indexed: 18 May 2026 (ecd184)

Text Embeddings

This document covers the LLamaEmbedder class and the text embedding generation system in LLamaSharp. Text embeddings are high-dimensional vector representations of text that capture semantic meaning and enable similarity comparisons, clustering, and retrieval-augmented generation (RAG) workflows.

For information about using embeddings in RAG systems with Kernel Memory, see 7.2 Kernel Memory Integration. For multimodal embeddings that include image context, see 5.2 Multimodal Support.

Overview

Text embeddings convert natural language text into fixed-size numerical vectors in which semantically similar texts have similar vector representations. The vector length equals the model's embedding dimension: commonly 384-1024 for dedicated embedding models, and larger (for example 4096) for many generative LLMs. LLamaSharp provides the LLamaEmbedder class to generate embeddings using GGUF models that support embedding generation.

Common use cases:

Semantic search and document retrieval
Text similarity and clustering
Building vector databases for RAG
Content recommendation systems
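Most of these use cases come down to comparing vectors, usually with cosine similarity. The sketch below ranks documents against a query. Python is used here only to show the arithmetic, not LLamaSharp's C# API. The three-dimensional vectors are made-up toy values standing in for real embeddings, and `cosine_similarity` and `rank_documents` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical values, not real model output).
docs = {
    "cats are pets": [0.9, 0.1, 0.0],
    "stock markets fell": [0.0, 0.2, 0.95],
    "dogs are pets": [0.8, 0.3, 0.05],
}
query = [0.85, 0.2, 0.0]  # imagine the embedding of "which animals are pets?"

for name, score in rank_documents(query, docs):
    print(f"{score:.3f}  {name}")
```

In a real pipeline the vectors would come from the embedder, and a vector database would replace the linear scan.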

Sources: LLama/LLamaEmbedder.cs12-21

LLamaEmbedder Architecture

The LLamaEmbedder class provides a managed interface for generating embeddings from GGUF models. Unlike the text generation executors, the embedder creates and disposes a context per request to save memory (LLama/LLamaEmbedder.cs72-78, LLama/LLamaEmbedder.cs150). It implements the IEmbeddingGenerator<string, Embedding<float>> interface from the Microsoft.Extensions.AI namespace (LLama/LLamaEmbedder.EmbeddingGenerator.cs11-13).

[Diagram: Code entity relationships]

Sources: LLama/LLamaEmbedder.cs15-60 LLama/LLamaEmbedder.EmbeddingGenerator.cs11-13 LLama/LLamaEmbedder.cs72-153

Basic Usage

Creating an Embedder

The LLamaEmbedder constructor takes a LLamaWeights instance and an IContextParams (LLama/LLamaEmbedder.cs38). A key parameter is PoolingType, which determines the output format.

A usage example is in the unit tests (LLama.Unittest/LLamaEmbedderTests.cs26-34).

Important constraints:

UBatchSize must equal BatchSize for non-causal models (LLama/LLamaEmbedder.cs40-41)
Encoder-decoder models are not supported for embeddings (LLama/LLamaEmbedder.cs42-43)
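Both constraints are checked when the embedder is constructed. As a rough Python illustration (not LLamaSharp code), the checks amount to the following. `EmbedderParams`, `ModelInfo`, and `validate_embedder_config` are hypothetical names, and ValueError / NotImplementedError stand in for the C# ArgumentException / NotSupportedException.

```python
from dataclasses import dataclass


@dataclass
class EmbedderParams:
    """Hypothetical stand-in for the relevant IContextParams fields."""
    batch_size: int
    ubatch_size: int


@dataclass
class ModelInfo:
    """Hypothetical stand-in for what the loaded model reports about its architecture."""
    has_encoder: bool
    has_decoder: bool


def validate_embedder_config(params: EmbedderParams, model: ModelInfo) -> None:
    """Reject the two configurations documented as unsupported."""
    if params.ubatch_size != params.batch_size:
        # Documented in LLamaSharp as an ArgumentException.
        raise ValueError("UBatchSize must equal BatchSize")
    if model.has_encoder and model.has_decoder:
        # Documented in LLamaSharp as a NotSupportedException.
        raise NotImplementedError("encoder-decoder models are not supported for embeddings")


validate_embedder_config(EmbedderParams(512, 512), ModelInfo(True, False))  # accepted
try:
    validate_embedder_config(EmbedderParams(512, 256), ModelInfo(False, True))
except ValueError as err:
    print("rejected:", err)
```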

Sources: LLama/LLamaEmbedder.cs38-51 LLama.Unittest/LLamaEmbedderTests.cs26-34

Generating Embeddings

The GetEmbeddings() method tokenizes the input text, runs it through the model, and returns normalized embedding vectors (LLama/LLamaEmbedder.cs69-72). Internally, it enables embedding mode on the context by calling Context.NativeHandle.SetEmbeddings(true) (LLama/LLamaEmbedder.cs79).

[Diagram: Data flow]

Sources: LLama/LLamaEmbedder.cs72-153

Pooling Types

The PoolingType parameter in IContextParams controls how token-level embeddings are aggregated into final output vectors (LLama/LLamaEmbedder.cs128-129).

Pooling Type	Output Count	Description	Use Case
`LLamaPoolingType.Mean`	1 vector per input	Average all token embeddings	Semantic search, similarity
`LLamaPoolingType.Rank`	1 vector per input	Specific pooling for reranking models	Search relevance scoring
`LLamaPoolingType.None`	1 vector per token	Individual token embeddings	Token-level analysis, attention
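To make the Mean and None rows concrete, here is a minimal Python sketch of what the two aggregations compute. In LLamaSharp the pooling itself happens inside the native llama.cpp model, so this is illustrative only. The string pooling names and the `pool` function are hypothetical, and Rank is omitted because its output depends on the reranking model's own head.

```python
from typing import Literal

Pooling = Literal["mean", "none"]


def pool(token_embeddings: list[list[float]], pooling: Pooling) -> list[list[float]]:
    """Aggregate per-token vectors according to the pooling type.

    "mean" -> a single vector: the element-wise average over all tokens.
    "none" -> one vector per token, returned unchanged.
    """
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    if pooling == "none":
        return [list(vec) for vec in token_embeddings]
    if pooling == "mean":
        n = len(token_embeddings)
        dim = len(token_embeddings[0])
        return [[sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]]
    raise ValueError(f"unsupported pooling type: {pooling}")


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # three tokens of dimension 2 (toy values)
print(pool(tokens, "mean"))       # one vector for the whole input
print(len(pool(tokens, "none")))  # one vector per token
```

Mean pooling is what makes the output size independent of input length, which is why it suits search and similarity.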

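Pooled embeddings (Mean): GetEmbeddings() returns a single vector for the whole input.

Non-pooled embeddings (None): GetEmbeddings() returns one vector per input token.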

Note: The Microsoft.Extensions.AI implementation, GenerateAsync, does not support LLamaPoolingType.None, because it must return exactly one Embedding<float> per input string (LLama/LLamaEmbedder.EmbeddingGenerator.cs46-49).
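The restriction is consistent with the interface contract that one input string maps to one embedding. A hedged Python sketch of that rule follows; `generate` and the toy `fake_embed` callable are hypothetical stand-ins, not the Microsoft.Extensions.AI API.

```python
from typing import Callable, Sequence


def generate(
    values: Sequence[str],
    pooling: str,
    embed_one: Callable[[str], list[list[float]]],
) -> list[list[float]]:
    """Return exactly one embedding per input string.

    embed_one is a hypothetical callable that returns the vectors for one text:
    a single vector when pooled, one vector per token when pooling is "none".
    """
    if pooling == "none":
        # Per-token output cannot be mapped to one embedding per input.
        raise NotImplementedError("pooling type 'none' is not supported here")
    results = []
    for text in values:
        vectors = embed_one(text)
        results.append(vectors[0])  # pooled output holds a single vector
    return results


def fake_embed(text: str) -> list[list[float]]:
    # Toy pooled "embedding": [character count, word count]; not a real model.
    return [[float(len(text)), float(len(text.split()))]]


print(generate(["hello world", "hi"], "mean", fake_embed))
```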

Sources: LLama/LLamaEmbedder.cs128-141 LLama.Unittest/LLamaEmbedderTests.cs101-103 LLama/LLamaEmbedder.EmbeddingGenerator.cs46-49

Model Compatibility

LLamaEmbedder supports both dedicated embedding models and standard generative models.

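Encoder vs Decoder Models

Encoder-only models (typically dedicated embedding models) are evaluated with an encode pass, and decoder-only generative models with a decode pass; a failure in either raises RuntimeError (LLama/LLamaEmbedder.cs110, LLama/LLamaEmbedder.cs118). Models with both an encoder and a decoder are rejected at construction (LLama/LLamaEmbedder.cs42-43).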
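Going by the Encode and Decode failure sites listed under Error Handling and the encoder-decoder restriction, evaluation appears to branch on the model architecture. The Python sketch below shows that dispatch, assuming encoder-only models take the encode path and decoder-only models the decode path. `evaluate_batch`, the callbacks, and `EvaluationError` are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical evaluation callbacks: take a batch of token ids, return True on success.
EvalFn = Callable[[Sequence[int]], bool]


class EvaluationError(RuntimeError):
    """Stand-in for LLamaSharp's RuntimeError on a failed native call."""


def evaluate_batch(
    has_encoder: bool,
    has_decoder: bool,
    batch: Sequence[int],
    encode: EvalFn,
    decode: EvalFn,
) -> str:
    """Run one batch through the path matching the model architecture."""
    if has_encoder and not has_decoder:
        path, ok = "encode", encode(batch)  # encoder-only model
    elif has_decoder and not has_encoder:
        path, ok = "decode", decode(batch)  # decoder-only model
    else:
        # Encoder-decoder (or neither) is not supported for embeddings.
        raise NotImplementedError("unsupported model architecture for embeddings")
    if not ok:
        raise EvaluationError(f"native {path} failed")
    return path


print(evaluate_batch(True, False, [1, 2, 3], encode=lambda b: True, decode=lambda b: True))
```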

Sources: LLama/LLamaEmbedder.cs104-124 LLama.Unittest/LLamaEmbedderTests.cs84-93

Embedding Normalization

All embedding vectors returned by GetEmbeddings() are automatically normalized with EuclideanNormalization() (LLama/LLamaEmbedder.cs147). This gives every vector unit length (an L2 norm of 1), so the cosine similarity of two embeddings reduces to their dot product.
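Because unit-length vectors make a plain dot product equal to the cosine similarity, a quick numeric check helps. This is an illustrative Python sketch: `euclidean_normalize` and `dot` are hypothetical helpers, and returning the zero vector unchanged is a choice made for this sketch, not documented LLamaSharp behavior.

```python
import math


def euclidean_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a = euclidean_normalize([3.0, 4.0])
b = euclidean_normalize([4.0, 3.0])
# The cosine similarity of (3, 4) and (4, 3) is 24 / 25; the dot product of
# the normalized vectors gives the same value without dividing by norms.
print(dot(a, b))
```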

The normalization logic is provided by SpanNormalizationExtensions, which uses TensorPrimitives for performance (LLama/Extensions/SpanNormalizationExtensions.cs77-82).

Available Normalizations:

EuclideanNormalization(): L2 normalization (LLama/Extensions/SpanNormalizationExtensions.cs65-82)
TaxicabNormalization(): L1/Manhattan normalization (LLama/Extensions/SpanNormalizationExtensions.cs40-57)
MaxAbsoluteNormalization(): scales by the largest absolute value (LLama/Extensions/SpanNormalizationExtensions.cs16-32)
PNormalization(int p): generalized p-norm normalization (LLama/Extensions/SpanNormalizationExtensions.cs107-137)
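The four normalizations differ only in the divisor applied to every element. The Python sketch below shows the arithmetic; the function names are hypothetical analogues of the C# extension methods. Leaving an all-zero vector unchanged and rejecting p < 1 are assumptions of this sketch, not documented LLamaSharp behavior.

```python
def _scale(vec: list[float], divisor: float) -> list[float]:
    # Assumption for this sketch: leave an all-zero vector unchanged instead of dividing by zero.
    if divisor == 0.0:
        return list(vec)
    return [x / divisor for x in vec]


def max_absolute_normalize(vec: list[float]) -> list[float]:
    """Divide by the largest absolute value, so that element becomes +1 or -1."""
    return _scale(vec, max(abs(x) for x in vec))


def taxicab_normalize(vec: list[float]) -> list[float]:
    """L1 (Manhattan) normalization: absolute values sum to 1."""
    return _scale(vec, sum(abs(x) for x in vec))


def p_normalize(vec: list[float], p: int) -> list[float]:
    """Divide by (sum |x|^p)^(1/p); p=1 is taxicab and p=2 is Euclidean."""
    if p < 1:
        raise ValueError("p must be >= 1 in this sketch")
    return _scale(vec, sum(abs(x) ** p for x in vec) ** (1.0 / p))


def euclidean_normalize(vec: list[float]) -> list[float]:
    """L2 normalization: the result has unit Euclidean length."""
    return p_normalize(vec, 2)


v = [3.0, -4.0]
print(taxicab_normalize(v), max_absolute_normalize(v), euclidean_normalize(v))
```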

Sources: LLama/LLamaEmbedder.cs143-148 LLama/Extensions/SpanNormalizationExtensions.cs9-138

Performance and Resource Management

Context Lifecycle

LLamaEmbedder creates and disposes a fresh LLamaContext for every call to GetEmbeddings() or GenerateAsync() (LLama/LLamaEmbedder.cs78, LLama/LLamaEmbedder.cs150). Memory for the context, including any GPU memory, is therefore held only while a request is running, at the cost of creating a new context on every call.
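The per-request lifecycle is the familiar acquire/use/release pattern. Here is a Python sketch using a context manager; `FakeContext`, `per_request_context`, and `get_embeddings` are hypothetical stand-ins, and the real tokenization and evaluation are omitted.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeContext:
    """Hypothetical stand-in for a native context that holds memory while open."""

    live = 0  # number of contexts currently allocated

    def __init__(self) -> None:
        FakeContext.live += 1

    def close(self) -> None:
        FakeContext.live -= 1


@contextmanager
def per_request_context() -> Iterator[FakeContext]:
    ctx = FakeContext()  # allocate only for this request
    try:
        yield ctx
    finally:
        ctx.close()  # release even if the request fails


def get_embeddings(text: str) -> list[float]:
    with per_request_context() as ctx:
        # Tokenization and evaluation with ctx are omitted in this sketch;
        # a toy one-element "embedding" is returned instead.
        return [float(len(text))]


print(get_embeddings("hello"), FakeContext.live)
```

The trade-off matches the text: nothing stays allocated between calls, but every call pays the allocation cost again.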

Batch Processing

The tokenized input is evaluated in chunks of at most BatchSize tokens, as set in IContextParams (LLama/LLamaEmbedder.cs92-93).
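Chunking splits the token sequence into consecutive batches. Here is a Python sketch; `chunk_tokens` is a hypothetical helper, and returning each batch's start position in the full sequence is an assumption about what a batched evaluator needs to place the tokens correctly.

```python
def chunk_tokens(tokens: list[int], batch_size: int) -> list[tuple[int, list[int]]]:
    """Split tokens into consecutive batches of at most batch_size tokens.

    Each entry is (start_position, batch), where start_position is the index
    of the batch's first token in the full sequence.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, tokens[start:start + batch_size])
        for start in range(0, len(tokens), batch_size)
    ]


print(chunk_tokens(list(range(10)), 4))
```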

Microsoft.Extensions.AI Support

The class implements IEmbeddingGenerator<string, Embedding<float>> (LLama/LLamaEmbedder.EmbeddingGenerator.cs12). Its GetService method provides access to the underlying LLamaContext and to the LLamaEmbedder instance itself (LLama/LLamaEmbedder.EmbeddingGenerator.cs17-41).
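GetService follows a type-based lookup pattern: the caller asks for a type and gets back a matching object, or nothing. Here is a Python sketch of that pattern. `Embedder`, `Context`, and `get_service` are hypothetical, and returning None for an unknown type mirrors the Microsoft.Extensions.AI convention of returning null.

```python
from typing import Optional, Type, TypeVar

T = TypeVar("T")


class Context:
    """Hypothetical stand-in for the underlying LLamaContext."""


class Embedder:
    """Hypothetical stand-in for LLamaEmbedder with a GetService-style lookup."""

    def __init__(self) -> None:
        self.context = Context()

    def get_service(self, service_type: Type[T]) -> Optional[T]:
        """Return an object of the requested type, or None if none is available."""
        for candidate in (self, self.context):
            if isinstance(candidate, service_type):
                return candidate
        return None


embedder = Embedder()
print(embedder.get_service(Context) is embedder.context)
```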

Sources: LLama/LLamaEmbedder.cs72-78 LLama/LLamaEmbedder.cs150 LLama/LLamaEmbedder.EmbeddingGenerator.cs17-41

Error Handling

Exception Type	Condition	Location
`ArgumentException`	Tokenized input is longer than `ContextSize`	LLama/LLamaEmbedder.cs83-84
`ArgumentException`	`UBatchSize` ≠ `BatchSize`	LLama/LLamaEmbedder.cs40-41
`NotSupportedException`	Encoder-decoder model used	LLama/LLamaEmbedder.cs42-43
`NotSupportedException`	`GenerateAsync` called with `PoolingType.None`	LLama/LLamaEmbedder.EmbeddingGenerator.cs46-49
`RuntimeError`	Native Encode/Decode operation failed	LLama/LLamaEmbedder.cs110 LLama/LLamaEmbedder.cs118
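The first error condition is a simple token-count check. Here is a Python sketch of it: `ensure_fits_context` is a hypothetical helper, ValueError stands in for ArgumentException, and it assumes input exactly equal to ContextSize is still accepted.

```python
def ensure_fits_context(token_ids: list[int], context_size: int) -> None:
    """Raise if the tokenized input is longer than the context window."""
    if len(token_ids) > context_size:
        raise ValueError(
            f"input is {len(token_ids)} tokens, longer than the context size of {context_size}"
        )


ensure_fits_context(list(range(8)), 8)  # at the limit: accepted under this sketch's assumption
try:
    ensure_fits_context(list(range(9)), 8)
except ValueError as err:
    print("rejected:", err)
```

Checking the token count before any evaluation fails fast, before the model does any work.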

Sources: LLama/LLamaEmbedder.cs40-43 LLama/LLamaEmbedder.cs83-84 LLama/LLamaEmbedder.EmbeddingGenerator.cs46-49


URL: https://deepwiki.com/SciSharp/LLamaSharp/5.1-text-embeddings
