URL: https://deepwiki.com/SciSharp/LLamaSharp/5.1-text-embeddings

Last indexed: 18 May 2026 (ecd184)

Text Embeddings

This document covers the LLamaEmbedder class and the text embedding generation system in LLamaSharp. Text embeddings are high-dimensional vector representations of text that capture semantic meaning and enable similarity comparisons, clustering, and retrieval-augmented generation (RAG) workflows.

For information about using embeddings in RAG systems with Kernel Memory, see 7.2. Kernel Memory Integration. For multimodal embeddings that include image context, see 5.2. Multimodal Support.


Overview

Text embeddings convert natural language text into fixed-size numerical vectors (typically 384-1024 dimensions) where semantically similar texts have similar vector representations. LLamaSharp provides the LLamaEmbedder class to generate embeddings using GGUF models that support embedding generation.

Common use cases:

  • Semantic search and document retrieval
  • Text similarity and clustering
  • Building vector databases for RAG
  • Content recommendation systems

Sources: LLama/LLamaEmbedder.cs:12-21


LLamaEmbedder Architecture

The LLamaEmbedder class provides a managed interface for generating embeddings from GGUF models. Unlike text generation executors, the embedder creates and disposes a context for each request to keep memory usage low (LLama/LLamaEmbedder.cs:72-78, LLama/LLamaEmbedder.cs:150). It implements the IEmbeddingGenerator<string, Embedding<float>> interface from the Microsoft.Extensions.AI namespace (LLama/LLamaEmbedder.EmbeddingGenerator.cs:11-13).

Code Entity Relationship Diagram


Sources: LLama/LLamaEmbedder.cs:15-60, LLama/LLamaEmbedder.EmbeddingGenerator.cs:11-13, LLama/LLamaEmbedder.cs:72-153


Basic Usage

Creating an Embedder

The LLamaEmbedder constructor requires LLamaWeights and IContextParams (LLama/LLamaEmbedder.cs:38). The key parameter is PoolingType, which determines whether the output is one vector per input or one vector per token.
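A minimal sketch of constructing an embedder and requesting one embedding. It assumes ModelParams is used as the IContextParams implementation and that the model path (a placeholder here) points to a GGUF model that supports embeddings. Property names should be checked against the LLamaSharp version in use.

```csharp
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;
using LLama.Native;

// Placeholder path (assumption): replace with a real embedding-capable GGUF model.
var parameters = new ModelParams("path/to/embedding-model.gguf")
{
    // Mean pooling yields a single vector per input string.
    PoolingType = LLamaPoolingType.Mean,
};

// The weights are loaded once; the embedder creates its own
// short-lived context for each request.
using var weights = LLamaWeights.LoadFromFile(parameters);
using var embedder = new LLamaEmbedder(weights, parameters);

IReadOnlyList<float[]> embeddings = await embedder.GetEmbeddings("The cat is cute");
Console.WriteLine($"Vectors: {embeddings.Count}, dimensions: {embeddings[0].Length}");
```

With a pooled configuration the returned list is expected to hold a single vector, so callers can read the first element directly.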


Example from tests:


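Important constraints:

  • UBatchSize must equal BatchSize; otherwise an ArgumentException is thrown (LLama/LLamaEmbedder.cs:40-41)
  • Encoder-decoder models are not supported and raise a NotSupportedException (LLama/LLamaEmbedder.cs:42-43)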

Sources: LLama/LLamaEmbedder.cs:38-51, LLama.Unittest/LLamaEmbedderTests.cs:26-34


Generating Embeddings

The GetEmbeddings() method tokenizes the input text, processes it through the model, and returns normalized embedding vectors (LLama/LLamaEmbedder.cs:69-72). Internally, it enables embedding mode on the context by calling Context.NativeHandle.SetEmbeddings(true) (LLama/LLamaEmbedder.cs:79).

Data Flow Diagram


Sources: LLama/LLamaEmbedder.cs:72-153


Pooling Types

The PoolingType parameter in IContextParams controls how token-level embeddings are aggregated into the final output vectors (LLama/LLamaEmbedder.cs:128-129).

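| Pooling Type          | Output Count       | Description                           | Use Case                         |
|-----------------------|--------------------|---------------------------------------|----------------------------------|
| LLamaPoolingType.Mean | 1 vector per input | Average all token embeddings          | Semantic search, similarity      |
| LLamaPoolingType.Rank | 1 vector per input | Specific pooling for reranking models | Search relevance scoring         |
| LLamaPoolingType.None | 1 vector per token | Individual token embeddings           | Token-level analysis, attention  |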

Pooled embeddings (Mean):


Non-pooled embeddings (None):


Note: The Microsoft.Extensions.AI implementation, GenerateAsync, does not support LLamaPoolingType.None (LLama/LLamaEmbedder.EmbeddingGenerator.cs:46-49).
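To make the difference between pooled and non-pooled output concrete, the following sketch averages a set of per-token vectors into one vector, which is what mean pooling computes conceptually. It is a standalone illustration with made-up values; the Pooling class is hypothetical and is not LLamaSharp's code path.

```csharp
using System;

// Three hypothetical token embeddings with 2 dimensions each
// (what LLamaPoolingType.None would conceptually return: one vector per token).
float[][] tokenEmbeddings =
{
    new[] { 1f, 2f },
    new[] { 3f, 4f },
    new[] { 5f, 6f },
};

// Mean pooling collapses them into a single vector for the whole input.
float[] pooled = Pooling.Mean(tokenEmbeddings);
Console.WriteLine(string.Join(", ", pooled)); // 3, 4

static class Pooling
{
    // Averages the token vectors element by element.
    public static float[] Mean(float[][] tokens)
    {
        if (tokens.Length == 0)
            throw new ArgumentException("At least one token embedding is required.");

        int dim = tokens[0].Length;
        var sum = new double[dim];
        foreach (float[] token in tokens)
        {
            if (token.Length != dim)
                throw new ArgumentException("All token embeddings must have the same dimension.");
            for (int i = 0; i < dim; i++) sum[i] += token[i];
        }

        var mean = new float[dim];
        for (int i = 0; i < dim; i++) mean[i] = (float)(sum[i] / tokens.Length);
        return mean;
    }
}
```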

Sources: LLama/LLamaEmbedder.cs:128-141, LLama.Unittest/LLamaEmbedderTests.cs:101-103, LLama/LLamaEmbedder.EmbeddingGenerator.cs:46-49


Model Compatibility

LLamaEmbedder supports both dedicated embedding models and standard generative models.

Encoder vs Decoder Models


Sources: LLama/LLamaEmbedder.cs:104-124, LLama.Unittest/LLamaEmbedderTests.cs:84-93


Embedding Normalization

All embedding vectors returned by GetEmbeddings() are automatically normalized using EuclideanNormalization() (LLama/LLamaEmbedder.cs:147). This gives every vector unit length under the L2 norm, so the dot product of two embeddings equals their cosine similarity.

The normalization logic is provided by SpanNormalizationExtensions, which uses TensorPrimitives for performance (LLama/Extensions/SpanNormalizationExtensions.cs:77-82).
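As an illustration of what Euclidean (L2) normalization does and why it pairs well with cosine similarity, here is a standalone sketch using plain loops. The VectorMath helper is hypothetical and does not reproduce the library's TensorPrimitives-based implementation.

```csharp
using System;

float[] a = { 3f, 4f };
float[] b = { 4f, 3f };

float[] ua = VectorMath.L2Normalize(a); // roughly (0.6, 0.8)
float[] ub = VectorMath.L2Normalize(b); // roughly (0.8, 0.6)

// For unit-length vectors the dot product equals the cosine similarity.
Console.WriteLine(VectorMath.Dot(ua, ub)); // approximately 0.96

static class VectorMath
{
    // Scales v so that its L2 norm is 1; a zero vector is returned unchanged.
    public static float[] L2Normalize(float[] v)
    {
        double sumSquares = 0;
        foreach (float x in v) sumSquares += (double)x * x;

        double norm = Math.Sqrt(sumSquares);
        if (norm == 0) return (float[])v.Clone();

        var result = new float[v.Length];
        for (int i = 0; i < v.Length; i++) result[i] = (float)(v[i] / norm);
        return result;
    }

    public static float Dot(float[] x, float[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double sum = 0;
        for (int i = 0; i < x.Length; i++) sum += (double)x[i] * y[i];
        return (float)sum;
    }
}
```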

Available Normalizations:

Sources: LLama/LLamaEmbedder.cs:143-148, LLama/Extensions/SpanNormalizationExtensions.cs:9-138


Performance and Resource Management

Context Lifecycle

LLamaEmbedder creates and disposes a fresh LLamaContext for every call to GetEmbeddings() or GenerateAsync() (LLama/LLamaEmbedder.cs:78, LLama/LLamaEmbedder.cs:150). This minimizes the long-term memory footprint on the GPU compared to keeping a context active.

Batch Processing

Input text is processed in chunks whose size is set by BatchSize in IContextParams (LLama/LLamaEmbedder.cs:92-93).
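The chunking idea can be sketched as follows. This is a simplified, standalone illustration that splits a token array into pieces of at most batchSize elements; the token values and the Batching helper are hypothetical, and the embedder's actual batching logic may differ.

```csharp
using System;
using System.Collections.Generic;

int[] tokens = { 10, 11, 12, 13, 14, 15, 16 };

foreach (int[] chunk in Batching.Chunk(tokens, batchSize: 3))
    Console.WriteLine(string.Join(" ", chunk));
// prints "10 11 12", then "13 14 15", then "16"

static class Batching
{
    // Yields consecutive slices of at most batchSize tokens.
    public static IEnumerable<int[]> Chunk(int[] tokens, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < tokens.Length; start += batchSize)
        {
            int length = Math.Min(batchSize, tokens.Length - start);
            var chunk = new int[length];
            Array.Copy(tokens, start, chunk, 0, length);
            yield return chunk;
        }
    }
}
```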

Microsoft.Extensions.AI Support

The class implements IEmbeddingGenerator<string, Embedding<float>> (LLama/LLamaEmbedder.EmbeddingGenerator.cs:12). The GetService method provides access to the underlying LLamaContext and LLamaEmbedder instance (LLama/LLamaEmbedder.EmbeddingGenerator.cs:17-41).
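A hedged sketch of consuming the embedder through the Microsoft.Extensions.AI abstraction. It assumes ModelParams as the parameter type, a placeholder model path, and a pooled configuration, since GenerateAsync is described above as rejecting LLamaPoolingType.None. Exact signatures depend on the Microsoft.Extensions.AI version referenced by the project.

```csharp
using System;
using LLama;
using LLama.Common;
using LLama.Native;
using Microsoft.Extensions.AI;

// Placeholder path (assumption): replace with a real embedding-capable GGUF model.
var parameters = new ModelParams("path/to/embedding-model.gguf")
{
    PoolingType = LLamaPoolingType.Mean,
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var embedder = new LLamaEmbedder(weights, parameters);

// Treat the embedder as a provider-neutral embedding generator.
IEmbeddingGenerator<string, Embedding<float>> generator = embedder;
var results = await generator.GenerateAsync(new[] { "first text", "second text" });

foreach (Embedding<float> embedding in results)
    Console.WriteLine($"Dimensions: {embedding.Vector.Length}");
```

Going through the interface keeps calling code independent of LLamaSharp, so another IEmbeddingGenerator implementation can be substituted without changing the consumer.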

Sources: LLama/LLamaEmbedder.cs:72-78, LLama/LLamaEmbedder.cs:150, LLama/LLamaEmbedder.EmbeddingGenerator.cs:17-41


Error Handling

| Exception Type        | Condition                                 | Location                                             |
|-----------------------|-------------------------------------------|------------------------------------------------------|
| ArgumentException     | Input text exceeds ContextSize            | LLama/LLamaEmbedder.cs:83-84                         |
| ArgumentException     | UBatchSize differs from BatchSize         | LLama/LLamaEmbedder.cs:40-41                         |
| NotSupportedException | Encoder-decoder model used                | LLama/LLamaEmbedder.cs:42-43                         |
| NotSupportedException | GenerateAsync called with PoolingType.None | LLama/LLamaEmbedder.EmbeddingGenerator.cs:46-49     |
| RuntimeError          | Native Encode/Decode operation failed     | LLama/LLamaEmbedder.cs:110, LLama/LLamaEmbedder.cs:118 |
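One way a caller might handle the request-time failures listed above is sketched below. The EmbeddingHelpers class and its TryEmbedAsync method are hypothetical names introduced for illustration, and RuntimeError is assumed to live in the LLama.Exceptions namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using LLama;
using LLama.Exceptions;

static class EmbeddingHelpers
{
    // Returns the embeddings, or null when the request fails for one of
    // the reasons listed in the error table.
    public static async Task<IReadOnlyList<float[]>?> TryEmbedAsync(LLamaEmbedder embedder, string text)
    {
        try
        {
            return await embedder.GetEmbeddings(text);
        }
        catch (ArgumentException ex)
        {
            // For example, the input is longer than the configured context size.
            Console.Error.WriteLine($"Invalid input or parameters: {ex.Message}");
            return null;
        }
        catch (RuntimeError ex)
        {
            // The native encode/decode step reported a failure.
            Console.Error.WriteLine($"Native embedding operation failed: {ex.Message}");
            return null;
        }
    }
}
```

The NotSupportedException cases are left uncaught here because they indicate a configuration problem (an unsupported model or pooling type) that is usually better fixed at startup than handled per request.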

Sources: LLama/LLamaEmbedder.cs:40-43, LLama/LLamaEmbedder.cs:83-84, LLama/LLamaEmbedder.EmbeddingGenerator.cs:46-49