
URL: https://deepwiki.com/SciSharp/LLamaSharp/5-advanced-features

Advanced Features | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

Advanced Features

This page introduces advanced LLamaSharp capabilities that extend beyond basic text generation: text embeddings for semantic search, multimodal (vision-language) models, batched execution for concurrent conversations, model adaptation through LoRA adapters, and reranking of documents against a query.

Scope of this section:

  • Overview of advanced feature architecture and integration points.
  • High-level comparison of advanced capabilities.
  • Common patterns for using advanced features.



Advanced Feature Categories

LLamaSharp provides five major categories of advanced functionality, each addressing distinct use cases beyond single-turn text generation.

| Feature Category | Primary Class | Key Use Cases | Resource Requirements |
| --- | --- | --- | --- |
| Text Embeddings | LLamaEmbedder | Semantic search, clustering, similarity comparison | Encoder or decoder model, minimal KV cache |
| Multimodal | MtmdWeights | Vision-language tasks, image captioning, visual QA | Separate multimodal projector file, larger context |
| Batched Execution | BatchedExecutor | Multi-user chat, concurrent conversations | Shared model weights, per-conversation KV cache |
| LoRA Adapters | Model parameter APIs | Task-specific fine-tuning, model customization | Additional adapter weights, base model required |
| Reranking | LLamaReranker | Document ranking, search result refinement | Specialized reranker model (e.g., Jina), Rank pooling |

Advanced Features Architecture

The following diagram shows how advanced features integrate with the core LLamaSharp architecture and extend the base capabilities.

[Diagram: Advanced Features Integration Points]


Sources: LLama/LLamaEmbedder.cs:15-51, LLama/MtmdWeights.cs:12-40, LLama.Unittest/LLamaEmbedderTests.cs:26-34


Feature Selection Guide

Different advanced features are appropriate for different scenarios. The following diagram maps common requirements to the appropriate advanced feature.

[Diagram: Decision Flow for Selecting Advanced Features]


Sources: LLama/LLamaEmbedder.cs:128-142, LLama.Unittest/LLamaEmbedderTests.cs:26-34, LLama/MtmdWeights.cs:33-40


Text Embeddings Overview

The LLamaEmbedder class generates high-dimensional vector representations of text for semantic similarity tasks (LLama/LLamaEmbedder.cs:15-17).

Key Components

| Component | Description | Code Reference |
| --- | --- | --- |
| LLamaEmbedder | Main embedder class | LLama/LLamaEmbedder.cs:15-51 |
| GetEmbeddings() | Generate embeddings from text | LLama/LLamaEmbedder.cs:69-70 |
| EmbeddingSize | Dimension of output vectors | LLama/LLamaEmbedder.cs:21 |
| LLamaPoolingType | Controls output granularity | LLama/LLamaEmbedder.cs:128-129 |
| EuclideanNormalization() | Extension to normalize embedding vectors | LLama/LLamaEmbedder.cs:147 |
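
The normalization and similarity-comparison steps listed above come down to a small amount of vector arithmetic. The following is a minimal, language-neutral sketch in Python of that math, not LLamaSharp's C# API: the function names `euclidean_normalize` and `cosine_similarity` and the three-dimensional example vectors are assumptions made for illustration only.

```python
import math


def euclidean_normalize(vec):
    """Scale a vector to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of the normalized vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    na, nb = euclidean_normalize(a), euclidean_normalize(b)
    return sum(x * y for x, y in zip(na, nb))


# Hypothetical embeddings, chosen only to make the arithmetic easy to follow.
query = [3.0, 4.0, 0.0]
doc_same_direction = [6.0, 8.0, 0.0]  # parallel to the query
doc_orthogonal = [0.0, 0.0, 5.0]      # shares no component with the query

score_same = cosine_similarity(query, doc_same_direction)
score_orthogonal = cosine_similarity(query, doc_orthogonal)
```

Once vectors are normalized to unit length, a plain dot product is enough to compare them, which is why normalizing embeddings before storing them in a search index is a common design choice.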

Pooling Types

The PoolingType parameter in IContextParams determines how embeddings are aggregated:

  • LLamaPoolingType.Mean: Returns a single embedding vector representing the entire input string (most common for semantic search) (LLama.Unittest/LLamaEmbedderTests.cs:31).
  • LLamaPoolingType.None: Returns one embedding vector per token (useful for token-level analysis) (LLama.Unittest/LLamaEmbedderTests.cs:102).
  • LLamaPoolingType.Rank: Specifically used for reranking tasks where relevance scores are computed between queries and documents.

Sources: LLama/LLamaEmbedder.cs:38-51, LLama.Unittest/LLamaEmbedderTests.cs:26-34, LLama.Unittest/LLamaEmbedderTests.cs:97-103
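
To make the difference between the Mean and None pooling types concrete, here is a small Python sketch of the aggregation itself. It is a conceptual illustration under stated assumptions, not the library's implementation: in practice pooling happens inside the native inference code, and the names `mean_pool` and `pool`, the string mode values, and the example vectors are hypothetical.

```python
def mean_pool(token_vectors):
    """Average per-token vectors element-wise into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    count = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / count for i in range(dim)]


def pool(token_vectors, mode):
    """Return a list of vectors: one per token for "none", a single one for "mean"."""
    if mode == "none":
        return [list(v) for v in token_vectors]
    if mode == "mean":
        return [mean_pool(token_vectors)]
    raise ValueError(f"unsupported pooling mode: {mode}")


# Three hypothetical 2-dimensional token embeddings for one input string.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

per_token = pool(tokens, "none")  # three vectors, one per token
pooled = pool(tokens, "mean")     # a single vector for the whole input
```

Returning a list in both modes is a choice of this sketch, so that callers can handle the result uniformly and tell the modes apart by the number of vectors returned.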


Multimodal Support Overview

The MtmdWeights class enables processing of images and audio alongside text, supporting vision-language models and multimodal inference (LLama/MtmdWeights.cs:12-14).

Multimodal Components

Multimodal support requires loading a projection model alongside the base LLM. The MtmdWeights.LoadFromFileAsync and LoadFromFile methods handle this initialization (LLama/MtmdWeights.cs:33-40, LLama/Native/SafeMtmdModelHandle.cs:35-66).

Media Marker System: Multimodal inputs use special markers (defined in MtmdContextParams) in the prompt to reference media files (LLama.Unittest/MtmdWeightsTests.cs:27-31). The MtmdWeights class provides methods to load media from disk or memory buffers via LoadMedia (LLama/MtmdWeights.cs:82-87) and to tokenize text against these pending media buffers via Tokenize (LLama/MtmdWeights.cs:97-105).
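
The marker mechanism can be pictured as splitting the prompt at each marker and slotting one loaded media item into each gap. The Python sketch below illustrates only that idea and is not LLamaSharp's tokenization code: the marker string `<__media__>` is an assumption (check MtmdContextParams for the actual value), and `interleave_media` and the file name are hypothetical.

```python
# Assumed marker text, for illustration only; the real value comes from MtmdContextParams.
MEDIA_MARKER = "<__media__>"


def interleave_media(prompt, media, marker=MEDIA_MARKER):
    """Split the prompt on the marker and pair each marker with one media item, in order."""
    parts = prompt.split(marker)
    marker_count = len(parts) - 1
    if marker_count != len(media):
        raise ValueError(
            f"prompt has {marker_count} marker(s) but {len(media)} media item(s) were loaded"
        )
    chunks = []
    for index, text in enumerate(parts):
        if text:
            chunks.append(("text", text))
        if index < len(media):
            chunks.append(("media", media[index]))
    return chunks


# Hypothetical prompt referencing one image.
chunks = interleave_media(f"Describe {MEDIA_MARKER} briefly.", ["cat.png"])
```

The check on marker count reflects the general constraint that every marker needs a corresponding pending media item; a mismatch is reported up front instead of producing a misaligned prompt.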

Native Integration: The system uses native handles such as SafeMtmdModelHandle (LLama/Native/SafeMtmdModelHandle.cs:13-14) and SafeMtmdEmbed to manage multimodal data. It supports checking whether a loaded model provides specific capabilities, such as vision (LLama/Native/NativeApi.Mtmd.cs:68-69) or audio (LLama/Native/NativeApi.Mtmd.cs:77-78).

Sources: LLama/MtmdWeights.cs:12-146, LLama/Native/SafeMtmdModelHandle.cs:35-66, LLama.Unittest/MtmdWeightsTests.cs:17-35, LLama/Native/NativeApi.Mtmd.cs:50-88


Batched Execution Overview

The BatchedExecutor manages multiple concurrent conversations sharing the same model weights, enabling efficient multi-user scenarios (LLama/Batched/BatchedExecutor.cs:14-15).
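
The sharing model described here, one set of weights, separate per-conversation state, and one combined batch per inference step, can be sketched with a toy in Python. This is a conceptual stand-in only and does not mirror the real BatchedExecutor or Conversation classes: `ToyBatchedExecutor`, `ToyConversation`, their methods, and the integer "tokens" are all hypothetical.

```python
class ToyConversation:
    """Per-conversation state: a sequence id, queued tokens, and its own cache."""

    def __init__(self, executor, seq_id):
        self.executor = executor
        self.seq_id = seq_id
        self.pending = []    # tokens queued for the next batch
        self.kv_cache = []   # stand-in for this conversation's KV cache entries

    def prompt(self, tokens):
        self.pending.extend(tokens)


class ToyBatchedExecutor:
    """One shared model object, many conversations, one batch per infer() call."""

    def __init__(self, weights):
        self.weights = weights   # shared by every conversation
        self.conversations = []
        self.batches_run = 0

    def create(self):
        conversation = ToyConversation(self, seq_id=len(self.conversations))
        self.conversations.append(conversation)
        return conversation

    def infer(self):
        """Gather pending tokens from all conversations and process them together."""
        batch = [(c.seq_id, t) for c in self.conversations for t in c.pending]
        if not batch:
            return 0
        for c in self.conversations:
            c.kv_cache.extend(c.pending)  # each conversation keeps its own history
            c.pending = []
        self.batches_run += 1
        return len(batch)


executor = ToyBatchedExecutor(weights="shared-model")
alice = executor.create()
bob = executor.create()
alice.prompt([1, 2, 3])
bob.prompt([7, 8])
processed = executor.infer()  # both conversations are handled in a single batch
```

The point of the design is that the expensive resource (the weights) is loaded once, while the cheap, conversation-specific state stays isolated so that one user's history never leaks into another's.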

Batched Execution Architecture

[Diagram: BatchedExecutor Conversation Management]


Key Characteristics

Sources: LLama/Batched/BatchedExecutor.cs:14-153, LLama/Batched/Conversation.cs:14-187, LLama.Examples/Examples/BatchedExecutorSaveAndLoad.cs:12-85


For detailed usage examples and API references for each advanced feature, refer to the respective subsections: Text Embeddings, Multimodal Support, Batched Execution, LoRA Adapters, and Reranking.