Last indexed: 18 May 2026 (ecd184)

Advanced Features

This page introduces advanced LLamaSharp capabilities that go beyond basic text generation. They enable specialized use cases: semantic search through embeddings, vision-language models, concurrent conversation management, model customization through LoRA adapters, and document reranking.

Scope of this section:

Overview of advanced feature architecture and integration points.
High-level comparison of advanced capabilities.
Common patterns for using advanced features.

For detailed information:

Text embedding generation and vector operations: see Text Embeddings
Image and audio processing with multimodal models: see Multimodal Support
Managing multiple concurrent conversations: see Batched Execution
Model fine-tuning with LoRA adapters: see LoRA Adapters
Document relevance scoring and reranking: see Reranking

Advanced Feature Categories

LLamaSharp provides five major categories of advanced functionality, each addressing distinct use cases beyond single-turn text generation.

Feature Category	Primary Class	Key Use Cases	Resource Requirements
Text Embeddings	`LLamaEmbedder`	Semantic search, clustering, similarity comparison	Encoder or decoder model, minimal KV cache
Multimodal	`MtmdWeights`	Vision-language tasks, image captioning, visual QA	Separate multimodal projector file, larger context
Batched Execution	`BatchedExecutor`	Multi-user chat, concurrent conversations	Shared model weights, per-conversation KV cache
LoRA Adapters	Model parameter APIs	Task-specific fine-tuning, model customization	Additional adapter weights, base model required
Reranking	`LLamaReranker`	Document ranking, search result refinement	Specialized reranker model (e.g., Jina), Rank pooling

Advanced Features Architecture

The following diagram shows how advanced features integrate with the core LLamaSharp architecture and extend the base capabilities.

Diagram (not reproduced): Advanced Features Integration Points

Sources: LLama/LLamaEmbedder.cs:15-51, LLama/MtmdWeights.cs:12-40, LLama.Unittest/LLamaEmbedderTests.cs:26-34

Feature Selection Guide

Different advanced features are appropriate for different scenarios. The following diagram maps common requirements to the appropriate advanced feature.

Diagram (not reproduced): Decision flow for selecting advanced features

Sources: LLama/LLamaEmbedder.cs:128-142, LLama.Unittest/LLamaEmbedderTests.cs:26-34, LLama/MtmdWeights.cs:33-40

Text Embeddings Overview

The LLamaEmbedder class generates high-dimensional vector representations (embeddings) of text for semantic similarity tasks (LLama/LLamaEmbedder.cs:15-17).

Key Components

Component	Description	Code Reference
`LLamaEmbedder`	Main embedder class	LLama/LLamaEmbedder.cs:15-51
`GetEmbeddings()`	Generates embeddings from text	LLama/LLamaEmbedder.cs:69-70
`EmbeddingSize`	Dimension of output vectors	LLama/LLamaEmbedder.cs:21
`LLamaPoolingType`	Controls output granularity	LLama/LLamaEmbedder.cs:128-129
`EuclideanNormalization()`	Extension method that normalizes embedding vectors	LLama/LLamaEmbedder.cs:147
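The page does not spell out what `EuclideanNormalization()` computes. Assuming it is standard L2 (Euclidean) normalization, as the name suggests, the following language-neutral Python sketch shows why it matters for semantic search: once two embeddings are scaled to unit length, their dot product equals their cosine similarity. The vectors are made-up toy values, not model output, and `euclidean_normalize` and `cosine_similarity` are hypothetical helpers, not LLamaSharp APIs.

```python
import math
from typing import List

Vector = List[float]


def euclidean_normalize(vec: Vector) -> Vector:
    """Hypothetical helper (not LLamaSharp): scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged to avoid division by zero
    return [x / norm for x in vec]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity is the dot product of the two unit-normalized vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension (EmbeddingSize)")
    na, nb = euclidean_normalize(a), euclidean_normalize(b)
    return sum(x * y for x, y in zip(na, nb))


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones have EmbeddingSize dimensions.
    query = [0.2, 0.9, 0.1]
    docs = {"doc_a": [0.1, 0.8, 0.0], "doc_b": [0.9, 0.1, 0.3]}
    ranked = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
    print(ranked)  # ['doc_a', 'doc_b']
```

In practice, normalizing stored document embeddings once lets a search compare them to a normalized query with a plain dot product.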

Pooling Types

The PoolingType property of IContextParams determines how the model's per-token outputs are aggregated:

LLamaPoolingType.Mean: Returns a single embedding vector representing the entire input string; this is the most common choice for semantic search (LLama.Unittest/LLamaEmbedderTests.cs:31).
LLamaPoolingType.None: Returns one embedding vector per token, which is useful for token-level analysis (LLama.Unittest/LLamaEmbedderTests.cs:102).
LLamaPoolingType.Rank: Used for reranking; instead of an embedding vector, the model produces a relevance score for each query-document pair.
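To make the difference between `None` and `Mean` concrete, here is a minimal Python sketch, assuming `Mean` is a plain element-wise average of the per-token vectors. The actual pooling runs in native code and may differ in detail, and `mean_pool` is a hypothetical helper, not a LLamaSharp API. `Rank` is not modeled because it yields a score rather than a vector.

```python
from typing import List

Vector = List[float]


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    """Hypothetical helper: average per-token vectors element-wise into one vector."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    dim = len(token_embeddings[0])
    if any(len(v) != dim for v in token_embeddings):
        raise ValueError("all token embeddings must have the same dimension")
    n = len(token_embeddings)
    return [sum(v[i] for v in token_embeddings) / n for i in range(dim)]


if __name__ == "__main__":
    # Toy output shaped like LLamaPoolingType.None: one vector per token (3 tokens here).
    per_token = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 4.0, 1.0]]
    print(len(per_token))        # 3
    print(mean_pool(per_token))  # [2.0, 2.0, 1.0]
```

The trade-off is granularity versus size: `None` keeps token-level detail at the cost of one vector per token, while `Mean` compresses the whole input into a single `EmbeddingSize` vector that is easy to store and compare.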

Sources: LLama/LLamaEmbedder.cs:38-51, LLama.Unittest/LLamaEmbedderTests.cs:26-34, LLama.Unittest/LLamaEmbedderTests.cs:97-103

Multimodal Support Overview

The MtmdWeights class enables processing of images and audio alongside text, supporting vision-language models and other multimodal inference (LLama/MtmdWeights.cs:12-14).

Multimodal Components

Multimodal support requires loading a multimodal projector model alongside the base LLM. The MtmdWeights.LoadFromFile and LoadFromFileAsync methods handle this initialization (LLama/MtmdWeights.cs:33-40, LLama/Native/SafeMtmdModelHandle.cs:35-66).

Media Marker System: Prompts reference media through special marker strings (defined in MtmdContextParams) (LLama.Unittest/MtmdWeightsTests.cs:27-31). The MtmdWeights class provides LoadMedia, which loads media from disk or from a memory buffer (LLama/MtmdWeights.cs:82-87), and Tokenize, which tokenizes the prompt text against the pending media buffers (LLama/MtmdWeights.cs:97-105).
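The following Python sketch models the idea behind marker-based tokenization: the prompt is split at each marker, and each marker is bound, in order, to one pending media item, producing interleaved text and media chunks. It is a conceptual model only, not the native implementation behind MtmdWeights.Tokenize. The `<__media__>` marker string, the chunk classes, `chunk_prompt`, and the rule that marker and media counts must match are all illustrative assumptions; check MtmdContextParams for the marker your setup actually uses. Real media chunks carry encoded image or audio embeddings, not file names.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextChunk:
    text: str


@dataclass
class MediaChunk:
    media_id: str  # placeholder for a loaded image/audio buffer


Chunk = Union[TextChunk, MediaChunk]

# Assumption: example marker string only; the real one comes from MtmdContextParams.
MARKER = "<__media__>"


def chunk_prompt(prompt: str, pending_media: List[str], marker: str = MARKER) -> List[Chunk]:
    """Hypothetical helper: split the prompt at markers and bind each marker to pending media in order."""
    parts = prompt.split(marker)
    if len(parts) - 1 != len(pending_media):
        raise ValueError(
            f"prompt has {len(parts) - 1} markers but {len(pending_media)} media items are pending"
        )
    chunks: List[Chunk] = []
    for i, part in enumerate(parts):
        if part:
            chunks.append(TextChunk(part))
        if i < len(pending_media):
            chunks.append(MediaChunk(pending_media[i]))
    return chunks


if __name__ == "__main__":
    prompt = "USER: <__media__>\nDescribe this image.\nASSISTANT:"
    for chunk in chunk_prompt(prompt, ["cat.jpg"]):
        print(chunk)
```

Binding media by position keeps the prompt format simple: the caller loads media in the same order the markers appear.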

Native Integration: Multimodal data is managed through native handles such as SafeMtmdModelHandle (LLama/Native/SafeMtmdModelHandle.cs:13-14) and SafeMtmdEmbed. The native API also exposes capability checks, such as whether a loaded model supports vision (LLama/Native/NativeApi.Mtmd.cs:68-69) or audio (LLama/Native/NativeApi.Mtmd.cs:77-78) input.

Sources: LLama/MtmdWeights.cs:12-146, LLama/Native/SafeMtmdModelHandle.cs:35-66, LLama.Unittest/MtmdWeightsTests.cs:17-35, LLama/Native/NativeApi.Mtmd.cs:50-88

Batched Execution Overview

The BatchedExecutor manages multiple concurrent conversations that share the same model weights, enabling efficient multi-user scenarios (LLama/Batched/BatchedExecutor.cs:14-15).

Batched Execution Architecture

Diagram (not reproduced): BatchedExecutor Conversation Management

Key Characteristics

Resource Sharing: A single LLamaWeights instance is shared across all conversations (LLama/Batched/BatchedExecutor.cs:92).
Isolation: Each Conversation maintains its own sequence state and KV cache segments, identified by a unique LLamaSeqId (LLama/Batched/Conversation.cs:45-50).
Synchronization: An Epoch system coordinates when conversations can be sampled and when the model evaluates a new batch (LLama/Batched/BatchedExecutor.cs:82).
Forking: A conversation can be forked; the fork shares the parent's existing KV cache state instead of recomputing it, which makes branching scenarios (for example, exploring several continuations of the same prompt) efficient (LLama/Batched/Conversation.cs:160-187).
State Persistence: Conversations can be saved to disk or memory and reloaded later (LLama.Examples/Examples/BatchedExecutorSaveAndLoad.cs:46-72).
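The following Python sketch is a deliberately simplified model of the isolation, synchronization, and forking ideas in the list above, not LLamaSharp's implementation or API. It assumes a single shared cache whose cells are tagged with sequence ids, a fork that shares existing cells by adding the child's id rather than copying data, and an epoch counter that blocks sampling (and, in this toy, forking) until the batch holding a conversation's queued tokens has been evaluated. All names (`ToyKvCache`, `ToyExecutor`, `ToyConversation`, `visible_tokens`) are hypothetical; the real KV cache lives in native llama.cpp code.

```python
import itertools
from typing import List, Optional, Set, Tuple


class ToyKvCache:
    """Hypothetical shared cache: each cell stores one token plus the sequence ids that can see it."""

    def __init__(self) -> None:
        self.cells: List[Tuple[str, Set[int]]] = []

    def append(self, seq_id: int, token: str) -> None:
        self.cells.append((token, {seq_id}))

    def copy_seq(self, src: int, dst: int) -> None:
        # Sharing, not duplication: cells visible to `src` also become visible to `dst`.
        for _, seqs in self.cells:
            if src in seqs:
                seqs.add(dst)

    def view(self, seq_id: int) -> List[str]:
        return [token for token, seqs in self.cells if seq_id in seqs]


class ToyExecutor:
    """Hypothetical stand-in for a batched executor: one shared cache and one epoch counter."""

    def __init__(self) -> None:
        self.cache = ToyKvCache()
        self.epoch = 0
        self.pending: List[Tuple[int, str]] = []
        self._next_id = itertools.count()

    def create(self) -> "ToyConversation":
        return ToyConversation(self, next(self._next_id))

    def infer(self) -> None:
        # Evaluate queued tokens from every conversation as one batch, then advance the epoch.
        for seq_id, token in self.pending:
            self.cache.append(seq_id, token)
        self.pending.clear()
        self.epoch += 1


class ToyConversation:
    def __init__(self, executor: ToyExecutor, seq_id: int) -> None:
        self.executor = executor
        self.seq_id = seq_id  # plays the role of a unique sequence id
        self._prompted_at: Optional[int] = None

    @property
    def requires_inference(self) -> bool:
        # True until infer() has run after the most recent prompt().
        return self._prompted_at is not None and self.executor.epoch <= self._prompted_at

    def prompt(self, tokens: List[str]) -> None:
        self.executor.pending.extend((self.seq_id, t) for t in tokens)
        self._prompted_at = self.executor.epoch

    def visible_tokens(self) -> List[str]:
        # Stands in for sampling: only allowed once this conversation's batch is evaluated.
        if self.requires_inference:
            raise RuntimeError("run infer() before sampling this conversation")
        return self.executor.cache.view(self.seq_id)

    def fork(self) -> "ToyConversation":
        if self.requires_inference:
            raise RuntimeError("cannot fork while tokens are still pending")
        child = self.executor.create()
        self.executor.cache.copy_seq(self.seq_id, child.seq_id)
        return child


if __name__ == "__main__":
    ex = ToyExecutor()
    a = ex.create()
    a.prompt(["The", "cat"])
    ex.infer()
    b = a.fork()                 # b sees "The cat" without re-evaluating it
    a.prompt(["sat"])
    b.prompt(["ran"])
    ex.infer()                   # one batch evaluates both conversations
    print(a.visible_tokens())    # ['The', 'cat', 'sat']
    print(b.visible_tokens())    # ['The', 'cat', 'ran']
    print(len(ex.cache.cells))   # 4: the shared prefix is stored once
```

The point of the epoch check is that a conversation's next-token distribution only exists after the batch containing its latest tokens has run, so sampling earlier would read stale state.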

Sources: LLama/Batched/BatchedExecutor.cs:14-153, LLama/Batched/Conversation.cs:14-187, LLama.Examples/Examples/BatchedExecutorSaveAndLoad.cs:12-85

For detailed usage examples and API references for each advanced feature, refer to the respective subsections: Text Embeddings, Multimodal Support, Batched Execution, LoRA Adapters, and Reranking.

URL: https://deepwiki.com/SciSharp/LLamaSharp/5-advanced-features
