This page introduces advanced LLamaSharp capabilities that go beyond basic text generation. These features support specialized use cases: semantic search with text embeddings, vision-language (multimodal) models, concurrent conversation management, model adaptation with LoRA adapters, and document reranking.
LLamaSharp provides five major categories of advanced functionality, each addressing distinct use cases beyond single-turn text generation.
| Feature Category | Primary Class | Key Use Cases | Resource Requirements |
|---|---|---|---|
| Text Embeddings | LLamaEmbedder | Semantic search, clustering, similarity comparison | Encoder or decoder model, minimal KV cache |
| Multimodal | MtmdWeights | Vision-language tasks, image captioning, visual QA | Separate multimodal projector file, larger context |
| Batched Execution | BatchedExecutor | Multi-user chat, concurrent conversations | Shared model weights, per-conversation KV cache |
| LoRA Adapters | Model parameter APIs | Task-specific fine-tuning, model customization | Additional adapter weights, base model required |
| Reranking | LLamaReranker | Document ranking, search result refinement | Specialized reranker model (e.g., Jina), LLamaPoolingType.Rank pooling |
Architecture: Advanced Features Integration Points (diagram not reproduced here; it shows how the advanced features integrate with the core LLamaSharp architecture and extend its base capabilities).
Sources: LLama/LLamaEmbedder.cs:15-51, LLama/MtmdWeights.cs:12-40, LLama.Unittest/LLamaEmbedderTests.cs:26-34
Different advanced features suit different scenarios.
Decision Flow: Selecting Advanced Features (diagram not reproduced here; it maps common requirements to the appropriate advanced feature).
Sources: LLama/LLamaEmbedder.cs:128-142, LLama.Unittest/LLamaEmbedderTests.cs:26-34, LLama/MtmdWeights.cs:33-40
The LLamaEmbedder class generates high-dimensional vector representations (embeddings) of text for semantic similarity tasks (LLama/LLamaEmbedder.cs:15-17).
| Component | Description | Code Reference |
|---|---|---|
| LLamaEmbedder | Main embedder class | LLama/LLamaEmbedder.cs:15-51 |
| GetEmbeddings() | Generates embeddings from text | LLama/LLamaEmbedder.cs:69-70 |
| EmbeddingSize | Dimension of the output vectors | LLama/LLamaEmbedder.cs:21 |
| LLamaPoolingType | Controls output granularity | LLama/LLamaEmbedder.cs:128-129 |
| EuclideanNormalization() | Extension method that normalizes embedding vectors | LLama/LLamaEmbedder.cs:147 |
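To illustrate what normalizing embeddings buys you in a semantic-search setting, here is a minimal, language-neutral Python sketch. It does not call LLamaSharp: the function names are hypothetical, the vectors are toy values rather than real model output, and it assumes EuclideanNormalization() scales a vector to unit L2 length, which is what the name conventionally means. Once vectors have unit length, cosine similarity reduces to a plain dot product.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def euclidean_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity: the dot product of the two unit-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = euclidean_normalize(a), euclidean_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_by_similarity(query: Vector, docs: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs, most similar document first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0, 1.0]
    docs = {"cats": [0.9, 0.1, 1.1], "stocks": [-1.0, 1.0, 0.0]}
    for name, score in rank_by_similarity(query, docs):
        print(f"{name}: {score:.3f}")
```

In a real pipeline the vectors would come from the embedder, all with length EmbeddingSize; normalizing them once up front lets a search index compare them with cheap dot products.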
The PoolingType parameter in IContextParams determines how embeddings are aggregated:
- LLamaPoolingType.Mean: returns a single embedding vector representing the entire input string; this is the most common choice for semantic search (LLama.Unittest/LLamaEmbedderTests.cs:31).
- LLamaPoolingType.None: returns one embedding vector per token, which is useful for token-level analysis (LLama.Unittest/LLamaEmbedderTests.cs:102).
- LLamaPoolingType.Rank: used specifically for reranking, where relevance scores are computed between queries and documents.

Sources: LLama/LLamaEmbedder.cs:38-51, LLama.Unittest/LLamaEmbedderTests.cs:26-34, LLama.Unittest/LLamaEmbedderTests.cs:97-103
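The difference between Mean and None pooling is easiest to see in the shape of the output. The Python sketch below is a toy model under stated assumptions: the strings "mean" and "none" are hypothetical stand-ins for LLamaPoolingType.Mean and LLamaPoolingType.None, the per-token vectors are made-up numbers, and in LLamaSharp the actual pooling is performed by the native backend rather than by code like this. Rank pooling is not modeled.

```python
from typing import List

Vector = List[float]


def pool(token_vectors: List[Vector], pooling: str) -> List[Vector]:
    """Toy model of how a pooling mode shapes embedder output.

    "none" -> one vector per token (returned as-is)
    "mean" -> a single vector: the element-wise mean over all tokens
    """
    if pooling == "none":
        return [list(v) for v in token_vectors]
    if pooling == "mean":
        if not token_vectors:
            raise ValueError("mean pooling needs at least one token")
        n = len(token_vectors)
        dim = len(token_vectors[0])
        return [[sum(v[i] for v in token_vectors) / n for i in range(dim)]]
    raise ValueError(f"unsupported pooling mode: {pooling!r}")


if __name__ == "__main__":
    # Three tokens, each with a 2-dimensional (toy) embedding.
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(len(pool(tokens, "none")))  # one vector per token
    print(pool(tokens, "mean"))       # a single averaged vector
```

This is why Mean pooling suits whole-string similarity search, while None pooling keeps per-token detail at the cost of returning many vectors per input.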
The MtmdWeights class enables processing of images and audio alongside text, supporting vision-language models and other multimodal inference (LLama/MtmdWeights.cs:12-14).
Multimodal support requires loading a multimodal projector model alongside the base LLM. The MtmdWeights.LoadFromFileAsync and LoadFromFile methods handle this initialization (LLama/MtmdWeights.cs:33-40, LLama/Native/SafeMtmdModelHandle.cs:35-66).
Media Marker System:
Multimodal inputs reference media through special markers (defined in MtmdContextParams) placed in the prompt (LLama.Unittest/MtmdWeightsTests.cs:27-31). The MtmdWeights class provides LoadMedia methods that load media from disk or from memory buffers (LLama/MtmdWeights.cs:82-87), and a Tokenize method that tokenizes the prompt text against these pending media buffers (LLama/MtmdWeights.cs:97-105).
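The marker mechanism can be pictured as splitting the prompt at each marker and slotting the pending media items in, in order. The Python sketch below is a conceptual model only, not the LLamaSharp or native implementation: interleave and MediaChunk are hypothetical names, and the marker string "<__media__>" is an illustrative assumption; the real marker is whatever MtmdContextParams specifies. It also assumes each marker consumes exactly one pending media item, so a count mismatch is treated as an error.

```python
from dataclasses import dataclass
from typing import List, Union

# Assumption: illustrative marker only; the real value comes from MtmdContextParams.
DEFAULT_MARKER = "<__media__>"


@dataclass
class MediaChunk:
    source: str  # e.g. a file path or a label for an in-memory buffer


Chunk = Union[str, MediaChunk]


def interleave(prompt: str, pending_media: List[str], marker: str = DEFAULT_MARKER) -> List[Chunk]:
    """Split the prompt at each marker and insert pending media in order."""
    pieces = prompt.split(marker)
    n_markers = len(pieces) - 1
    if n_markers != len(pending_media):
        raise ValueError(
            f"prompt has {n_markers} marker(s) but {len(pending_media)} media item(s) are pending"
        )
    chunks: List[Chunk] = []
    for i, text in enumerate(pieces):
        if text:  # skip empty text between adjacent markers
            chunks.append(text)
        if i < n_markers:
            chunks.append(MediaChunk(pending_media[i]))
    return chunks


if __name__ == "__main__":
    for chunk in interleave("USER: <__media__>\nDescribe this image.", ["cat.jpg"]):
        print(repr(chunk))
```

Text chunks would then be tokenized normally, while each media chunk is replaced by the embeddings produced by the projector model.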
Native Integration:
The system uses native handles such as SafeMtmdModelHandle (LLama/Native/SafeMtmdModelHandle.cs:13-14) and SafeMtmdEmbed to manage multimodal data. It can also check whether a loaded model supports vision (LLama/Native/NativeApi.Mtmd.cs:68-69) or audio (LLama/Native/NativeApi.Mtmd.cs:77-78) input.
Sources: LLama/MtmdWeights.cs:12-146, LLama/Native/SafeMtmdModelHandle.cs:35-66, LLama.Unittest/MtmdWeightsTests.cs:17-35, LLama/Native/NativeApi.Mtmd.cs:50-88
The BatchedExecutor manages multiple concurrent conversations that share the same model weights, enabling efficient multi-user scenarios (LLama/Batched/BatchedExecutor.cs:14-15).
BatchedExecutor Conversation Management:

- A single LLamaWeights instance is shared across all conversations (LLama/Batched/BatchedExecutor.cs:92).
- Each Conversation maintains its own sequence state and KV cache segments, identified by a unique LLamaSeqId (LLama/Batched/Conversation.cs:45-50).
- An epoch system coordinates when conversations can be sampled and when the model evaluates a new batch (LLama/Batched/BatchedExecutor.cs:82).

Sources: LLama/Batched/BatchedExecutor.cs:14-153, LLama/Batched/Conversation.cs:14-187, LLama.Examples/Examples/BatchedExecutorSaveAndLoad.cs:12-85
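The Python sketch below is a toy model of the ideas listed above, not LLamaSharp code. ToyBatchedExecutor, ToyConversation, queue, can_sample, and the exact epoch rule are hypothetical simplifications. It shows one way an epoch counter can gate sampling. Each conversation gets a unique sequence id. Prompting queues tokens for the next epoch, and infer() evaluates every queued token in one batch before advancing the epoch. A conversation may sample only in the epoch in which its tokens were evaluated.

```python
from typing import List, Tuple


class ToyBatchedExecutor:
    """Hypothetical model: shared weights, one batched evaluation per epoch."""

    def __init__(self) -> None:
        self.epoch = 0
        self._next_seq_id = 0
        self._queued: List[Tuple["ToyConversation", List[int]]] = []

    def create_conversation(self) -> "ToyConversation":
        # Each conversation gets a unique sequence id for its own KV-cache segment.
        conv = ToyConversation(self, self._next_seq_id)
        self._next_seq_id += 1
        return conv

    def queue(self, conv: "ToyConversation", tokens: List[int]) -> None:
        self._queued.append((conv, tokens))

    def infer(self) -> int:
        """Evaluate all queued tokens as one batch, then advance the epoch."""
        evaluated = 0
        for conv, tokens in self._queued:
            conv.kv_tokens.extend(tokens)  # stand-in for filling this sequence's KV cache
            evaluated += len(tokens)
        self._queued.clear()
        self.epoch += 1
        return evaluated


class ToyConversation:
    def __init__(self, executor: ToyBatchedExecutor, seq_id: int) -> None:
        self.executor = executor
        self.seq_id = seq_id
        self.kv_tokens: List[int] = []
        self._required_epoch = 0  # 0 means "never prompted"

    @property
    def requires_inference(self) -> bool:
        return self._required_epoch > self.executor.epoch

    @property
    def can_sample(self) -> bool:
        # Valid only in the epoch right after this conversation's tokens were evaluated.
        return self._required_epoch != 0 and self._required_epoch == self.executor.epoch

    def prompt(self, tokens: List[int]) -> None:
        if self.requires_inference:
            raise RuntimeError("previous prompt has not been evaluated yet")
        self.executor.queue(self, tokens)
        self._required_epoch = self.executor.epoch + 1


if __name__ == "__main__":
    executor = ToyBatchedExecutor()
    alice, bob = executor.create_conversation(), executor.create_conversation()
    alice.prompt([101, 102, 103])
    bob.prompt([201, 202])
    print(executor.infer())  # prints 5: both conversations evaluated in one batch
```

Batching all pending tokens into one evaluation is what lets many conversations share the model weights efficiently, while the per-conversation sequence id keeps their KV-cache contents separate.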
For detailed usage examples and API references for each advanced feature, refer to the respective subsections: Text Embeddings, Multimodal Support, Batched Execution, LoRA Adapters, and Reranking.