This document describes the core API classes that provide the foundational layer for model operations in LLamaSharp. These classes directly wrap native llama.cpp functionality and provide the building blocks used by higher-level abstractions like executors and chat sessions.
The core API consists of primary managed classes and their configuration interfaces:
- LLamaWeights: Manages loaded model weights and vocabulary.
- LLamaContext: Provides inference capabilities and context management.
- ModelParams: Configuration for model loading.
- IContextParams: Interface for context creation parameters.
- LLamaEmbedder: High-level utility for generating text embeddings.
- LLamaReranker: Utility for computing relevance scores between queries and documents.
- LLamaQuantizer: Static utility for model quantization.

These classes wrap native handles (SafeLlamaModelHandle, SafeLLamaContextHandle) and provide managed, type-safe access to llama.cpp functionality.
Core API Class Relationships
Sources: LLama/LLamaWeights.cs17-18 LLama/LLamaContext.cs18-42 LLama/Native/SafeLlamaModelHandle.cs15-16 LLama/Native/SafeLLamaContextHandle.cs13-14 LLama/LLamaEmbedder.cs15-26 LLama/LLamaReranker.cs15-26
LLamaWeights represents a set of model weights loaded into memory. It is the prerequisite for creating an inference context.
| Property | Type | Description |
|---|---|---|
| NativeHandle | SafeLlamaModelHandle | The underlying native handle LLama/LLamaWeights.cs24 |
| ContextSize | int | Number of tokens the model was trained for LLama/LLamaWeights.cs29 |
| SizeInBytes | ulong | Size of the model in bytes LLama/LLamaWeights.cs34 |
| ParameterCount | ulong | Total number of parameters LLama/LLamaWeights.cs39 |
| EmbeddingSize | int | Dimension of embedding vectors LLama/LLamaWeights.cs44 |
| Vocab | SafeLlamaModelHandle.Vocabulary | Vocabulary and special tokens LLama/LLamaWeights.cs49 |
| Metadata | IReadOnlyDictionary<string, string> | Model metadata key-value pairs LLama/LLamaWeights.cs54 |
Weights are loaded from an IModelParams configuration. The asynchronous version of the load operation supports progress reporting and cancellation via a native callback wrapper LLama/LLamaWeights.cs95-111.
A weight instance can produce multiple contexts, each with its own KV cache and state.
Sources: LLama/LLamaWeights.cs17-170 LLama/Native/SafeLlamaModelHandle.cs136-151
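The loading flow described above can be sketched as follows. This is a minimal sketch rather than a definitive recipe: it assumes the LLamaSharp package is referenced, that `LLamaWeights.LoadFromFileAsync` accepts the parameters, a cancellation token, and an `IProgress<float>` in that order, and that progress is reported as a fraction between 0 and 1. The model path, the five-minute timeout, and the context size are placeholders chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

public static class LoadWeightsExample
{
    public static async Task Main(string[] args)
    {
        // Placeholder path: point this at a GGUF model file on disk.
        var modelPath = args.Length > 0 ? args[0] : "model.gguf";

        var parameters = new ModelParams(modelPath)
        {
            ContextSize = 2048, // illustrative value, not a recommendation
        };

        // Cancel the load if it takes longer than five minutes (arbitrary timeout).
        using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
        var progress = new Progress<float>(p => Console.WriteLine($"Loading: {p:P0}"));

        using var weights = await LLamaWeights.LoadFromFileAsync(parameters, cts.Token, progress);
        Console.WriteLine($"Parameters: {weights.ParameterCount}, embedding size: {weights.EmbeddingSize}");

        // Two contexts created from the same weights; each has its own KV cache and state.
        using var first = weights.CreateContext(parameters);
        using var second = weights.CreateContext(parameters);
    }
}
```

The `using` declarations dispose the contexts before the weights, which matches the ownership order in which the objects were created.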
LLamaContext holds the state required for interaction with a model. It wraps SafeLLamaContextHandle and provides methods for tokenization, decoding, and state management.
Tokenization calls llama_tokenize via the native handle LLama/LLamaContext.cs107-110.
Context state can be saved to or loaded from files. The implementation uses MemoryMappedFile to write bytes directly from the native pointer to disk LLama/LLamaContext.cs144-160.
Sources: LLama/LLamaContext.cs18-126 LLama/Native/SafeLLamaContextHandle.cs13-68
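A short usage sketch of the tokenization and state methods described above. It assumes the LLamaSharp package is referenced and that `LLamaContext` exposes `Tokenize`, `SaveState`, and `LoadState` with the shapes used here; the model path and the state file name are placeholders.

```csharp
using System;
using LLama;
using LLama.Common;

public static class ContextExample
{
    public static void Main(string[] args)
    {
        // Placeholder path: point this at a GGUF model file on disk.
        var modelPath = args.Length > 0 ? args[0] : "model.gguf";
        var parameters = new ModelParams(modelPath);

        using var weights = LLamaWeights.LoadFromFile(parameters);
        using var context = weights.CreateContext(parameters);

        // Convert text into token IDs using the model's vocabulary.
        var tokens = context.Tokenize("Hello, world!");
        Console.WriteLine($"Token count: {tokens.Length}");

        // Persist the context state to disk, then restore it from the same file.
        context.SaveState("context.state");
        context.LoadState("context.state");
    }
}
```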
These classes provide specialized pipelines for non-generative tasks.
LLamaEmbedder is used for generating high-dimensional embedding vectors. It automatically configures the context for embedding mode using llama_set_embeddings LLama/LLamaEmbedder.cs79.
It throws NotSupportedException for encoder-decoder models LLama/LLamaEmbedder.cs42-43.
LLamaReranker computes relevance scores between a query and multiple documents.
It requires PoolingType to be set to LLamaPoolingType.Rank LLama/LLamaReranker.cs40-41.
Sources: LLama/LLamaEmbedder.cs15-154 LLama/LLamaReranker.cs15-181
LLamaQuantizer provides a static interface to the native llama_model_quantize API LLama/LLamaQuantizer.cs10-11.
It supports "relaxed" string-to-enum parsing for LLamaFtype, so a partial name such as "Q5_K_M" is accepted LLama/LLamaQuantizer.cs126-151.
Sources: LLama/LLamaQuantizer.cs10-153 LLama/Native/NativeApi.Quantize.cs10-15
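To illustrate what relaxed string-to-enum parsing can look like, here is a self-contained sketch. It is not the library's implementation: `DemoFtype` is a hypothetical stand-in for LLamaFtype with a handful of made-up members, and the matching rule (exact name first, then a unique case-insensitive suffix) is an assumption chosen for the example.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for LLamaFtype; the member names are illustrative only.
public enum DemoFtype
{
    MOSTLY_Q4_0,
    MOSTLY_Q5_K_S,
    MOSTLY_Q5_K_M,
    MOSTLY_Q8_0,
}

public static class RelaxedParser
{
    // Tries an exact (case-insensitive) name match first, then falls back to
    // a suffix match that must identify exactly one enum member.
    public static bool TryParse(string text, out DemoFtype result)
    {
        result = default;
        if (string.IsNullOrWhiteSpace(text))
            return false;

        // Enum.TryParse also accepts numeric strings, so confirm the value is defined.
        if (Enum.TryParse(text, ignoreCase: true, out result)
            && Enum.IsDefined(typeof(DemoFtype), result))
            return true;

        var matches = Enum.GetNames(typeof(DemoFtype))
            .Where(name => name.EndsWith(text, StringComparison.OrdinalIgnoreCase))
            .ToArray();

        if (matches.Length == 1)
        {
            result = (DemoFtype)Enum.Parse(typeof(DemoFtype), matches[0]);
            return true;
        }

        // No match, or an ambiguous suffix that fits several members.
        result = default;
        return false;
    }
}
```

With this sketch, `RelaxedParser.TryParse("Q5_K_M", out var ftype)` returns true and sets `ftype` to `DemoFtype.MOSTLY_Q5_K_M`, while an ambiguous suffix such as `"_0"` is rejected because it fits two members.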
| Type | Implementation | Description |
|---|---|---|
| LLamaToken | struct | Integer ID representing a piece of text LLama/Native/NativeApi.cs109 |
| LLamaSeqId | struct | Identifier for a sequence within the KV cache LLama/Native/LLamaSeqId.cs10 |
| LLamaBatch | class | Managed structure for submitting multiple tokens for evaluation LLama/Native/LLamaNativeBatch.cs10 |
Data Flow: Text to Inference
Sources: LLama/LLamaContext.cs107-110 LLama/Native/SafeLLamaContextHandle.cs180-185 LLama/Native/LLamaSeqId.cs10
LLamaSharp uses a hierarchy of SafeHandle types to prevent memory leaks and use-after-free errors.
- SafeLLamaHandleBase: Base class for all native handles LLama/Native/SafeLlamaModelHandle.cs16
- SafeLLamaContextHandle increments the reference count of its parent SafeLlamaModelHandle during creation LLama/Native/SafeLLamaContextHandle.cs119 and decrements it upon disposal LLama/Native/SafeLLamaContextHandle.cs86
- NativeApi.llama_empty_call() is executed to force library loading before handle creation LLama/Native/SafeLlamaModelHandle.cs168-172 LLama/Native/SafeLLamaContextHandle.cs126-130

Sources: LLama/Native/SafeLlamaModelHandle.cs15-127 LLama/Native/SafeLLamaContextHandle.cs79-122
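The parent/child reference counting described above can be illustrated with plain .NET `SafeHandle` types. This is a sketch of the general pattern only: `DemoModelHandle` and `DemoContextHandle` are hypothetical classes, the pointer values are placeholders for what native code would return, and nothing here is taken from the LLamaSharp sources.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical parent handle standing in for a model handle.
public sealed class DemoModelHandle : SafeHandle
{
    public DemoModelHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(new IntPtr(1)); // placeholder for a pointer returned by native code
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Records whether the underlying resource has actually been freed.
    public bool Released { get; private set; }

    protected override bool ReleaseHandle()
    {
        Released = true; // a real handle would free native memory here
        return true;
    }
}

// Hypothetical child handle standing in for a context handle.
public sealed class DemoContextHandle : SafeHandle
{
    private readonly DemoModelHandle _model;

    public DemoContextHandle(DemoModelHandle model) : base(IntPtr.Zero, ownsHandle: true)
    {
        // Take a reference on the parent so it cannot be freed while this child is alive.
        var success = false;
        model.DangerousAddRef(ref success);
        if (!success)
            throw new InvalidOperationException("Failed to add a reference to the model handle.");

        _model = model;
        SetHandle(new IntPtr(2)); // placeholder for a pointer returned by native code
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        // Drop the reference taken in the constructor.
        _model.DangerousRelease();
        return true;
    }
}
```

In this sketch, disposing the model while a context is still alive marks the model handle as closed but does not run its `ReleaseHandle`; the resource is freed only once the context is disposed and releases its reference. That ordering is what prevents a use-after-free when objects are disposed out of order.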