This page provides a low-level reference for the NativeApi class and the SafeHandle implementations that form LLamaSharp's native interop layer. This documentation is intended for advanced users who need to understand the P/Invoke boundary, extend the library with new native functions, or debug native interactions.
The NativeApi class LLama/Native/NativeApi.cs11-12 is a static partial class containing P/Invoke declarations that bind to the native llama.cpp library. It forms the lowest layer of managed code in LLamaSharp, directly above the native binary boundary.
Architecture:
NativeApi declares functions using:

- [DllImport] with CallingConvention.Cdecl LLama/Native/NativeApi.cs33-34
- static extern to indicate P/Invoke imports LLama/Native/NativeApi.cs26
- SafeHandle subclasses for automatic resource management LLama/Native/SafeLLamaHandleBase.cs8-9 and type-safe reference counting
- partial to allow function declarations to be split across multiple files, such as NativeApi.Mtmd.cs for multimodal support LLama/Native/NativeApi.Mtmd.cs9-10

SafeHandle wrappers:
| SafeHandle class | Wraps native type | Purpose |
|---|---|---|
| SafeLlamaModelHandle | llama_model* | Model weights and metadata LLama/Native/SafeLlamaModelHandle.cs15-16 |
| SafeLLamaContextHandle | llama_context* | Inference context with KV cache LLama/Native/SafeLLamaContextHandle.cs13-14 |
| SafeMtmdModelHandle | mtmd_model* | Multimodal projection model weights LLama/Native/SafeMtmdModelHandle.cs13 |
| SafeMtmdEmbed | mtmd_bitmap* | Media embeddings (image/audio) LLama/Native/SafeMtmdEmbed.cs11-12 |
| SafeMtmdInputChunk | mtmd_input_chunk* | A single chunk of multimodal data LLama/Native/SafeMtmdInputChunk.cs10-11 |
| SafeMtmdInputChunks | mtmd_input_chunks* | Collection of multimodal input chunks LLama/Native/SafeMtmdInputChunks.cs9-10 |
Sources: LLama/Native/NativeApi.cs1-12 LLama/Native/SafeLLamaHandleBase.cs8-21 LLama/Native/SafeLlamaModelHandle.cs15-16 LLama/Native/SafeLLamaContextHandle.cs13-14 LLama/Native/SafeMtmdModelHandle.cs13-14 LLama/Native/SafeMtmdEmbed.cs11-12 LLama/Native/SafeMtmdInputChunk.cs10-11 LLama/Native/SafeMtmdInputChunks.cs9-10
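The declaration conventions listed above can be sketched in a few lines. This is an illustrative sketch rather than the actual LLamaSharp source: the class name DemoNativeApi, the library name "llama" and the helper IsPInvoke are assumptions made for the example, and the return type mirrors the C size_t of llama_max_devices, which may differ from the type LLamaSharp declares. The P/Invoke binding is resolved lazily, so the code compiles and the helper runs without the native library being present.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// First part of the partial class: a core import.
public static partial class DemoNativeApi
{
    // Assumption: the native library is resolved under the name "llama".
    private const string LibraryName = "llama";

    // C header: size_t llama_max_devices(void). UIntPtr mirrors size_t.
    // The native library is only loaded when this method is first called.
    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern UIntPtr llama_max_devices();
}

// Second part of the same class. It could live in a separate file,
// which is how a partial class lets declarations be split up.
public static partial class DemoNativeApi
{
    // Hypothetical helper: reports whether a public method is a P/Invoke
    // import by inspecting metadata only. It never calls the native function.
    public static bool IsPInvoke(string methodName)
    {
        var method = typeof(DemoNativeApi).GetMethod(methodName);
        return method != null
            && method.Attributes.HasFlag(MethodAttributes.PinvokeImpl);
    }
}
```

Calling DemoNativeApi.IsPInvoke("llama_max_devices") inspects the metadata flag that static extern plus [DllImport] produces, which makes the P/Invoke boundary visible without touching native code.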
Call chain from managed objects to native functions
The interop layer performs these operations at the P/Invoke boundary:
- string ↔ UTF-8 byte*, arrays to pinned pointers LLama/Native/SafeLlamaModelHandle.cs91-107
- Native pointers (llama_model*) wrapped in SafeLlamaModelHandle LLama/Native/SafeLlamaModelHandle.cs185-186
- Native resources released on Dispose LLama/Native/SafeLLamaContextHandle.cs80-90

Sources: LLama/Native/NativeApi.cs33-156 LLama/LLamaWeights.cs24 LLama/LLamaContext.cs42 LLama/Native/SafeLlamaModelHandle.cs123-127 LLama/Native/SafeLLamaContextHandle.cs80-90 LLama/Native/NativeApi.Mtmd.cs32-150
LLama/Native/NativeApi.cs87 declares llama_backend_init as private. LLamaSharp calls it automatically, because the function may only be called once LLama/Native/NativeApi.cs82-85
Empty call pattern:
LLama/Native/NativeApi.cs17-20 provides llama_empty_call(), which forces native library loading by calling a harmless function (llama_max_devices()). This is used in static constructors of safe handles to ensure dependencies are loaded LLama/Native/SafeLlamaModelHandle.cs168-172 LLama/Native/SafeLLamaContextHandle.cs126-130
Sources: LLama/Native/NativeApi.cs17-34 LLama/Native/NativeApi.cs82-87 LLama/Native/SafeLlamaModelHandle.cs168-172 LLama/Native/SafeLLamaContextHandle.cs126-130
Model lifecycle and query functions
| Function | Purpose | Returns |
|---|---|---|
| llama_model_load_from_file | Load model from GGUF file | SafeLlamaModelHandle LLama/Native/SafeLlamaModelHandle.cs186 |
| llama_model_free | Release model memory | void LLama/Native/SafeLlamaModelHandle.cs125 |
| llama_model_n_embd | Get embedding dimension | int LLama/Native/SafeLlamaModelHandle.cs36 |
| llama_model_n_ctx_train | Get training context size | int LLama/Native/SafeLlamaModelHandle.cs26 |
| llama_model_n_layer | Get layer count | int LLama/Native/SafeLlamaModelHandle.cs51 |
| llama_model_desc | Get model description | int (string length) LLama/Native/SafeLlamaModelHandle.cs98 |
| llama_model_meta_count | Get metadata pair count | int LLama/Native/SafeLlamaModelHandle.cs113 |
Sources: LLama/Native/SafeLlamaModelHandle.cs15-186
Context creation and inference functions
Key context functions:
| Function | Signature | Purpose |
|---|---|---|
| llama_init_from_model | (model, params) → ctx | Create context LLama/Native/SafeLLamaContextHandle.cs139 |
| llama_free | (ctx) → void | Free context memory LLama/Native/SafeLLamaContextHandle.cs146 |
| llama_decode | (ctx, batch) → int | Process token batch LLama/Native/SafeLLamaContextHandle.cs180 |
| llama_get_logits_ith | (ctx, i) → float* | Get logits for ith token LLama/Native/SafeLLamaContextHandle.cs501 |
| llama_n_ctx | (ctx) → uint | Get context size LLama/Native/SafeLLamaContextHandle.cs20 |
| llama_n_batch | (ctx) → uint | Get max batch size LLama/Native/SafeLLamaContextHandle.cs30 |
Sources: LLama/Native/SafeLLamaContextHandle.cs20-501
Multimodal support is implemented via the mtmd helper library, providing specialized handles for images and audio.
Multimodal Entity Relationship
| Function | Purpose |
|---|---|
| mtmd_init_from_file | Load multimodal weights (MMP) LLama/Native/SafeMtmdModelHandle.cs53 |
| mtmd_bitmap_init | Create embedding from RGB pixels LLama/Native/SafeMtmdEmbed.cs50 |
| mtmd_bitmap_init_from_audio | Create embedding from PCM samples LLama/Native/SafeMtmdEmbed.cs68 |
| mtmd_tokenize | Tokenize text with media embeddings LLama/Native/SafeMtmdModelHandle.cs138 |
| mtmd_input_chunks_get | Retrieve a chunk from a collection LLama/Native/SafeMtmdInputChunks.cs89 |
| mtmd_input_chunk_get_type | Get modality of a chunk (Text/Image/Audio) LLama/Native/SafeMtmdInputChunk.cs73 |
Sources: LLama/Native/NativeApi.Mtmd.cs15-150 LLama/Native/SafeMtmdModelHandle.cs13-151 LLama/MtmdWeights.cs12-146 LLama/Native/SafeMtmdInputChunk.cs71-73 LLama/Native/SafeMtmdEmbed.cs50-68 LLama/Native/SafeMtmdInputChunks.cs89
The tokenization APIs convert between text and token sequences. LLamaContext provides a high-level Tokenize method LLama/LLamaContext.cs107 which uses llama_tokenize internally.
Primary function signature:
Sources: LLama/Native/NativeApi.cs156-157 LLama/LLamaContext.cs107-110
State management functions save and restore the KV cache and context state:
| Function | Purpose |
|---|---|
| llama_state_get_size | Query required buffer size LLama/Native/SafeLLamaContextHandle.cs269 |
| llama_state_get_data | Copy state to buffer LLama/Native/SafeLLamaContextHandle.cs281 |
| llama_state_set_data | Restore state from buffer LLama/Native/SafeLLamaContextHandle.cs317 |
File-based state:
LLamaContext provides convenience methods like SaveState LLama/LLamaContext.cs133 and LoadState LLama/LLamaContext.cs240 which use MemoryMappedFile to interact with these native state functions efficiently without extra C# array copies LLama/LLamaContext.cs142-160
Sources: LLama/Native/SafeLLamaContextHandle.cs269-328 LLama/LLamaContext.cs133-353
The NativeApi.Memory.cs partial class provides functions for managing the KV cache memory within a llama_memory_t structure. These functions allow for fine-grained control over sequence manipulation, which is crucial for advanced features like context shifting and batched inference.
KV Cache Memory Operations
| Function | Purpose |
|---|---|
| llama_memory_clear | Clears the memory contents, optionally including data buffers. LLama/Native/NativeApi.Memory.cs13 |
| llama_memory_seq_rm | Removes tokens belonging to a specific sequence within a position range. LLama/Native/NativeApi.Memory.cs25 |
| llama_memory_seq_cp | Copies tokens from one sequence to another. LLama/Native/NativeApi.Memory.cs37 |
| llama_memory_seq_keep | Removes all tokens that do not belong to the specified sequence. LLama/Native/NativeApi.Memory.cs45 |
| llama_memory_seq_add | Adds a relative position delta to tokens in a sequence within a range. LLama/Native/NativeApi.Memory.cs55 |
| llama_memory_seq_div | Performs integer division on token positions within a sequence. LLama/Native/NativeApi.Memory.cs70 |
| llama_memory_seq_pos_min | Returns the smallest position present in memory for a sequence. LLama/Native/NativeApi.Memory.cs83 |
| llama_memory_seq_pos_max | Returns the largest position present in memory for a sequence. LLama/Native/NativeApi.Memory.cs93 |
| llama_memory_can_shift | Checks if the memory supports shifting operations. LLama/Native/NativeApi.Memory.cs102 |
Sources: LLama/Native/NativeApi.Memory.cs1-104
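To make the semantics of the sequence operations concrete, the following is a toy, purely managed model of a KV cache. It is not LLamaSharp code and calls no native function. The class ToyKvMemory and its members are hypothetical. It assumes the llama.cpp conventions that a position range is half-open [p0, p1), that a negative bound means "open ended", and that the position queries return -1 for an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model: each cell has one position and the set of sequences sharing it.
public sealed class ToyKvMemory
{
    private sealed class Cell
    {
        public int Pos;
        public HashSet<int> Seqs = new HashSet<int>();
    }

    private readonly List<Cell> _cells = new List<Cell>();

    public int Count => _cells.Count;

    // Half-open range [p0, p1); a negative bound is treated as open ended.
    private static bool InRange(int pos, int p0, int p1)
    {
        if (p0 < 0) p0 = 0;
        if (p1 < 0) p1 = int.MaxValue;
        return pos >= p0 && pos < p1;
    }

    // Stand-in for decoding one token of sequence `seq` at position `pos`.
    public void Add(int seq, int pos)
    {
        var cell = new Cell { Pos = pos };
        cell.Seqs.Add(seq);
        _cells.Add(cell);
    }

    // Analogue of llama_memory_seq_rm.
    public void SeqRm(int seq, int p0, int p1)
    {
        foreach (var c in _cells)
            if (InRange(c.Pos, p0, p1)) c.Seqs.Remove(seq);
        _cells.RemoveAll(c => c.Seqs.Count == 0);
    }

    // Analogue of llama_memory_seq_cp: cells become shared, not duplicated.
    public void SeqCp(int src, int dst, int p0, int p1)
    {
        foreach (var c in _cells)
            if (c.Seqs.Contains(src) && InRange(c.Pos, p0, p1)) c.Seqs.Add(dst);
    }

    // Analogue of llama_memory_seq_keep.
    public void SeqKeep(int seq)
    {
        foreach (var c in _cells) c.Seqs.RemoveWhere(s => s != seq);
        _cells.RemoveAll(c => c.Seqs.Count == 0);
    }

    // Analogue of llama_memory_seq_add: shift positions by delta.
    public void SeqAdd(int seq, int p0, int p1, int delta)
    {
        foreach (var c in _cells)
            if (c.Seqs.Contains(seq) && InRange(c.Pos, p0, p1)) c.Pos += delta;
    }

    // Analogue of llama_memory_seq_div: integer-divide positions by d.
    public void SeqDiv(int seq, int p0, int p1, int d)
    {
        foreach (var c in _cells)
            if (c.Seqs.Contains(seq) && InRange(c.Pos, p0, p1)) c.Pos /= d;
    }

    // Analogues of llama_memory_seq_pos_min / llama_memory_seq_pos_max.
    public int SeqPosMin(int seq)
    {
        var positions = _cells.Where(c => c.Seqs.Contains(seq)).Select(c => c.Pos).ToList();
        return positions.Count == 0 ? -1 : positions.Min();
    }

    public int SeqPosMax(int seq)
    {
        var positions = _cells.Where(c => c.Seqs.Contains(seq)).Select(c => c.Pos).ToList();
        return positions.Count == 0 ? -1 : positions.Max();
    }
}
```

A context shift in this model is a removal followed by a negative position delta: after adding positions 0 to 7 for sequence 0, calling SeqRm(0, 2, 4) and then SeqAdd(0, 4, -1, -2) drops two tokens and closes the gap so the remaining positions are contiguous again.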
All native pointers are wrapped in SafeHandle subclasses that implement IDisposable for automatic resource cleanup.
Context-Model Ownership:
SafeLLamaContextHandle holds a reference to its parent SafeLlamaModelHandle LLama/Native/SafeLLamaContextHandle.cs75 and calls DangerousAddRef LLama/Native/SafeLLamaContextHandle.cs119 to prevent the model from being freed while the context is alive. When the context is disposed, it calls DangerousRelease LLama/Native/SafeLLamaContextHandle.cs86 on the model.
Sources: LLama/Native/SafeLLamaContextHandle.cs75-122 LLama/Native/SafeLLamaHandleBase.cs8-27
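The ownership rule described above can be reproduced with two small SafeHandle subclasses. This is a minimal sketch, not the LLamaSharp implementation: DemoParentHandle and DemoChildHandle are hypothetical names, and each "native resource" is just a block of unmanaged memory from Marshal.AllocHGlobal so the example runs without llama.cpp.

```csharp
using System;
using System.Runtime.InteropServices;

// Plays the role of the model handle.
public sealed class DemoParentHandle : SafeHandle
{
    // Set once the underlying resource has actually been freed.
    public bool Released { get; private set; }

    public DemoParentHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(8)); // stand-in for a native allocation
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        SetHandle(IntPtr.Zero);
        Released = true;
        return true;
    }
}

// Plays the role of the context handle, which depends on its parent.
public sealed class DemoChildHandle : SafeHandle
{
    private readonly DemoParentHandle _parent;

    public DemoChildHandle(DemoParentHandle parent) : base(IntPtr.Zero, ownsHandle: true)
    {
        // Take a reference so the parent cannot be freed while this child lives.
        bool success = false;
        parent.DangerousAddRef(ref success);
        if (!success)
            throw new InvalidOperationException("Could not add a reference to the parent handle.");

        _parent = parent;
        SetHandle(Marshal.AllocHGlobal(8));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        SetHandle(IntPtr.Zero);
        // Give the reference back; this may trigger the parent's ReleaseHandle.
        _parent.DangerousRelease();
        return true;
    }
}
```

Disposing the parent while a child is alive marks the parent as closed but defers its ReleaseHandle; the resource is only freed when the child is disposed and returns its reference. That deferral is what makes the order of disposal safe for callers.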
Parameters are passed to native code using structs with [StructLayout(LayoutKind.Sequential)].
Sources: LLama/Native/NativeApi.Mtmd.cs16-30 LLama/Native/LLamaContextParams.cs21-213 LLama/Native/LLamaModelQuantizeParams.cs10-111
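As an illustration of sequential layout, the sketch below defines a small parameter struct and inspects its unmanaged layout. The struct DemoContextParams and its fields are hypothetical and far smaller than the real parameter structs. Storing the flag as a single signed byte is an assumption about how a C bool can be mirrored, not a statement about a specific LLamaSharp field.

```csharp
using System;
using System.Runtime.InteropServices;

// Fields are laid out in declaration order, matching a C struct with the
// same member order and types.
[StructLayout(LayoutKind.Sequential)]
public struct DemoContextParams
{
    public uint n_ctx;           // offset 0
    public uint n_batch;         // offset 4
    public float rope_freq_base; // offset 8

    // A C bool occupies one byte, so the flag is stored as a byte-sized
    // field and exposed through a managed bool property.
    public sbyte _embeddings;    // offset 12

    public bool Embeddings
    {
        get => _embeddings != 0;
        set => _embeddings = (sbyte)(value ? 1 : 0);
    }
}

public static class DemoLayoutInspector
{
    // Unmanaged size of the struct, including trailing padding.
    public static int Size() => Marshal.SizeOf<DemoContextParams>();

    // Unmanaged byte offset of a field, looked up by name.
    public static int Offset(string field) =>
        Marshal.OffsetOf<DemoContextParams>(field).ToInt32();
}
```

Checking Size() and Offset(...) against sizeof and offsetof in the C header is a cheap way to catch a field that was added, removed or reordered on the native side.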
All string parameters are marshaled as UTF-8 because llama.cpp uses UTF-8. Safe handles often use Encoding.UTF8.GetString to convert native byte pointers back to C# strings LLama/Native/SafeLlamaModelHandle.cs103. For inputs, PinnedUtf8String is used to provide a stable pointer for the duration of the native call LLama/Native/SafeMtmdModelHandle.cs45.
Sources: LLama/Native/SafeLlamaModelHandle.cs91-107 LLama/Native/SafeMtmdModelHandle.cs45-47
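Both directions of the string conversion can be sketched without any native library. The class DemoPinnedUtf8 below is a hypothetical stand-in written for this example; the real PinnedUtf8String may be implemented differently. It encodes a string as null-terminated UTF-8, pins the buffer so its address stays fixed, and the helper then reads the bytes back through the raw pointer the way a handle would decode a native result.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical input wrapper: owns a pinned, null-terminated UTF-8 buffer.
public sealed class DemoPinnedUtf8 : IDisposable
{
    private readonly byte[] _bytes;
    private GCHandle _pin;

    public DemoPinnedUtf8(string value)
    {
        int byteCount = Encoding.UTF8.GetByteCount(value);
        _bytes = new byte[byteCount + 1]; // last byte stays 0 as terminator
        Encoding.UTF8.GetBytes(value, 0, value.Length, _bytes, 0);

        // Pinning stops the GC from moving the array while native code
        // (here only simulated) holds the pointer.
        _pin = GCHandle.Alloc(_bytes, GCHandleType.Pinned);
    }

    // Stable address of the first byte, valid until Dispose.
    public IntPtr Pointer => _pin.AddrOfPinnedObject();

    // Length in bytes, excluding the null terminator.
    public int Length => _bytes.Length - 1;

    public void Dispose()
    {
        if (_pin.IsAllocated) _pin.Free();
    }
}

public static class DemoUtf8
{
    // Output direction: copy `length` bytes from a native pointer and decode.
    public static string FromPointer(IntPtr pointer, int length)
    {
        var buffer = new byte[length];
        Marshal.Copy(pointer, buffer, 0, length);
        return Encoding.UTF8.GetString(buffer);
    }

    // Round trip: managed string -> pinned UTF-8 -> managed string.
    public static string RoundTrip(string value)
    {
        using (var pinned = new DemoPinnedUtf8(value))
        {
            return FromPointer(pinned.Pointer, pinned.Length);
        }
    }
}
```

Length counts bytes rather than characters, which matters for non-ASCII text: native functions that take a text length expect the UTF-8 byte count.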