Last indexed: 18 May 2026 (ecd184)

Native Interop API

This page provides a low-level reference for the NativeApi class and the SafeHandle implementations that form LLamaSharp's native interop layer. This documentation is intended for advanced users who need to understand the P/Invoke boundary, extend the library with new native functions, or debug native interactions.

Overview

The NativeApi class LLama/Native/NativeApi.cs11-12 is a static partial class containing P/Invoke declarations that bind to the native llama.cpp library. It forms the lowest layer of managed code in LLamaSharp, directly above the native binary boundary.

Architecture:

NativeApi declares functions using [DllImport] with CallingConvention.Cdecl LLama/Native/NativeApi.cs33-34
Functions are declared static extern to indicate P/Invoke imports LLama/Native/NativeApi.cs26
Native pointers are wrapped in SafeHandle subclasses, which provide automatic resource management and type-safe reference counting LLama/Native/SafeLLamaHandleBase.cs8-9
The class is partial to allow function declarations to be split across multiple files, such as NativeApi.Mtmd.cs for multimodal support LLama/Native/NativeApi.Mtmd.cs9-10
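The declaration style described above can be sketched as follows. This is a minimal illustration, not the library's actual source: the class name NativeApiSketch, the library string "llama", and the IsPInvoke helper are assumptions made for the example. Only llama_max_devices is taken from the native API, where it returns a size_t.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical stand-in for NativeApi; the name and library string are assumptions.
public static partial class NativeApiSketch
{
    private const string LibraryName = "llama";

    // Native side: size_t llama_max_devices(void). UIntPtr matches size_t.
    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern UIntPtr llama_max_devices();
}

// A second part of the same partial class, which could live in another file.
public static partial class NativeApiSketch
{
    // Reports whether a public static method is a P/Invoke import.
    // Reflection only reads metadata, so the native library is never loaded.
    public static bool IsPInvoke(string methodName)
    {
        var method = typeof(NativeApiSketch).GetMethod(
            methodName, BindingFlags.Public | BindingFlags.Static);
        return method != null
            && (method.Attributes & MethodAttributes.PinvokeImpl) != 0;
    }
}
```

Declaring an extern method does not load the native library; loading happens on the first call, which is why a declaration can be inspected like this without the binary present.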

SafeHandle wrappers:

SafeHandle class	Wraps native type	Purpose
`SafeLlamaModelHandle`	`llama_model*`	Model weights and metadata LLama/Native/SafeLlamaModelHandle.cs15-16
`SafeLLamaContextHandle`	`llama_context*`	Inference context with KV cache LLama/Native/SafeLLamaContextHandle.cs13-14
`SafeMtmdModelHandle`	`mtmd_context*`	Multimodal projection model weights LLama/Native/SafeMtmdModelHandle.cs13
`SafeMtmdEmbed`	`mtmd_bitmap*`	Media embeddings (image/audio) LLama/Native/SafeMtmdEmbed.cs11-12
`SafeMtmdInputChunk`	`mtmd_input_chunk*`	A single chunk of multimodal data LLama/Native/SafeMtmdInputChunk.cs10-11
`SafeMtmdInputChunks`	`mtmd_input_chunks*`	Collection of multimodal input chunks LLama/Native/SafeMtmdInputChunks.cs9-10

Sources: LLama/Native/NativeApi.cs1-12 LLama/Native/SafeLLamaHandleBase.cs8-21 LLama/Native/SafeLlamaModelHandle.cs15-16 LLama/Native/SafeLLamaContextHandle.cs13-14 LLama/Native/SafeMtmdModelHandle.cs13-14 LLama/Native/SafeMtmdEmbed.cs11-12 LLama/Native/SafeMtmdInputChunk.cs10-11 LLama/Native/SafeMtmdInputChunks.cs9-10

Interop Architecture

Call chain from managed objects to native functions

The interop layer performs these operations at the P/Invoke boundary:

Type marshaling: string ↔ UTF-8 byte*, arrays to pinned pointers LLama/Native/SafeLlamaModelHandle.cs91-107
SafeHandle wrapping: Raw pointers (e.g., llama_model*) wrapped in SafeLlamaModelHandle LLama/Native/SafeLlamaModelHandle.cs185-186
Error translation: Native return codes translated to exceptions or result enums LLama/Native/SafeLLamaContextHandle.cs112-113
Memory management: Reference counting for model handles, automatic cleanup via Dispose LLama/Native/SafeLLamaContextHandle.cs80-90
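The error translation step can be sketched as below. The enum and helper are hypothetical, and the mapping assumes the convention documented for llama_decode in llama.h: 0 for success, a positive value for a recoverable condition such as a missing KV slot, and a negative value for an error. LLamaSharp's own result types may differ.

```csharp
using System;

// Hypothetical result enum; values follow the assumed llama_decode convention.
public enum DecodeOutcome
{
    Ok = 0,
    NoKvSlot = 1,
}

public static class ReturnCodeSketch
{
    // Converts a raw native return code into an enum value or an exception.
    public static DecodeOutcome Translate(int nativeCode)
    {
        if (nativeCode < 0)
            throw new InvalidOperationException(
                $"Native call failed with code {nativeCode}.");

        if (!Enum.IsDefined(typeof(DecodeOutcome), nativeCode))
            throw new InvalidOperationException(
                $"Unrecognized native return code {nativeCode}.");

        return (DecodeOutcome)nativeCode;
    }
}
```

Returning an enum for recoverable conditions lets callers retry (for example with a smaller batch), while exceptions are reserved for codes the caller cannot act on.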

Sources: LLama/Native/NativeApi.cs33-156 LLama/LLamaWeights.cs24 LLama/LLamaContext.cs42 LLama/Native/SafeLlamaModelHandle.cs123-127 LLama/Native/SafeLLamaContextHandle.cs80-90 LLama/Native/NativeApi.Mtmd.cs32-150

Library Loading and Initialization

Backend Initialization

llama_backend_init is declared private LLama/Native/NativeApi.cs87. LLamaSharp calls it automatically, because it is only valid to call it once LLama/Native/NativeApi.cs82-85

Empty call pattern:

LLama/Native/NativeApi.cs17-20 provides llama_empty_call(), which forces native library loading by calling a harmless function (llama_max_devices()). This is used in static constructors of safe handles to ensure dependencies are loaded LLama/Native/SafeLlamaModelHandle.cs168-172 LLama/Native/SafeLLamaContextHandle.cs126-130

Sources: LLama/Native/NativeApi.cs17-34 LLama/Native/NativeApi.cs82-87 LLama/Native/SafeLlamaModelHandle.cs168-172 LLama/Native/SafeLLamaContextHandle.cs126-130

Function Categories

Model Operations

Model lifecycle and query functions

Function	Purpose	Returns
`llama_model_load_from_file`	Load model from GGUF file	`SafeLlamaModelHandle` LLama/Native/SafeLlamaModelHandle.cs186
`llama_model_free`	Release model memory	void LLama/Native/SafeLlamaModelHandle.cs125
`llama_model_n_embd`	Get embedding dimension	int LLama/Native/SafeLlamaModelHandle.cs36
`llama_model_n_ctx_train`	Get training context size	int LLama/Native/SafeLlamaModelHandle.cs26
`llama_model_n_layer`	Get layer count	int LLama/Native/SafeLlamaModelHandle.cs51
`llama_model_desc`	Get model description	int (string length) LLama/Native/SafeLlamaModelHandle.cs98
`llama_model_meta_count`	Get metadata pair count	int LLama/Native/SafeLlamaModelHandle.cs113

Sources: LLama/Native/SafeLlamaModelHandle.cs15-186

Context Operations

Context creation and inference functions

Key context functions:

Function	Signature	Purpose
`llama_init_from_model`	`(model, params) → ctx`	Create context LLama/Native/SafeLLamaContextHandle.cs139
`llama_free`	`(ctx) → void`	Free context memory LLama/Native/SafeLLamaContextHandle.cs146
`llama_decode`	`(ctx, batch) → int`	Process token batch LLama/Native/SafeLLamaContextHandle.cs180
`llama_get_logits_ith`	`(ctx, i) → float*`	Get logits for the i-th token LLama/Native/SafeLLamaContextHandle.cs501
`llama_n_ctx`	`(ctx) → uint`	Get context size LLama/Native/SafeLLamaContextHandle.cs20
`llama_n_batch`	`(ctx) → uint`	Get max batch size LLama/Native/SafeLLamaContextHandle.cs30

Sources: LLama/Native/SafeLLamaContextHandle.cs20-501

Multimodal Operations (MTMD)

Multimodal support is implemented via the mtmd helper library, providing specialized handles for images and audio.

Multimodal Entity Relationship

Function	Purpose
`mtmd_init_from_file`	Load multimodal projector weights (mmproj) LLama/Native/SafeMtmdModelHandle.cs53
`mtmd_bitmap_init`	Create embedding from RGB pixels LLama/Native/SafeMtmdEmbed.cs50
`mtmd_bitmap_init_from_audio`	Create embedding from PCM samples LLama/Native/SafeMtmdEmbed.cs68
`mtmd_tokenize`	Tokenize text with media embeddings LLama/Native/SafeMtmdModelHandle.cs138
`mtmd_input_chunks_get`	Retrieve a chunk from a collection LLama/Native/SafeMtmdInputChunks.cs89
`mtmd_input_chunk_get_type`	Get modality of a chunk (Text/Image/Audio) LLama/Native/SafeMtmdInputChunk.cs73

Sources: LLama/Native/NativeApi.Mtmd.cs15-150 LLama/Native/SafeMtmdModelHandle.cs13-151 LLama/MtmdWeights.cs12-146 LLama/Native/SafeMtmdInputChunk.cs71-73 LLama/Native/SafeMtmdEmbed.cs50-68 LLama/Native/SafeMtmdInputChunks.cs89

Tokenization

The tokenization APIs convert between text and token sequences. LLamaContext provides a high-level Tokenize method LLama/LLamaContext.cs107 which uses llama_tokenize internally.

Primary function signature:
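As a hedged illustration of how a call of this kind is typically driven, the sketch below shows the two-pass buffer pattern. It assumes the convention described in llama.h, where a negative return value is the negated number of tokens that would have been written. The TokenizeCall delegate and FakeCall are hypothetical stand-ins for the native function, not LLamaSharp APIs.

```csharp
using System;

// Hypothetical shape of a tokenizer call: fills `tokens` and returns the count
// written, or the negated required count when `tokens` is too small.
public delegate int TokenizeCall(string text, int[] tokens);

public static class TokenizeSketch
{
    public static int[] Tokenize(string text, TokenizeCall call)
    {
        // First pass with a guessed capacity.
        var buffer = new int[Math.Max(1, text.Length)];
        int count = call(text, buffer);

        if (count < 0)
        {
            // Second pass with the exact capacity reported by the first.
            buffer = new int[-count];
            count = call(text, buffer);
            if (count < 0)
                throw new InvalidOperationException("Required size changed between calls.");
        }

        Array.Resize(ref buffer, count);
        return buffer;
    }

    // Fake tokenizer for demonstration: a start token (1), then one token per char.
    public static int FakeCall(string text, int[] tokens)
    {
        int needed = text.Length + 1;
        if (tokens.Length < needed) return -needed;
        tokens[0] = 1;
        for (int i = 0; i < text.Length; i++) tokens[i + 1] = text[i];
        return needed;
    }
}
```

With the fake tokenizer, TokenizeSketch.Tokenize("abc", TokenizeSketch.FakeCall) needs four slots, so the first pass fails and the second succeeds.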

Sources: LLama/Native/NativeApi.cs156-157 LLama/LLamaContext.cs107-110

State Management

State management functions save and restore the KV cache and context state:

Function	Purpose
`llama_state_get_size`	Query required buffer size LLama/Native/SafeLLamaContextHandle.cs269
`llama_state_get_data`	Copy state to buffer LLama/Native/SafeLLamaContextHandle.cs281
`llama_state_set_data`	Restore state from buffer LLama/Native/SafeLLamaContextHandle.cs317

File-based state: LLamaContext provides convenience methods such as SaveState LLama/LLamaContext.cs133 and LoadState LLama/LLamaContext.cs240. These use MemoryMappedFile so that the native state functions read and write the mapped memory directly, avoiding an extra copy through a C# array LLama/LLamaContext.cs142-160
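The memory-mapped approach can be sketched as follows. This is an illustration under stated assumptions, not the SaveState implementation: FakeGetData is a hypothetical stand-in for a native function that copies state into a destination pointer, and the class name is invented for the example.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

public static class MappedStateSketch
{
    // Hypothetical stand-in for a native "copy state into dest" function.
    private static void FakeGetData(IntPtr dest, int size)
    {
        for (int i = 0; i < size; i++)
            Marshal.WriteByte(dest, i, (byte)(i & 0xFF));
    }

    // Writes `stateSize` bytes straight into a file-backed mapping.
    public static void Save(string path, int stateSize)
    {
        using (var file = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, stateSize))
        using (var view = file.CreateViewAccessor(0, stateSize))
        {
            var handle = view.SafeMemoryMappedViewHandle;
            bool addedRef = false;
            try
            {
                // Keep the view alive while the raw pointer is in use.
                handle.DangerousAddRef(ref addedRef);
                IntPtr dest = IntPtr.Add(handle.DangerousGetHandle(), (int)view.PointerOffset);
                FakeGetData(dest, stateSize);
            }
            finally
            {
                if (addedRef) handle.DangerousRelease();
            }
        }
    }
}
```

Because the writer targets the mapped pointer, no intermediate managed byte array of the full state size is allocated.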

Sources: LLama/Native/SafeLLamaContextHandle.cs269-328 LLama/LLamaContext.cs133-353

Memory Management

The NativeApi.Memory.cs file extends the partial NativeApi class with functions that manage KV cache memory through a llama_memory_t handle. These functions give fine-grained control over sequence manipulation, which is needed for advanced features such as context shifting and batched inference.

KV Cache Memory Operations

Function	Purpose
`llama_memory_clear`	Clears the memory contents, optionally including data buffers. LLama/Native/NativeApi.Memory.cs13
`llama_memory_seq_rm`	Removes tokens belonging to a specific sequence within a position range. LLama/Native/NativeApi.Memory.cs25
`llama_memory_seq_cp`	Copies tokens from one sequence to another. LLama/Native/NativeApi.Memory.cs37
`llama_memory_seq_keep`	Removes all tokens that do not belong to the specified sequence. LLama/Native/NativeApi.Memory.cs45
`llama_memory_seq_add`	Adds a relative position `delta` to tokens in a sequence within a range. LLama/Native/NativeApi.Memory.cs55
`llama_memory_seq_div`	Divides token positions in a sequence within a range by an integer factor. LLama/Native/NativeApi.Memory.cs70
`llama_memory_seq_pos_min`	Returns the smallest position present in memory for a sequence. LLama/Native/NativeApi.Memory.cs83
`llama_memory_seq_pos_max`	Returns the largest position present in memory for a sequence. LLama/Native/NativeApi.Memory.cs93
`llama_memory_can_shift`	Checks if the memory supports shifting operations. LLama/Native/NativeApi.Memory.cs102
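To make the position arithmetic behind context shifting concrete, the sketch below models a cache as a list of (sequence, position) pairs. It is a pure managed simulation under assumptions: ranges are treated as half-open [p0, p1), and a negative bound is treated as unbounded, following the comments in llama.h. KvPositionsSketch and its methods are hypothetical and do not call the native functions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class KvPositionsSketch
{
    private readonly List<(int Seq, int Pos)> _cells = new List<(int Seq, int Pos)>();

    public void Add(int seq, int pos) => _cells.Add((seq, pos));

    // Half-open range [p0, p1); a negative bound means "unbounded" on that side.
    private static bool InRange(int pos, int p0, int p1) =>
        (p0 < 0 || pos >= p0) && (p1 < 0 || pos < p1);

    // Models seq_rm: drop cells of `seq` whose position is in range.
    public void SeqRm(int seq, int p0, int p1) =>
        _cells.RemoveAll(c => c.Seq == seq && InRange(c.Pos, p0, p1));

    // Models seq_add: shift positions of `seq` in range by `delta`.
    public void SeqAdd(int seq, int p0, int p1, int delta)
    {
        for (int i = 0; i < _cells.Count; i++)
        {
            var c = _cells[i];
            if (c.Seq == seq && InRange(c.Pos, p0, p1))
                _cells[i] = (c.Seq, c.Pos + delta);
        }
    }

    public int[] Positions(int seq) =>
        _cells.Where(c => c.Seq == seq).Select(c => c.Pos).OrderBy(p => p).ToArray();

    // Context shift: keep the first `keep` tokens, discard the next `discard`,
    // then slide the remainder left to close the gap.
    public void Shift(int seq, int keep, int discard)
    {
        SeqRm(seq, keep, keep + discard);
        SeqAdd(seq, keep + discard, -1, -discard);
    }
}
```

For a sequence holding positions 0 through 9, Shift(0, 2, 4) removes positions 2 to 5 and moves 6 to 9 down to 2 to 5, leaving a contiguous range with room for new tokens.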

Sources: LLama/Native/NativeApi.Memory.cs1-104

Marshaling and Type Mapping

Safe Handles and Reference Counting

All native pointers are wrapped in SafeHandle subclasses. SafeHandle implements IDisposable, so disposing a wrapper (or letting it be finalized) releases the underlying native resource.

Context-Model Ownership: SafeLLamaContextHandle holds a reference to its parent SafeLlamaModelHandle LLama/Native/SafeLLamaContextHandle.cs75 and calls DangerousAddRef LLama/Native/SafeLLamaContextHandle.cs119 to prevent the model from being freed while the context is alive. When the context is disposed, it calls DangerousRelease LLama/Native/SafeLLamaContextHandle.cs86 on the model.
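The ownership rule above can be sketched with two toy handles. FakeModelHandle and FakeContextHandle are hypothetical; they use Marshal.AllocHGlobal as a stand-in for native allocations and only demonstrate the standard SafeHandle reference-counting calls.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical parent handle; unmanaged memory stands in for a native model.
public sealed class FakeModelHandle : SafeHandle
{
    public bool Released { get; private set; }

    public FakeModelHandle() : base(IntPtr.Zero, true)
    {
        SetHandle(Marshal.AllocHGlobal(16));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        SetHandle(IntPtr.Zero);
        Released = true;
        return true;
    }
}

// Hypothetical child handle that keeps its parent alive.
public sealed class FakeContextHandle : SafeHandle
{
    private readonly FakeModelHandle _model;

    public FakeContextHandle(FakeModelHandle model) : base(IntPtr.Zero, true)
    {
        bool added = false;
        model.DangerousAddRef(ref added);
        if (!added) throw new InvalidOperationException("Could not reference the model.");
        _model = model;
        SetHandle(Marshal.AllocHGlobal(16));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        SetHandle(IntPtr.Zero);
        _model.DangerousRelease(); // lets the parent be freed once unreferenced
        return true;
    }
}
```

Disposing the parent first only marks it closed; its ReleaseHandle runs when the child releases the last outstanding reference.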

Sources: LLama/Native/SafeLLamaContextHandle.cs75-122 LLama/Native/SafeLLamaHandleBase.cs8-27

Struct Layouts

Parameters are passed to native code using structs with [StructLayout(LayoutKind.Sequential)].

mtmd_context_params: Configures multimodal specific parameters like GPU usage, thread counts, and media markers LLama/Native/NativeApi.Mtmd.cs16-30
LLamaContextParams: Configures context size, batching, and threading LLama/Native/LLamaContextParams.cs21-213
LLamaModelQuantizeParams: Configures model quantization settings LLama/Native/LLamaModelQuantizeParams.cs10-111
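A sequential layout can be checked from managed code, as in the sketch below. ParamsSketch is a hypothetical two-field struct, not the layout of any real llama.cpp parameter struct. It also shows why a C bool field needs an explicit one-byte marshaling hint: by default the marshaler treats bool as a four-byte value.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical parameter struct; field order must match the native declaration.
[StructLayout(LayoutKind.Sequential)]
public struct ParamsSketch
{
    // C `bool` is one byte, so request a one-byte marshaled representation.
    [MarshalAs(UnmanagedType.U1)]
    public bool UseGpu;

    // A four-byte int is aligned to a four-byte boundary after the bool.
    public int ThreadCount;
}

public static class LayoutSketch
{
    // Marshaled size in bytes: 1 (bool) + 3 (padding) + 4 (int).
    public static int Size => Marshal.SizeOf<ParamsSketch>();

    // Byte offset of ThreadCount within the marshaled struct.
    public static int ThreadCountOffset =>
        Marshal.OffsetOf<ParamsSketch>(nameof(ParamsSketch.ThreadCount)).ToInt32();
}
```

Comparing Marshal.SizeOf and Marshal.OffsetOf against sizeof and offsetof on the native side is a quick way to catch a mismatched field after a llama.cpp update.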

Sources: LLama/Native/NativeApi.Mtmd.cs16-30 LLama/Native/LLamaContextParams.cs21-213 LLama/Native/LLamaModelQuantizeParams.cs10-111

String Marshaling

All string parameters are marshaled as UTF-8 because llama.cpp uses UTF-8. Safe handles often use Encoding.UTF8.GetString to convert native byte pointers back to C# strings LLama/Native/SafeLlamaModelHandle.cs103. For inputs, PinnedUtf8String provides a stable pointer for the duration of the native call LLama/Native/SafeMtmdModelHandle.cs45.
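Both directions can be sketched without the native library. PinnedUtf8Sketch and Utf8Reader are hypothetical helpers written for this example; they are not the library's PinnedUtf8String, and only illustrate the general technique of pinning an encoded buffer and decoding bytes from a pointer.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper: encodes a string as null-terminated UTF-8 and pins it.
public sealed class PinnedUtf8Sketch : IDisposable
{
    private readonly byte[] _bytes;
    private GCHandle _pin;

    public PinnedUtf8Sketch(string text)
    {
        int count = Encoding.UTF8.GetByteCount(text);
        _bytes = new byte[count + 1]; // final byte stays 0 as the terminator
        Encoding.UTF8.GetBytes(text, 0, text.Length, _bytes, 0);
        _pin = GCHandle.Alloc(_bytes, GCHandleType.Pinned);
    }

    // Stable address, valid until Dispose is called.
    public IntPtr Pointer => _pin.AddrOfPinnedObject();

    // Encoded length in bytes, excluding the terminator.
    public int ByteLength => _bytes.Length - 1;

    public void Dispose()
    {
        if (_pin.IsAllocated) _pin.Free();
    }
}

public static class Utf8Reader
{
    // Copies `byteLength` bytes from native memory and decodes them as UTF-8.
    public static string Read(IntPtr ptr, int byteLength)
    {
        var buffer = new byte[byteLength];
        Marshal.Copy(ptr, buffer, 0, byteLength);
        return Encoding.UTF8.GetString(buffer);
    }
}
```

Lengths passed across the boundary are byte counts, not character counts: "héllo" has five characters but encodes to six UTF-8 bytes.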

Sources: LLama/Native/SafeLlamaModelHandle.cs91-107 LLama/Native/SafeMtmdModelHandle.cs45-47


URL: https://deepwiki.com/SciSharp/LLamaSharp/9.4-native-interop-api