URL: https://deepwiki.com/SciSharp/LLamaSharp/9.4-native-interop-api

Last indexed: 18 May 2026 (ecd184)
Native Interop API

This page provides a low-level reference for the NativeApi class and the SafeHandle implementations that form LLamaSharp's native interop layer. This documentation is intended for advanced users who need to understand the P/Invoke boundary, extend the library with new native functions, or debug native interactions.

Overview

The NativeApi class (LLama/Native/NativeApi.cs:11-12) is a static partial class containing the P/Invoke declarations that bind to the native llama.cpp library. It is the lowest layer of managed code in LLamaSharp, sitting directly above the native binary boundary.

Architecture:

SafeHandle wrappers:

| SafeHandle class | Wraps native type | Purpose | Source |
| --- | --- | --- | --- |
| SafeLlamaModelHandle | llama_model* | Model weights and metadata | LLama/Native/SafeLlamaModelHandle.cs:15-16 |
| SafeLLamaContextHandle | llama_context* | Inference context with KV cache | LLama/Native/SafeLLamaContextHandle.cs:13-14 |
| SafeMtmdModelHandle | mtmd_context* | Multimodal projection model weights | LLama/Native/SafeMtmdModelHandle.cs:13 |
| SafeMtmdEmbed | mtmd_bitmap* | Media embeddings (image/audio) | LLama/Native/SafeMtmdEmbed.cs:11-12 |
| SafeMtmdInputChunk | mtmd_input_chunk* | A single chunk of multimodal data | LLama/Native/SafeMtmdInputChunk.cs:10-11 |
| SafeMtmdInputChunks | mtmd_input_chunks* | Collection of multimodal input chunks | LLama/Native/SafeMtmdInputChunks.cs:9-10 |

Sources: LLama/Native/NativeApi.cs:1-12, LLama/Native/SafeLLamaHandleBase.cs:8-21, LLama/Native/SafeLlamaModelHandle.cs:15-16, LLama/Native/SafeLLamaContextHandle.cs:13-14, LLama/Native/SafeMtmdModelHandle.cs:13-14, LLama/Native/SafeMtmdEmbed.cs:11-12, LLama/Native/SafeMtmdInputChunk.cs:10-11, LLama/Native/SafeMtmdInputChunks.cs:9-10

Interop Architecture

Call chain from managed objects to native functions


The interop layer performs these operations at the P/Invoke boundary:

Sources: LLama/Native/NativeApi.cs:33-156, LLama/LLamaWeights.cs:24, LLama/LLamaContext.cs:42, LLama/Native/SafeLlamaModelHandle.cs:123-127, LLama/Native/SafeLLamaContextHandle.cs:80-90, LLama/Native/NativeApi.Mtmd.cs:32-150

Library Loading and Initialization

Backend Initialization


llama_backend_init is declared as a private function (LLama/Native/NativeApi.cs:87). LLamaSharp calls it automatically because it is only valid to call it once (LLama/Native/NativeApi.cs:82-85).

Empty call pattern:

llama_empty_call() (LLama/Native/NativeApi.cs:17-20) forces the native library to load by calling a harmless function, llama_max_devices(). The static constructors of the safe handles call it so that native dependencies are loaded before any handle is used (LLama/Native/SafeLlamaModelHandle.cs:168-172, LLama/Native/SafeLLamaContextHandle.cs:126-130).

Sources: LLama/Native/NativeApi.cs:17-34, LLama/Native/NativeApi.cs:82-87, LLama/Native/SafeLlamaModelHandle.cs:168-172, LLama/Native/SafeLLamaContextHandle.cs:126-130
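
The empty call pattern relies on a C# guarantee: a static constructor runs exactly once, before the type is first used. The sketch below illustrates that mechanism only. FakeNativeApi, DemoHandle and EmptyCallDemo are hypothetical names, and the counter stands in for the real library load; no native code is involved.

```csharp
using System;

// Hypothetical stand-in for the native binding; nothing is actually loaded here.
public static class FakeNativeApi
{
    public static int LoadCount { get; private set; }

    // Plays the role of llama_empty_call(): a cheap call whose only purpose
    // is to trigger the (simulated) library load.
    public static void EmptyCall() => LoadCount++;
}

public sealed class DemoHandle
{
    // Runs once, before the first instance is created or the first
    // static member of DemoHandle is accessed.
    static DemoHandle()
    {
        FakeNativeApi.EmptyCall();
    }
}

public static class EmptyCallDemo
{
    // Creates two handles and reports how often the load was triggered.
    public static int Run()
    {
        _ = new DemoHandle();
        _ = new DemoHandle();
        return FakeNativeApi.LoadCount;
    }
}
```

Calling EmptyCallDemo.Run() reports a single load no matter how many handles are created, which is why a static constructor is a convenient place for a one-time load trigger.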

Function Categories

Model Operations

Model lifecycle and query functions

| Function | Purpose | Returns | Source |
| --- | --- | --- | --- |
| llama_model_load_from_file | Load model from GGUF file | SafeLlamaModelHandle | LLama/Native/SafeLlamaModelHandle.cs:186 |
| llama_model_free | Release model memory | void | LLama/Native/SafeLlamaModelHandle.cs:125 |
| llama_model_n_embd | Get embedding dimension | int | LLama/Native/SafeLlamaModelHandle.cs:36 |
| llama_model_n_ctx_train | Get training context size | int | LLama/Native/SafeLlamaModelHandle.cs:26 |
| llama_model_n_layer | Get layer count | int | LLama/Native/SafeLlamaModelHandle.cs:51 |
| llama_model_desc | Get model description | int (string length) | LLama/Native/SafeLlamaModelHandle.cs:98 |
| llama_model_meta_count | Get metadata pair count | int | LLama/Native/SafeLlamaModelHandle.cs:113 |

Sources: LLama/Native/SafeLlamaModelHandle.cs:15-186

Context Operations

Context creation and inference functions


Key context functions:

| Function | Signature | Purpose | Source |
| --- | --- | --- | --- |
| llama_init_from_model | (model, params) → ctx | Create context | LLama/Native/SafeLLamaContextHandle.cs:139 |
| llama_free | (ctx) → void | Free context memory | LLama/Native/SafeLLamaContextHandle.cs:146 |
| llama_decode | (ctx, batch) → int | Process token batch | LLama/Native/SafeLLamaContextHandle.cs:180 |
| llama_get_logits_ith | (ctx, i) → float* | Get logits for the i-th token | LLama/Native/SafeLLamaContextHandle.cs:501 |
| llama_n_ctx | (ctx) → uint | Get context size | LLama/Native/SafeLLamaContextHandle.cs:20 |
| llama_n_batch | (ctx) → uint | Get max batch size | LLama/Native/SafeLLamaContextHandle.cs:30 |

Sources: LLama/Native/SafeLLamaContextHandle.cs:20-501

Multimodal Operations (MTMD)

Multimodal support is implemented via the mtmd helper library, providing specialized handles for images and audio.

Multimodal Entity Relationship

| Function | Purpose | Source |
| --- | --- | --- |
| mtmd_init_from_file | Load multimodal weights (MMP) | LLama/Native/SafeMtmdModelHandle.cs:53 |
| mtmd_bitmap_init | Create embedding from RGB pixels | LLama/Native/SafeMtmdEmbed.cs:50 |
| mtmd_bitmap_init_from_audio | Create embedding from PCM samples | LLama/Native/SafeMtmdEmbed.cs:68 |
| mtmd_tokenize | Tokenize text with media embeddings | LLama/Native/SafeMtmdModelHandle.cs:138 |
| mtmd_input_chunks_get | Retrieve a chunk from a collection | LLama/Native/SafeMtmdInputChunks.cs:89 |
| mtmd_input_chunk_get_type | Get modality of a chunk (Text/Image/Audio) | LLama/Native/SafeMtmdInputChunk.cs:73 |

Sources: LLama/Native/NativeApi.Mtmd.cs:15-150, LLama/Native/SafeMtmdModelHandle.cs:13-151, LLama/MtmdWeights.cs:12-146, LLama/Native/SafeMtmdInputChunk.cs:71-73, LLama/Native/SafeMtmdEmbed.cs:50-68, LLama/Native/SafeMtmdInputChunks.cs:89

Tokenization

The tokenization APIs convert between text and token sequences. LLamaContext provides a high-level Tokenize method (LLama/LLamaContext.cs:107), which uses llama_tokenize internally.

Primary function signature:


Sources: LLama/Native/NativeApi.cs:156-157, LLama/LLamaContext.cs:107-110
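
Callers of a buffer-filling native function such as llama_tokenize typically need a retry step, because llama.cpp reports a too-small output buffer by returning the negated number of tokens required. The sketch below shows that calling convention with a purely managed stand-in. FakeTokenize and TokenizeSketch are hypothetical, and the "tokens" are just word lengths; this is not how LLamaSharp's own Tokenize is written.

```csharp
using System;

public static class TokenizeSketch
{
    // Hypothetical stand-in for the native call: one "token" per
    // whitespace-separated word. Like llama_tokenize, it returns the negated
    // required count when the destination buffer is too small.
    private static int FakeTokenize(string text, Span<int> tokens)
    {
        string[] words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (words.Length > tokens.Length)
            return -words.Length;

        for (int i = 0; i < words.Length; i++)
            tokens[i] = words[i].Length; // toy token id: the word length

        return words.Length;
    }

    public static int[] Tokenize(string text)
    {
        // First attempt with a deliberately small guess.
        int[] buffer = new int[2];
        int count = FakeTokenize(text, buffer);

        if (count < 0)
        {
            // The negative result tells us the exact size to retry with.
            buffer = new int[-count];
            count = FakeTokenize(text, buffer);
        }

        // Trim to the number of tokens actually written.
        return buffer.AsSpan(0, count).ToArray();
    }
}
```

With this stand-in, TokenizeSketch.Tokenize("a bb ccc") needs the retry path because three words do not fit in the initial two-element buffer.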

State Management

State management functions save and restore the KV cache and context state:

FunctionPurpose
| --- | --- | --- |
| llama_state_get_size | Query required buffer size | LLama/Native/SafeLLamaContextHandle.cs:269 |
| llama_state_get_data | Copy state to buffer | LLama/Native/SafeLLamaContextHandle.cs:281 |
| llama_state_set_data | Restore state from buffer | LLama/Native/SafeLLamaContextHandle.cs:317 |

File-based state: LLamaContext provides convenience methods such as SaveState (LLama/LLamaContext.cs:133) and LoadState (LLama/LLamaContext.cs:240). They use MemoryMappedFile so that the native state functions read and write the file mapping directly, without extra C# array copies (LLama/LLamaContext.cs:142-160).

Sources: LLama/Native/SafeLLamaContextHandle.cs:269-328, LLama/LLamaContext.cs:133-353
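
The query-size-then-fill sequence combined with a memory-mapped file can be sketched as follows. This is a minimal illustration rather than LLamaSharp's SaveState: FakeStateGetSize and FakeStateGetData are hypothetical stand-ins for the native state functions, and the 64-byte "state" is invented for the example. The MemoryMappedFile calls themselves are standard .NET APIs.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

public static class StateFileSketch
{
    // Hypothetical stand-in for llama_state_get_size.
    private static long FakeStateGetSize() => 64;

    // Hypothetical stand-in for llama_state_get_data: writes `size` bytes
    // straight to the destination pointer, as native code would.
    private static long FakeStateGetData(IntPtr dest, long size)
    {
        for (int i = 0; i < size; i++)
            Marshal.WriteByte(dest, i, unchecked((byte)i));
        return size;
    }

    public static void Save(string path)
    {
        // 1. Ask how large the state is.
        long size = FakeStateGetSize();

        // 2. Map a file of exactly that size into memory.
        using var file = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, size);
        using var view = file.CreateViewAccessor(0, size);

        var handle = view.SafeMemoryMappedViewHandle;
        bool addedRef = false;
        try
        {
            // Keep the view alive while a raw pointer to it is in use.
            handle.DangerousAddRef(ref addedRef);
            IntPtr dest = new IntPtr(handle.DangerousGetHandle().ToInt64() + view.PointerOffset);

            // 3. Let the "native" side write directly into the mapping,
            //    with no intermediate managed byte[] holding the state.
            FakeStateGetData(dest, size);
        }
        finally
        {
            if (addedRef) handle.DangerousRelease();
        }
    }
}
```

After StateFileSketch.Save(path) returns, the file at path contains the bytes the stand-in wrote. Restoring would follow the mirror image: map the file for reading and hand the pointer to the set-data function.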

Memory Management

The NativeApi.Memory.cs file extends the partial NativeApi class with functions that manage KV cache memory through a llama_memory_t handle. These functions give fine-grained control over sequences, which advanced features such as context shifting and batched inference depend on.

KV Cache Memory Operations

| Function | Purpose | Source |
| --- | --- | --- |
| llama_memory_clear | Clears the memory contents, optionally including data buffers. | LLama/Native/NativeApi.Memory.cs:13 |
| llama_memory_seq_rm | Removes tokens belonging to a specific sequence within a position range. | LLama/Native/NativeApi.Memory.cs:25 |
| llama_memory_seq_cp | Copies tokens from one sequence to another. | LLama/Native/NativeApi.Memory.cs:37 |
| llama_memory_seq_keep | Removes all tokens that do not belong to the specified sequence. | LLama/Native/NativeApi.Memory.cs:45 |
| llama_memory_seq_add | Adds a relative position delta to tokens in a sequence within a range. | LLama/Native/NativeApi.Memory.cs:55 |
| llama_memory_seq_div | Performs integer division on token positions within a sequence. | LLama/Native/NativeApi.Memory.cs:70 |
| llama_memory_seq_pos_min | Returns the smallest position present in memory for a sequence. | LLama/Native/NativeApi.Memory.cs:83 |
| llama_memory_seq_pos_max | Returns the largest position present in memory for a sequence. | LLama/Native/NativeApi.Memory.cs:93 |
| llama_memory_can_shift | Checks if the memory supports shifting operations. | LLama/Native/NativeApi.Memory.cs:102 |
llama_memory_can_shiftChecks if the memory supports shifting operations. LLama/Native/NativeApi.Memory.cs102

Sources: LLama/Native/NativeApi.Memory.cs:1-104
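
To make the sequence operations concrete, the following toy model reimplements a few of them in managed code. It is an illustration of the semantics only, not the llama.cpp data structure: ToyKvMemory and its members are hypothetical, and the sketch assumes the half-open range convention [p0, p1) with a negative bound meaning "unbounded" and a negative sequence id in SeqRm meaning "any sequence".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy managed model of KV cache sequence operations (hypothetical).
public sealed class ToyKvMemory
{
    // One cell per cached token: its position and the sequences sharing it.
    private sealed class Cell
    {
        public int Pos;
        public HashSet<int> Seqs = new HashSet<int>();
    }

    private readonly List<Cell> _cells = new List<Cell>();

    public int Count => _cells.Count;

    public void Add(int pos, int seqId)
    {
        var cell = new Cell { Pos = pos };
        cell.Seqs.Add(seqId);
        _cells.Add(cell);
    }

    // Half-open range [p0, p1); a negative bound is treated as unbounded.
    private static bool InRange(int pos, int p0, int p1)
        => (p0 < 0 || pos >= p0) && (p1 < 0 || pos < p1);

    // Analogue of llama_memory_seq_rm.
    public void SeqRm(int seqId, int p0, int p1)
    {
        foreach (var c in _cells.Where(c => InRange(c.Pos, p0, p1)))
        {
            if (seqId < 0) c.Seqs.Clear();
            else c.Seqs.Remove(seqId);
        }
        _cells.RemoveAll(c => c.Seqs.Count == 0);
    }

    // Analogue of llama_memory_seq_cp: cells are shared, not duplicated.
    public void SeqCp(int src, int dst, int p0, int p1)
    {
        foreach (var c in _cells.Where(c => c.Seqs.Contains(src) && InRange(c.Pos, p0, p1)))
            c.Seqs.Add(dst);
    }

    // Analogue of llama_memory_seq_keep.
    public void SeqKeep(int seqId)
    {
        foreach (var c in _cells) c.Seqs.RemoveWhere(s => s != seqId);
        _cells.RemoveAll(c => c.Seqs.Count == 0);
    }

    // Analogue of llama_memory_seq_add.
    public void SeqAdd(int seqId, int p0, int p1, int delta)
    {
        foreach (var c in _cells.Where(c => c.Seqs.Contains(seqId) && InRange(c.Pos, p0, p1)))
            c.Pos += delta;
    }

    // Analogues of llama_memory_seq_pos_min / _max; -1 when the sequence is empty.
    public int PosMin(int seqId) =>
        _cells.Where(c => c.Seqs.Contains(seqId)).Select(c => c.Pos).DefaultIfEmpty(-1).Min();

    public int PosMax(int seqId) =>
        _cells.Where(c => c.Seqs.Contains(seqId)).Select(c => c.Pos).DefaultIfEmpty(-1).Max();
}

public static class ContextShiftDemo
{
    // Drops positions 2 and 3 of sequence 0, then shifts the tail down by 2.
    public static (int Min, int Max, int Count) Run()
    {
        var mem = new ToyKvMemory();
        for (int pos = 0; pos < 8; pos++) mem.Add(pos, seqId: 0);

        mem.SeqRm(0, 2, 4);       // remove [2, 4)
        mem.SeqAdd(0, 4, -1, -2); // shift [4, inf) by -2

        return (mem.PosMin(0), mem.PosMax(0), mem.Count);
    }
}
```

ContextShiftDemo.Run() mirrors the usual context-shifting recipe: discard a window of old tokens, then renumber the remaining ones so positions stay contiguous.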

Marshaling and Type Mapping

Safe Handles and Reference Counting

All native pointers are wrapped in SafeHandle subclasses that implement IDisposable for automatic resource cleanup.

Context-Model Ownership: SafeLLamaContextHandle holds a reference to its parent SafeLlamaModelHandle (LLama/Native/SafeLLamaContextHandle.cs:75) and calls DangerousAddRef (LLama/Native/SafeLLamaContextHandle.cs:119) to prevent the model from being freed while the context is alive. When the context is disposed, it calls DangerousRelease (LLama/Native/SafeLLamaContextHandle.cs:86) on the model.

Sources: LLama/Native/SafeLLamaContextHandle.cs:75-122, LLama/Native/SafeLLamaHandleBase.cs:8-27
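
The ownership rule above can be reproduced with plain .NET SafeHandle types. In this sketch, ParentHandle and ChildHandle are hypothetical stand-ins for the model and context handles, and a small Marshal.AllocHGlobal block stands in for the native object. It shows that disposing the parent first does not release it until the child lets go.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the model handle.
public sealed class ParentHandle : SafeHandle
{
    public bool Released { get; private set; }

    public ParentHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(16)); // pretend native allocation
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Released = true;
        return true;
    }
}

// Hypothetical stand-in for the context handle.
public sealed class ChildHandle : SafeHandle
{
    private readonly ParentHandle _parent;

    public ChildHandle(ParentHandle parent) : base(IntPtr.Zero, ownsHandle: true)
    {
        // Take a reference so the parent outlives this handle.
        bool added = false;
        parent.DangerousAddRef(ref added);
        if (!added) throw new InvalidOperationException("Could not reference parent handle.");

        _parent = parent;
        SetHandle(Marshal.AllocHGlobal(16));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        _parent.DangerousRelease(); // give the reference back
        return true;
    }
}

public static class OwnershipDemo
{
    public static (bool AfterParentDispose, bool AfterChildDispose) Run()
    {
        var parent = new ParentHandle();
        var child = new ChildHandle(parent);

        parent.Dispose();                 // child still holds a reference
        bool afterParent = parent.Released;

        child.Dispose();                  // last reference dropped here
        bool afterChild = parent.Released;

        return (afterParent, afterChild);
    }
}
```

OwnershipDemo.Run() reports that the parent is still unreleased after its own Dispose and is released only once the child has been disposed, which is the guarantee the context relies on.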

Struct Layouts

Parameters are passed to native code using structs with [StructLayout(LayoutKind.Sequential)].

Sources: LLama/Native/NativeApi.Mtmd.cs:16-30, LLama/Native/LLamaContextParams.cs:21-213, LLama/Native/LLamaModelQuantizeParams.cs:10-111
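
A sequential layout means fields are placed in declaration order with C-style alignment padding, so the managed struct must list fields in the same order and with the same widths as the native one. The struct below is hypothetical and far smaller than LLamaSharp's real parameter structs; it only shows how offsets and total size can be checked with the standard Marshal helpers.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical parameter struct, not one of LLamaSharp's.
[StructLayout(LayoutKind.Sequential)]
public struct DemoContextParams
{
    public uint n_ctx;      // offset 0
    public uint n_batch;    // offset 4
    public int n_threads;   // offset 8
    public byte embeddings; // offset 12; a one-byte field standing in for a C bool
}

public static class LayoutSketch
{
    // Reports the offsets of two fields and the padded size of the struct.
    public static (int NBatchOffset, int EmbeddingsOffset, int Size) Describe() => (
        (int)Marshal.OffsetOf<DemoContextParams>(nameof(DemoContextParams.n_batch)),
        (int)Marshal.OffsetOf<DemoContextParams>(nameof(DemoContextParams.embeddings)),
        Marshal.SizeOf<DemoContextParams>());
}
```

The reported size is larger than the sum of the field sizes because the struct is padded to its 4-byte alignment. Checks of this kind are a cheap way to catch a managed struct drifting out of sync with its native counterpart.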

String Marshaling

All string parameters are marshaled as UTF-8 because llama.cpp uses UTF-8. Safe handles often use Encoding.UTF8.GetString to convert native byte pointers back to C# strings (LLama/Native/SafeLlamaModelHandle.cs:103). For inputs, PinnedUtf8String is used to provide a stable pointer for the duration of the native call (LLama/Native/SafeMtmdModelHandle.cs:45).

Sources: LLama/Native/SafeLlamaModelHandle.cs:91-107, LLama/Native/SafeMtmdModelHandle.cs:45-47
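
The two directions of string marshaling can be sketched with standard .NET APIs. PinnedUtf8Sketch below is a hypothetical helper written for this example; it is not LLamaSharp's PinnedUtf8String, whose implementation may differ. It encodes a string to NUL-terminated UTF-8, pins the buffer so the address stays fixed, and the round trip decodes the bytes back from the raw pointer.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper: a pinned, NUL-terminated UTF-8 copy of a string.
public sealed class PinnedUtf8Sketch : IDisposable
{
    private GCHandle _pin;

    // Length in bytes, excluding the terminating NUL.
    public int ByteLength { get; }

    // Stable address that could be passed to native code as const char*.
    public IntPtr Pointer => _pin.AddrOfPinnedObject();

    public PinnedUtf8Sketch(string value)
    {
        ByteLength = Encoding.UTF8.GetByteCount(value);

        // One extra byte; arrays are zero-initialised, so it acts as the NUL.
        var buffer = new byte[ByteLength + 1];
        Encoding.UTF8.GetBytes(value, 0, value.Length, buffer, 0);

        // Pinning prevents the GC from moving the buffer during the call.
        _pin = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    }

    public void Dispose()
    {
        if (_pin.IsAllocated) _pin.Free();
    }
}

public static class Utf8RoundTrip
{
    public static string Run(string input)
    {
        using var pinned = new PinnedUtf8Sketch(input);

        // Output direction: decode UTF-8 bytes found at a native pointer.
        return Marshal.PtrToStringUTF8(pinned.Pointer, pinned.ByteLength);
    }
}
```

Utf8RoundTrip.Run returns its input unchanged, including non-ASCII text, because both directions agree on UTF-8 and on the byte length rather than the character count.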