Last indexed: 18 May 2026 (ecd184)

Native Library Integration

This page documents how LLamaSharp interfaces with the native llama.cpp library through Platform Invoke (P/Invoke). It covers the NativeApi layer, the runtime library loading mechanism, and the cross-platform binary distribution system.

For information about the SafeHandle wrappers that manage native resource lifetimes, see Memory Management and SafeHandles. For details on building native binaries from source, see Native Binary Compilation.

Overview

LLamaSharp is a high-level .NET wrapper around llama.cpp. It uses .NET's P/Invoke mechanism to cross the managed/unmanaged boundary. The native C++ library provides the core inference engine and tensor operations (via GGML), while LLamaSharp provides a type-safe, idiomatic C# API on top of it.

The native integration architecture is built on three pillars:

P/Invoke Layer: Static declarations in NativeApi mapping to C-exported functions in llama.cpp.
Dynamic Loading: A custom DllImportResolver that selects the optimal binary based on runtime CPU feature detection (AVX levels) and GPU availability.
Modular Backends: A package structure that allows users to swap CPU, CUDA, or Vulkan backends by simply referencing different NuGet packages.

Native Library Architecture

The following table maps the relationship between managed high-level entities and the underlying native GGML/llama.cpp components.

Code Entity Space Mapping

System Concept	Code Entity (Managed)	Native Binary / Function
Model Weights	`LLamaWeights`	`llama_model_load_from_file`
Inference Context	`LLamaContext`	`llama_init_from_model`
Native Bridge	`NativeApi` (LLama/Native/NativeApi.Load.cs:7-8)	`llama.dll` / `libllama.so` (LLama/Native/NativeApi.Load.cs:108)
Library Config	`NativeLibraryConfig` (LLama/Native/NativeLibraryConfig.cs:14-15)	N/A (Managed Selection Logic)
Backend Ops	N/A	`ggml-cpu.dll`, `ggml-cuda.dll`, `ggml-vulkan.dll` (LLama/Native/Load/NativeLibraryUtils.cs:100-110)

Architecture Flow

Sources: LLama/Native/NativeApi.Load.cs:7-116, LLama/Native/Load/NativeLibraryUtils.cs:9-130, LLama/Native/NativeLogConfig.cs:23-24

The NativeApi and Loading Logic

The NativeApi class contains the [DllImport] declarations. However, unlike standard P/Invoke which relies on the default OS search path, LLamaSharp uses a custom resolver to handle the complex dependency tree of modern llama.cpp (which splits logic across multiple DLLs like ggml, ggml-base, and backend-specific binaries).

Custom DLL Resolution

The static constructor of NativeApi calls SetDllImportResolver (LLama/Native/NativeApi.Load.cs:51-90). On .NET 5.0+, this intercepts load requests for llama and mtmd and uses NativeLibraryUtils.TryLoadLibrary to find the best-fitting binary (LLama/Native/NativeApi.Load.cs:71-83).
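The resolver hook described above can be sketched with the standard .NET `NativeLibrary` API. This is a minimal illustration, not LLamaSharp's actual resolver: the class `ResolverSketch`, its method names, and the single candidate path it tries are all hypothetical.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical sketch; names and the candidate path are assumptions.
public static class ResolverSketch
{
    // Logical library names the resolver handles (taken from the text above).
    public static bool IsIntercepted(string libraryName) =>
        libraryName == "llama" || libraryName == "mtmd";

    // Signature matches the DllImportResolver delegate.
    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        // Returning IntPtr.Zero tells the runtime to use its default probing.
        if (!IsIntercepted(libraryName))
            return IntPtr.Zero;

        // Stand-in for the "best fit" search: try one assumed location only.
        string candidate = Path.Combine(AppContext.BaseDirectory, "runtimes", libraryName);
        return NativeLibrary.TryLoad(candidate, out IntPtr handle) ? handle : IntPtr.Zero;
    }

    // A resolver can be registered only once per assembly,
    // and must be in place before the first P/Invoke call.
    public static void Install(Assembly assembly) =>
        NativeLibrary.SetDllImportResolver(assembly, Resolve);
}
```

Returning `IntPtr.Zero` for names the resolver does not recognize keeps the default OS search behavior for every other P/Invoke in the assembly.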

Manual Dependency Loading

Because llama.cpp now splits backends into separate files (e.g., ggml-cuda.dll), LLamaSharp manually loads dependencies in a specific order to ensure the OS loader finds them (LLama/Native/Load/NativeLibraryUtils.cs:48-52).

The loading sequence for a typical library:

1. ggml-base: Core tensor logic (LLama/Native/Load/NativeLibraryUtils.cs:67)
2. Backend: ggml-cpu (with detected AVX level), ggml-cuda, or ggml-vulkan (LLama/Native/Load/NativeLibraryUtils.cs:98-111)
3. ggml: The main GGML interface (LLama/Native/Load/NativeLibraryUtils.cs:117)
4. llama: The high-level llama.cpp library (LLama/Native/NativeApi.Load.cs:64-73)
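The ordering above can be illustrated with a small sketch built on `NativeLibrary.TryLoad`. The class `DependencyLoadSketch` and its parameters are hypothetical, and the real loader handles more cases (per-backend directories, error reporting) than this shows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical sketch of ordered dependency loading.
public static class DependencyLoadSketch
{
    // File names in the order listed above: ggml-base, backend, ggml, llama.
    public static IReadOnlyList<string> LoadOrder(string backend, string prefix, string extension)
    {
        var names = new[] { "ggml-base", backend, "ggml", "llama" };
        var files = new List<string>();
        foreach (var name in names)
            files.Add(prefix + name + extension);
        return files;
    }

    // Loads each file from `directory` in order and stops at the first failure.
    // Returns the handle of the last library (llama), or IntPtr.Zero on failure.
    public static IntPtr LoadAll(string directory, string backend, string prefix, string extension)
    {
        IntPtr last = IntPtr.Zero;
        foreach (var file in LoadOrder(backend, prefix, extension))
        {
            if (!NativeLibrary.TryLoad(Path.Combine(directory, file), out last))
                return IntPtr.Zero;
        }
        return last;
    }
}
```

Loading dependencies explicitly means that by the time the OS loader processes llama, the libraries it imports are already mapped into the process, so the loader does not have to search for them.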

Sources: LLama/Native/NativeApi.Load.cs:9-44, LLama/Native/Load/NativeLibraryUtils.cs:39-124

Runtime Library Selection

LLamaSharp detects system capabilities at startup to choose the most optimized binary.

Selection Flow

Sources: LLama/Native/Load/SystemInfo.cs:22-43, LLama/Native/Load/NativeLibraryUtils.cs:18-36, LLama/Native/Load/NativeLibraryUtils.cs:120-125

CPU Feature Detection

For x86_64 platforms, LLamaSharp differentiates between AVX levels to maximize performance.

AVX512: Highest performance on modern server CPUs (LLama/Native/Load/NativeLibraryConfig.cs:192)
AVX2: Standard for most modern consumer CPUs (LLama/Native/Load/NativeLibraryConfig.cs:191)
AVX: Legacy support (LLama/Native/Load/NativeLibraryConfig.cs:190)
None: Fallback for very old hardware or ARM (LLama/Native/Load/NativeLibraryConfig.cs:189)

The NativeLibraryWithAvx class constructs paths like runtimes/{os}/native/{avxStr}{libPrefix}{_libraryName}{fileExtension} (LLama/Native/Load/NativeLibraryWithAvx.cs:57).
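As a rough illustration of the detection and path template above, the sketch below uses the .NET hardware intrinsics `IsSupported` flags. The enum `AvxLevelSketch`, the class `AvxPathSketch`, and the folder names (`noavx`, `avx`, `avx2`, `avx512`) are assumptions made for this example and may not match the names LLamaSharp uses.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical names; not LLamaSharp's own types.
public enum AvxLevelSketch { None, Avx, Avx2, Avx512 }

public static class AvxPathSketch
{
    // Picks the highest level the current CPU reports.
    // On non-x86 hardware every IsSupported flag is false, so this returns None.
    public static AvxLevelSketch Detect()
    {
#if NET8_0_OR_GREATER
        // Avx512F is only exposed from .NET 8 onward.
        if (Avx512F.IsSupported) return AvxLevelSketch.Avx512;
#endif
        if (Avx2.IsSupported) return AvxLevelSketch.Avx2;
        if (Avx.IsSupported) return AvxLevelSketch.Avx;
        return AvxLevelSketch.None;
    }

    // Assumed folder names, for illustration only.
    public static string FolderName(AvxLevelSketch level) => level switch
    {
        AvxLevelSketch.Avx512 => "avx512",
        AvxLevelSketch.Avx2 => "avx2",
        AvxLevelSketch.Avx => "avx",
        _ => "noavx",
    };

    // Fills in the template quoted in the text above.
    public static string BuildPath(string os, AvxLevelSketch level,
        string libPrefix, string libraryName, string fileExtension)
    {
        string avxStr = FolderName(level) + "/";
        return $"runtimes/{os}/native/{avxStr}{libPrefix}{libraryName}{fileExtension}";
    }
}
```

For example, `AvxPathSketch.BuildPath("linux-x64", AvxLevelSketch.Avx2, "lib", "llama", ".so")` yields `runtimes/linux-x64/native/avx2/libllama.so` under these assumed folder names.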

GPU Detection

CUDA: LLamaSharp checks the CUDA_PATH environment variable on Windows or LD_LIBRARY_PATH/CUDA_VERSION on Linux to determine the installed toolkit version (LLama/Native/Load/SystemInfo.cs:123-181).
Vulkan: The system runs vulkaninfo --summary to parse the apiVersion and confirm Vulkan support (LLama/Native/Load/SystemInfo.cs:64-119).
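The parsing side of these checks might look like the sketch below. It is an assumption-heavy illustration: it infers the CUDA major version from a directory name such as `.../CUDA/v12.4` (the actual SystemInfo logic may read a version file instead), and it assumes `vulkaninfo --summary` prints a line of the form `apiVersion = 1.3.260` or `apiVersion = 4206816 (1.3.224)`. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical parsing helpers; input formats are assumptions.
public static class GpuDetectSketch
{
    // Extracts the major version from a toolkit path ending in "vMAJOR.MINOR".
    // Returns -1 when no such segment is found.
    public static int CudaMajorFromPath(string cudaPath)
    {
        if (string.IsNullOrEmpty(cudaPath)) return -1;
        var match = Regex.Match(cudaPath, @"v(\d+)\.\d+[\\/]?$");
        return match.Success ? int.Parse(match.Groups[1].Value) : -1;
    }

    // Reads CUDA_PATH from the environment (Windows-style detection).
    public static int DetectCudaMajor() =>
        CudaMajorFromPath(Environment.GetEnvironmentVariable("CUDA_PATH") ?? "");

    // Finds "major.minor.patch" on the apiVersion line of vulkaninfo output.
    public static bool TryParseVulkanApiVersion(string output, out Version version)
    {
        version = new Version(0, 0, 0);
        if (string.IsNullOrEmpty(output)) return false;
        var match = Regex.Match(output, @"apiVersion\s*=\s*(?:\d+\s*\()?(\d+)\.(\d+)\.(\d+)");
        if (!match.Success) return false;
        version = new Version(
            int.Parse(match.Groups[1].Value),
            int.Parse(match.Groups[2].Value),
            int.Parse(match.Groups[3].Value));
        return true;
    }
}
```

Keeping the parsers separate from the environment and process calls makes them easy to exercise with captured strings, which matters because the tool output format can vary between driver versions.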

Binary Distribution and Build Process

Native binaries are pre-compiled and distributed. LLamaSharp uses a selection policy to find these binaries within the project structure.

Selection Policy

The DefaultNativeLibrarySelectingPolicy determines the order of preference for loading libraries (LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:8-12):

1. Path Override: If NativeLibraryConfig.WithLibrary was used, that path is tried first (LLama/Native/Load/NativeLibraryConfig.cs:38-44).
2. GPU Backends: CUDA and Vulkan are prioritized if enabled and detected (LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:23-31).
3. AVX Variants: If fallback is allowed, the policy iterates through AVX levels from AVX512 down to None (LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:35-47).
4. OSX/Fallback: Specific Mac libraries (including Rosetta 2 detection) or generic fallbacks are tried last (LLama/Native/Load/NativeLibraryWithMacOrFallback.cs:37-41).
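The preference order above can be modeled as a function that returns an ordered candidate list. This is a simplified sketch with hypothetical names and string labels. It also makes one interpretive assumption: the preferred AVX level is tried first, and lower levels are added only when fallback is allowed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the selection order; labels are illustrative.
public static class SelectionPolicySketch
{
    // Index equals AVX level: 0 = none, 1 = AVX, 2 = AVX2, 3 = AVX512.
    private static readonly string[] AvxLabels = { "noavx", "avx", "avx2", "avx512" };

    public static List<string> Candidates(
        string overridePath, bool cudaDetected, bool vulkanDetected,
        int avxLevel, bool allowFallback)
    {
        var result = new List<string>();

        // 1. An explicit path override wins.
        if (!string.IsNullOrEmpty(overridePath))
            result.Add("override:" + overridePath);

        // 2. GPU backends, when enabled and detected.
        if (cudaDetected) result.Add("cuda");
        if (vulkanDetected) result.Add("vulkan");

        // 3. Preferred AVX level, then lower levels if fallback is allowed.
        avxLevel = Math.Clamp(avxLevel, 0, AvxLabels.Length - 1);
        int lowest = allowFallback ? 0 : avxLevel;
        for (int level = avxLevel; level >= lowest; level--)
            result.Add(AvxLabels[level]);

        // 4. Mac-specific or generic fallback is tried last.
        result.Add("mac-or-fallback");
        return result;
    }
}
```

A caller would walk this list in order and stop at the first candidate that loads, so entries earlier in the list take priority.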

Platform Path Parts

Path resolution accounts for OS-specific naming conventions via NativeLibraryUtils.GetPlatformPathParts (LLama/Native/Load/NativeLibraryUtils.cs:23-27):

Windows: .dll extension, no prefix.
Linux: .so extension, lib prefix.
OSX: .dylib extension, lib prefix.
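The naming rules above are small enough to express directly. The sketch below is a stand-alone illustration with hypothetical names (`PlatformPartsSketch`, `Parts`, `FileName`). It does not reproduce the signature of GetPlatformPathParts.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper mirroring the prefix/extension rules listed above.
public static class PlatformPartsSketch
{
    public static (string Prefix, string Extension) Parts(OSPlatform os)
    {
        if (os == OSPlatform.Windows) return ("", ".dll");
        if (os == OSPlatform.Linux) return ("lib", ".so");
        if (os == OSPlatform.OSX) return ("lib", ".dylib");
        throw new PlatformNotSupportedException(os.ToString());
    }

    // Builds the on-disk file name for a logical library name.
    public static string FileName(OSPlatform os, string libraryName)
    {
        var (prefix, extension) = Parts(os);
        return prefix + libraryName + extension;
    }

    // Resolves the platform the process is currently running on.
    public static OSPlatform Current()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return OSPlatform.Windows;
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return OSPlatform.Linux;
        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) return OSPlatform.OSX;
        throw new PlatformNotSupportedException();
    }
}
```

With this helper, the logical name `llama` maps to `llama.dll` on Windows, `libllama.so` on Linux, and `libllama.dylib` on OSX.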

Sources: LLama/Native/Load/NativeLibraryUtils.cs:23-27, LLama/Native/Load/NativeLibraryConfig.cs:185-195, LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:11-59

Forced Loading and Initialization

To prevent "DLL not found" errors deep in the application logic, LLamaSharp forces loading during the static initialization of core classes.

Empty Call: NativeApi.llama_empty_call() is invoked to trigger the DllImportResolver (LLama/Native/NativeApi.Load.cs:25). It can also be called manually to force loading immediately (LLama.Examples/Program.cs:36).
Backend Init: Once the DLL is resolved and loaded, llama_backend_init() is called automatically in the NativeApi static constructor to prepare the native GGML environment (LLama/Native/NativeApi.Load.cs:43).
Logging: If a managed log callback is provided via NativeLibraryConfig, it is registered with the native library via NativeLogConfig.llama_log_set (LLama/Native/NativeApi.Load.cs:40).
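The forced-initialization pattern described above relies on C# static constructor semantics: the constructor runs exactly once, just before the first access to any static member. The sketch below shows the pattern in isolation. `NativeBridgeSketch` and its recorded steps are hypothetical stand-ins for the real native calls, and the step order is inferred from the cited line numbers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a class with P/Invoke declarations.
public static class NativeBridgeSketch
{
    // Records the startup steps so the order can be inspected.
    public static readonly List<string> Steps = new List<string>();

    // Runs once, before the first access to any static member of this class.
    static NativeBridgeSketch()
    {
        Steps.Add("load");          // stand-in: a trivial native call forces library resolution
        Steps.Add("log");           // stand-in: register the managed log callback
        Steps.Add("backend_init");  // stand-in: initialize the native backend
    }

    // Intentionally does nothing. Calling it only forces the static constructor,
    // so load failures surface here rather than deep in application logic.
    public static void EmptyCall() { }
}
```

Calling `NativeBridgeSketch.EmptyCall()` at application startup moves any load failure to a predictable point. Repeated calls are harmless because the static constructor never runs twice.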

Sources: LLama/Native/NativeApi.Load.cs:9-44, LLama.Examples/Program.cs:21-36, LLama/Native/NativeLogConfig.cs:36-49

URL: https://deepwiki.com/SciSharp/LLamaSharp/2.1-native-library-integration