This page documents how LLamaSharp interfaces with the native llama.cpp library through Platform Invoke (P/Invoke), detailing the NativeApi layer, the runtime library loading mechanism, and the cross-platform binary distribution system.
For information about the SafeHandle wrappers that manage native resource lifetimes, see Memory Management and SafeHandles. For details on building native binaries from source, see Native Binary Compilation.
LLamaSharp acts as a high-level .NET wrapper around llama.cpp. The integration utilizes .NET's P/Invoke mechanism to bridge the managed/unmanaged boundary. While the native C++ library provides the core inference engine and tensor operations (via GGML), LLamaSharp provides a type-safe, idiomatic C# API.
The native integration architecture is built on three pillars:
- The NativeApi class, whose P/Invoke declarations map to C-exported functions in llama.cpp.
- A DllImportResolver that selects the optimal binary based on runtime CPU feature detection (AVX levels) and GPU availability.
- A cross-platform distribution of pre-compiled native binaries.

The following table maps the relationship between managed high-level entities and the underlying native GGML/llama.cpp components.
| System Concept | Code Entity (Managed) | Native Binary / Function |
|---|---|---|
| Model Weights | LLamaWeights | llama_model_load_from_file |
| Inference Context | LLamaContext | llama_init_from_model |
| Native Bridge | NativeApi (LLama/Native/NativeApi.Load.cs7-8) | llama.dll / libllama.so (LLama/Native/NativeApi.Load.cs108) |
| Library Config | NativeLibraryConfig (LLama/Native/NativeLibraryConfig.cs14-15) | N/A (Managed Selection Logic) |
| Backend Ops | N/A | ggml-cpu.dll, ggml-cuda.dll, ggml-vulkan.dll (LLama/Native/Load/NativeLibraryUtils.cs100-110) |
Sources: LLama/Native/NativeApi.Load.cs7-116 LLama/Native/Load/NativeLibraryUtils.cs9-130 LLama/Native/NativeLogConfig.cs23-24
The NativeApi class contains the [DllImport] declarations. However, unlike standard P/Invoke which relies on the default OS search path, LLamaSharp uses a custom resolver to handle the complex dependency tree of modern llama.cpp (which splits logic across multiple DLLs like ggml, ggml-base, and backend-specific binaries).
The static constructor of NativeApi calls SetDllImportResolver LLama/Native/NativeApi.Load.cs51-90. On .NET 5.0+, this intercepts calls to llama and mtmd and uses NativeLibraryUtils.TryLoadLibrary to find the best fit LLama/Native/NativeApi.Load.cs71-83.
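The interception step described above can be sketched with the standard `NativeLibrary.SetDllImportResolver` API. This is a minimal illustration, not LLamaSharp's actual resolver: the `ResolverSketch` class and its `Candidates` table are hypothetical stand-ins for the search performed by NativeLibraryUtils.TryLoadLibrary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

public static class ResolverSketch
{
    // Hypothetical candidate paths per intercepted library name.
    // The real selection logic (AVX level, GPU backend) is not reproduced here.
    public static readonly Dictionary<string, List<string>> Candidates =
        new Dictionary<string, List<string>>
        {
            ["llama"] = new List<string>(),
            ["mtmd"] = new List<string>(),
        };

    // Call once at startup; the runtime accepts only one resolver per assembly.
    public static void Install() =>
        NativeLibrary.SetDllImportResolver(typeof(ResolverSketch).Assembly, Resolve);

    // Matches the DllImportResolver delegate signature.
    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        // Names we do not manage: returning IntPtr.Zero defers to default probing.
        if (!Candidates.TryGetValue(libraryName, out var paths))
            return IntPtr.Zero;

        foreach (var path in paths)
        {
            // TryLoad reports failure through its return value instead of throwing.
            if (File.Exists(path) && NativeLibrary.TryLoad(path, out var handle))
                return handle;
        }

        // Nothing matched: fall back to the default OS search path.
        return IntPtr.Zero;
    }
}
```

Returning `IntPtr.Zero` is what lets the runtime fall back to its normal lookup, so a resolver like this only adds search locations and never blocks the default behaviour.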
Because llama.cpp now splits backends into separate files (e.g., ggml-cuda.dll), LLamaSharp manually loads dependencies in a specific order to ensure the OS loader finds them LLama/Native/Load/NativeLibraryUtils.cs48-52.
The loading sequence for a typical library:
- ggml-cpu (with detected AVX level), ggml-cuda, or ggml-vulkan LLama/Native/Load/NativeLibraryUtils.cs98-111

Sources: LLama/Native/NativeApi.Load.cs9-44 LLama/Native/Load/NativeLibraryUtils.cs39-124
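The idea of loading dependencies explicitly before the main library can be sketched as follows. The helper name `TryLoadInOrder` is hypothetical, and the caller supplies the order; the exact sequence LLamaSharp uses is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

public static class DependencyLoadSketch
{
    // Loads each file from 'directory' in the given order and stops at the first failure.
    // Returns true only if every library loaded; 'failed' names the first file that did not
    // (empty string on success).
    public static bool TryLoadInOrder(string directory, IReadOnlyList<string> fileNames, out string failed)
    {
        foreach (var name in fileNames)
        {
            var fullPath = Path.Combine(directory, name);

            // Once a dependency is loaded by full path, later libraries that import it
            // can bind to the already-loaded module instead of searching the OS path.
            if (!NativeLibrary.TryLoad(fullPath, out _))
            {
                failed = name;
                return false;
            }
        }

        failed = string.Empty;
        return true;
    }
}
```

A caller might pass something like `new[] { "ggml-base.dll", "ggml-cpu.dll", "ggml.dll", "llama.dll" }` on Windows. That ordering is an assumption made for illustration only.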
LLamaSharp detects system capabilities at startup to choose the most optimized binary.
Sources: LLama/Native/Load/SystemInfo.cs22-43 LLama/Native/Load/NativeLibraryUtils.cs18-36 LLama/Native/Load/NativeLibraryUtils.cs120-125
For x86_64 platforms, LLamaSharp differentiates between AVX levels to maximize performance.
The NativeLibraryWithAvx class constructs paths like runtimes/{os}/native/{avxStr}{libPrefix}{_libraryName}{fileExtension} LLama/Native/Load/NativeLibraryWithAvx.cs57.
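As a rough sketch of how an AVX level could feed into that path template, the example below uses the `IsSupported` properties from `System.Runtime.Intrinsics.X86`. The enum, the class, and the folder names (`avx2/`, `avx/`, `noavx/`) are assumptions for illustration, and the sketch stops at AVX2 rather than covering every level the library may distinguish.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical enum; not LLamaSharp's own type.
public enum AvxLevelSketch { None, Avx, Avx2 }

public static class AvxPathSketch
{
    // Picks the highest level the current CPU reports.
    // On non-x86 hardware both properties are simply false.
    public static AvxLevelSketch Detect()
    {
        if (Avx2.IsSupported) return AvxLevelSketch.Avx2;
        if (Avx.IsSupported) return AvxLevelSketch.Avx;
        return AvxLevelSketch.None;
    }

    // Assumed folder names, each ending in '/' so it can be spliced into the template.
    public static string ToFolder(AvxLevelSketch level) => level switch
    {
        AvxLevelSketch.Avx2 => "avx2/",
        AvxLevelSketch.Avx => "avx/",
        _ => "noavx/",
    };

    // Mirrors the template runtimes/{os}/native/{avxStr}{libPrefix}{libraryName}{fileExtension}.
    public static string BuildPath(string os, AvxLevelSketch level, string libPrefix,
                                   string libraryName, string fileExtension) =>
        $"runtimes/{os}/native/{ToFolder(level)}{libPrefix}{libraryName}{fileExtension}";
}
```

For example, `AvxPathSketch.BuildPath("linux-x64", AvxLevelSketch.Avx2, "lib", "llama", ".so")` yields `runtimes/linux-x64/native/avx2/libllama.so`.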
- CUDA: checks the CUDA_PATH environment variable on Windows or LD_LIBRARY_PATH/CUDA_VERSION on Linux to determine the installed toolkit version LLama/Native/Load/SystemInfo.cs123-181
- Vulkan: runs vulkaninfo --summary to parse the apiVersion and confirm Vulkan support LLama/Native/Load/SystemInfo.cs64-119

Native binaries are pre-compiled and distributed. LLamaSharp uses a selection policy to find these binaries within the project structure.
The DefaultNativeLibrarySelectingPolicy determines the order of preference for loading libraries LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs8-12:
- If NativeLibraryConfig.WithLibrary was used, that path is tried first LLama/Native/Load/NativeLibraryConfig.cs38-44

Path resolution accounts for OS-specific naming conventions via NativeLibraryUtils.GetPlatformPathParts LLama/Native/Load/NativeLibraryUtils.cs23-27:
- Windows: .dll extension, no prefix.
- Linux: .so extension, lib prefix.
- macOS: .dylib extension, lib prefix.

Sources: LLama/Native/Load/NativeLibraryUtils.cs23-27 LLama/Native/Load/NativeLibraryConfig.cs185-195 LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs11-59
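The naming conventions listed above can be expressed in a few lines using `RuntimeInformation.IsOSPlatform`. This is an illustrative sketch only; `PlatformNamingSketch` and its members are hypothetical names, not the signature of NativeLibraryUtils.GetPlatformPathParts.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PlatformNamingSketch
{
    // Returns the file-name prefix and extension used for native libraries on each OS.
    public static (string Prefix, string Extension) PartsFor(OSPlatform os)
    {
        if (os == OSPlatform.Windows) return ("", ".dll");
        if (os == OSPlatform.Linux) return ("lib", ".so");
        if (os == OSPlatform.OSX) return ("lib", ".dylib");
        throw new PlatformNotSupportedException($"No naming rule for {os}");
    }

    // Determines which of the three supported platforms the process is running on.
    public static OSPlatform CurrentOS()
    {
        foreach (var os in new[] { OSPlatform.Windows, OSPlatform.Linux, OSPlatform.OSX })
        {
            if (RuntimeInformation.IsOSPlatform(os)) return os;
        }
        throw new PlatformNotSupportedException();
    }

    // Combines prefix, base name, and extension into a file name.
    public static string FileName(OSPlatform os, string libraryName)
    {
        var (prefix, extension) = PartsFor(os);
        return prefix + libraryName + extension;
    }
}
```

Calling `PlatformNamingSketch.FileName(PlatformNamingSketch.CurrentOS(), "llama")` gives the file name expected on the current machine.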
To prevent "DLL not found" errors deep in the application logic, LLamaSharp forces loading during the static initialization of core classes.
- NativeApi.llama_empty_call() is invoked to trigger the DllImportResolver LLama/Native/NativeApi.Load.cs25. This can be called manually to force loading immediately LLama.Examples/Program.cs36.
- llama_backend_init() is called automatically in the NativeApi static constructor to prepare the native GGML environment LLama/Native/NativeApi.Load.cs43.
- If a log callback was set through NativeLibraryConfig, it is registered with the native library via NativeLogConfig.llama_log_set LLama/Native/NativeApi.Load.cs40.

Sources: LLama/Native/NativeApi.Load.cs9-44 LLama.Examples/Program.cs21-36 LLama/Native/NativeLogConfig.cs36-49
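The eager-loading pattern relies on a C# guarantee: a static constructor runs exactly once, before the first access to any member of its class. The sketch below demonstrates that mechanism in isolation. `EagerLoadSketch` is a hypothetical stand-in and performs no native loading; in LLamaSharp the equivalent trigger is NativeApi.llama_empty_call().

```csharp
using System;

public static class EagerLoadSketch
{
    // Counts how often initialization ran, so the once-only behaviour can be observed.
    public static int InitCount { get; private set; }

    static EagerLoadSketch()
    {
        // Stand-in for the real work: resolve the native library,
        // register the log callback, and initialize the backend.
        InitCount++;
    }

    // Deliberately empty: calling it only forces the static constructor to run.
    public static void EmptyCall() { }
}
```

Calling `EagerLoadSketch.EmptyCall()` at application startup moves any initialization failure to a predictable place. Repeated calls are harmless because the constructor does not run again.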