URL: https://deepwiki.com/SciSharp/LLamaSharp/2.1-native-library-integration

Native Library Integration | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

Native Library Integration

This page documents how LLamaSharp interfaces with the native llama.cpp library through Platform Invoke (P/Invoke). It covers the NativeApi layer, the runtime library loading mechanism, and the cross-platform binary distribution system.

For information about the SafeHandle wrappers that manage native resource lifetimes, see Memory Management and SafeHandles. For details on building native binaries from source, see Native Binary Compilation.


Overview

LLamaSharp is a high-level .NET wrapper around llama.cpp. The integration uses .NET's P/Invoke mechanism to cross the managed/unmanaged boundary. The native C++ library provides the core inference engine and tensor operations (via GGML), while LLamaSharp provides a type-safe, idiomatic C# API on top of it.

The native integration architecture is built on three pillars:

  1. P/Invoke Layer: Static declarations in NativeApi mapping to C-exported functions in llama.cpp.
  2. Dynamic Loading: A custom DllImportResolver that selects the optimal binary based on runtime CPU feature detection (AVX levels) and GPU availability.
  3. Modular Backends: A package structure that allows users to swap CPU, CUDA, or Vulkan backends by simply referencing different NuGet packages.

Native Library Architecture

The following table maps the relationship between managed high-level entities and the underlying native GGML/llama.cpp components.

Code Entity Space Mapping

| System Concept | Code Entity (Managed) | Native Binary / Function |
|---|---|---|
| Model Weights | LLamaWeights | llama_model_load_from_file |
| Inference Context | LLamaContext | llama_init_from_model |
| Native Bridge | NativeApi (LLama/Native/NativeApi.Load.cs:7-8) | llama.dll / libllama.so (LLama/Native/NativeApi.Load.cs:108) |
| Library Config | NativeLibraryConfig (LLama/Native/NativeLibraryConfig.cs:14-15) | N/A (Managed Selection Logic) |
| Backend Ops | N/A | ggml-cpu.dll, ggml-cuda.dll, ggml-vulkan.dll (LLama/Native/Load/NativeLibraryUtils.cs:100-110) |

[Figure: Architecture Flow]

Sources: LLama/Native/NativeApi.Load.cs:7-116, LLama/Native/Load/NativeLibraryUtils.cs:9-130, LLama/Native/NativeLogConfig.cs:23-24


The NativeApi and Loading Logic

The NativeApi class contains the [DllImport] declarations. Unlike standard P/Invoke, which relies on the default OS search path, LLamaSharp uses a custom resolver to handle the dependency tree of current llama.cpp builds, which split logic across multiple DLLs such as ggml, ggml-base, and backend-specific binaries.

Custom DLL Resolution

The static constructor of NativeApi calls SetDllImportResolver (LLama/Native/NativeApi.Load.cs:51-90). On .NET 5.0+, this intercepts load requests for llama and mtmd and uses NativeLibraryUtils.TryLoadLibrary to find the best fit (LLama/Native/NativeApi.Load.cs:71-83).
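
The interception pattern can be modelled in a few lines. The sketch below is a language-neutral illustration in Python, not LLamaSharp's C# code: `make_resolver`, `INTERCEPTED`, and the fake handle table are hypothetical names, and returning `None` stands in for a resolver declining the request so that the runtime falls back to its default search.

```python
from typing import Callable, Optional

# Library names the resolver intercepts, per the text above.
INTERCEPTED = {"llama", "mtmd"}


def make_resolver(
    try_load_library: Callable[[str], Optional[int]],
) -> Callable[[str], Optional[int]]:
    """Build a resolver that handles only the intercepted names.

    Returning None means "not handled": the caller is expected to fall
    back to its default library search.
    """
    def resolve(library_name: str) -> Optional[int]:
        if library_name not in INTERCEPTED:
            return None
        return try_load_library(library_name)

    return resolve


# Hypothetical loader: pretends each known library loads at a fake handle.
fake_handles = {"llama": 0x1000, "mtmd": 0x2000}
resolve = make_resolver(fake_handles.get)

print(resolve("llama"))     # handled by the custom loader
print(resolve("kernel32"))  # not intercepted, left to the default search
```

Keeping the intercepted set small means unrelated P/Invoke calls in the same assembly are unaffected by the custom loading logic.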

Manual Dependency Loading

Because llama.cpp now splits backends into separate files (e.g., ggml-cuda.dll), LLamaSharp manually loads dependencies in a specific order so that the OS loader can resolve them (LLama/Native/Load/NativeLibraryUtils.cs:48-52).

The loading sequence for a typical library:

  1. ggml-base: Core tensor logic (LLama/Native/Load/NativeLibraryUtils.cs:67).
  2. Backend: ggml-cpu (with detected AVX level), ggml-cuda, or ggml-vulkan (LLama/Native/Load/NativeLibraryUtils.cs:98-111).
  3. ggml: The main GGML interface (LLama/Native/Load/NativeLibraryUtils.cs:117).
  4. llama: The high-level llama.cpp library (LLama/Native/NativeApi.Load.cs:64-73).
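
The sequence above can be sketched as an ordered loop over dependency names. This is a minimal Python model, not the actual NativeLibraryUtils code: `load_order`, `load_all`, and the `load` callback are hypothetical, and stopping at the first failure is an assumption about error handling rather than something the sources state.

```python
from typing import Callable, List


def load_order(backend: str) -> List[str]:
    """Return dependency names in the order listed above."""
    return ["ggml-base", backend, "ggml", "llama"]


def load_all(backend: str, load: Callable[[str], bool]) -> List[str]:
    """Load each dependency in order.

    Assumption: a failed load aborts the sequence, since later
    libraries depend on the earlier ones.
    """
    loaded: List[str] = []
    for name in load_order(backend):
        if not load(name):
            raise OSError(f"failed to load {name!r} after {loaded}")
        loaded.append(name)
    return loaded


# Hypothetical loader that always succeeds.
print(load_all("ggml-cuda", lambda name: True))
```

Loading the leaves of the dependency tree first means that when the OS loader later processes llama, the libraries it imports are already present in the process.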

Sources: LLama/Native/NativeApi.Load.cs:9-44, LLama/Native/Load/NativeLibraryUtils.cs:39-124


Runtime Library Selection

LLamaSharp detects system capabilities at startup to choose the most optimized binary.

[Figure: Selection Flow]

Sources: LLama/Native/Load/SystemInfo.cs:22-43, LLama/Native/Load/NativeLibraryUtils.cs:18-36, LLama/Native/Load/NativeLibraryUtils.cs:120-125

CPU Feature Detection

For x86_64 platforms, LLamaSharp differentiates between AVX levels to maximize performance.

The NativeLibraryWithAvx class constructs paths like runtimes/{os}/native/{avxStr}{libPrefix}{_libraryName}{fileExtension} (LLama/Native/Load/NativeLibraryWithAvx.cs:57).
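
To make the template concrete, here is a small Python sketch that fills it in. The function name and the sample values are assumptions: in particular, the source does not show what `{os}` or `{avxStr}` expand to, so "linux-x64" and "avx2/" (with a trailing slash) are illustrative guesses only.

```python
def avx_library_path(
    os_dir: str,
    avx_str: str,
    lib_prefix: str,
    library_name: str,
    file_extension: str,
) -> str:
    """Fill in the path template quoted above, field by field."""
    return (
        f"runtimes/{os_dir}/native/"
        f"{avx_str}{lib_prefix}{library_name}{file_extension}"
    )


# Hypothetical values; "avx2/" assumes the AVX segment carries its own slash.
print(avx_library_path("linux-x64", "avx2/", "lib", "llama", ".so"))
# -> runtimes/linux-x64/native/avx2/libllama.so
```

Because the template has no separator between `{avxStr}` and `{libPrefix}`, an empty AVX segment simply places the library directly under the native directory.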

GPU Detection


Binary Distribution and Build Process

Native binaries are pre-compiled and distributed through the backend NuGet packages. LLamaSharp uses a selection policy to locate these binaries at runtime.

Selection Policy

The DefaultNativeLibrarySelectingPolicy determines the order of preference for loading libraries (LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:8-12):

  1. Path Override: If NativeLibraryConfig.WithLibrary was used, that path is tried first (LLama/Native/Load/NativeLibraryConfig.cs:38-44).
  2. GPU Backends: CUDA and Vulkan are prioritized if enabled and detected (LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:23-31).
  3. AVX Variants: If fallback is allowed, the policy iterates through AVX levels from AVX512 down to None (LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:35-47).
  4. OSX/Fallback: Specific Mac libraries (including Rosetta 2 detection) or generic fallbacks are tried last (LLama/Native/Load/NativeLibraryWithMacOrFallback.cs:37-41).
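
The preference order above can be sketched as a function that produces an ordered candidate list. This Python model is illustrative only: `SelectionConfig`, `candidate_order`, and the string labels are hypothetical, trying CUDA before Vulkan is an assumption (the source only says both are prioritized), and the `best_avx` parameter assumes the iteration can start below AVX512 on CPUs that lack it.

```python
from dataclasses import dataclass
from typing import List, Optional

# AVX levels from most to least capable, as listed in step 3 above.
AVX_LEVELS = ["avx512", "avx2", "avx", "none"]


@dataclass
class SelectionConfig:
    """Hypothetical stand-in for the settings the policy consults."""
    library_path: Optional[str] = None
    use_cuda: bool = True
    use_vulkan: bool = True
    allow_fallback: bool = True


def candidate_order(
    cfg: SelectionConfig,
    cuda_detected: bool,
    vulkan_detected: bool,
    best_avx: str = "avx512",
) -> List[str]:
    """Return candidate labels in the order they would be tried."""
    order: List[str] = []
    if cfg.library_path is not None:           # 1. explicit path override
        order.append(f"path:{cfg.library_path}")
    if cfg.use_cuda and cuda_detected:         # 2. GPU backends
        order.append("cuda")
    if cfg.use_vulkan and vulkan_detected:
        order.append("vulkan")
    if cfg.allow_fallback:                     # 3. AVX variants, high to low
        start = AVX_LEVELS.index(best_avx)
        order.extend(f"cpu-{level}" for level in AVX_LEVELS[start:])
    order.append("mac-or-fallback")            # 4. tried last
    return order


cfg = SelectionConfig(library_path="/opt/libllama.so")
print(candidate_order(cfg, cuda_detected=True, vulkan_detected=False, best_avx="avx2"))
```

Expressing the policy as an ordered list keeps the loader simple: it tries each candidate in turn and stops at the first one that loads.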

Platform Path Parts

Path resolution accounts for OS-specific naming conventions via NativeLibraryUtils.GetPlatformPathParts (LLama/Native/Load/NativeLibraryUtils.cs:23-27):

  • Windows: .dll extension, no prefix.
  • Linux: .so extension, lib prefix.
  • OSX: .dylib extension, lib prefix.
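
The three conventions above amount to a small lookup table. The Python sketch below mirrors them; the function names and the lowercase OS keys are hypothetical and do not reflect the signature of GetPlatformPathParts.

```python
from typing import Tuple

# (prefix, extension) per platform, taken from the list above.
_PATH_PARTS = {
    "windows": ("", ".dll"),
    "linux": ("lib", ".so"),
    "osx": ("lib", ".dylib"),
}


def platform_path_parts(os_name: str) -> Tuple[str, str]:
    """Return (library prefix, file extension) for a platform key."""
    try:
        return _PATH_PARTS[os_name]
    except KeyError:
        raise ValueError(f"unsupported platform: {os_name!r}") from None


def library_file_name(os_name: str, library_name: str) -> str:
    """Compose the platform-specific file name for a library."""
    prefix, extension = platform_path_parts(os_name)
    return f"{prefix}{library_name}{extension}"


for os_name in ("windows", "linux", "osx"):
    print(os_name, library_file_name(os_name, "llama"))
# windows llama.dll
# linux libllama.so
# osx libllama.dylib
```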

Sources: LLama/Native/Load/NativeLibraryUtils.cs:23-27, LLama/Native/Load/NativeLibraryConfig.cs:185-195, LLama/Native/Load/DefaultNativeLibrarySelectingPolicy.cs:11-59


Forced Loading and Initialization

To prevent "DLL not found" errors deep in the application logic, LLamaSharp forces loading during the static initialization of core classes.

  1. Empty Call: NativeApi.llama_empty_call() is invoked to trigger the DllImportResolver (LLama/Native/NativeApi.Load.cs:25). This can be called manually to force loading immediately (LLama.Examples/Program.cs:36).
  2. Backend Init: Once the DLL is resolved and loaded, llama_backend_init() is called automatically in the NativeApi static constructor to prepare the native GGML environment (LLama/Native/NativeApi.Load.cs:43).
  3. Logging: If a managed log callback is provided via NativeLibraryConfig, it is registered with the native library via NativeLogConfig.llama_log_set (LLama/Native/NativeApi.Load.cs:40).
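
The benefit of forcing the load up front can be shown with a small model. The Python class below is a hypothetical sketch, not the NativeApi static constructor: it follows the order of the list above, records each step as an event, and raises during construction when the library cannot be resolved, so the failure surfaces at startup rather than at the first inference call.

```python
from typing import Callable, List, Optional


class NativeApiSketch:
    """Hypothetical model of eager native initialization."""

    def __init__(
        self,
        resolve: Callable[[str], Optional[int]],
        log_callback: Optional[Callable[[str], None]] = None,
    ) -> None:
        self.events: List[str] = []
        self._resolve = resolve
        self._handle: Optional[int] = None
        self.empty_call()                    # 1. force library resolution
        self.events.append("backend_init")   # 2. prepare the backend
        if log_callback is not None:         # 3. register logging if given
            self.events.append("log_set")

    def empty_call(self) -> None:
        """No-op native call whose only effect is triggering the load."""
        if self._handle is None:
            handle = self._resolve("llama")
            if handle is None:
                raise RuntimeError("native library 'llama' not found")
            self._handle = handle
            self.events.append("resolved")


api = NativeApiSketch(lambda name: 1, log_callback=print)
print(api.events)
```

Calling `empty_call` again after construction is harmless in this model, which mirrors the idea that the manual call only matters the first time.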

Sources: LLama/Native/NativeApi.Load.cs:9-44, LLama.Examples/Program.cs:21-36, LLama/Native/NativeLogConfig.cs:36-49