URL: https://deepwiki.com/SciSharp/LLamaSharp/8.1-native-binary-compilation

Native Binary Compilation | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

Native Binary Compilation

Purpose and Scope

This page documents the native binary compilation system that builds llama.cpp libraries for all supported platforms and backends. The compilation process is orchestrated through GitHub Actions and produces the native libraries distributed in LLamaSharp's NuGet packages. For information about the CI/CD release pipeline, see Release Process. For details on testing infrastructure, see Testing Framework.

The compilation system handles:

  • Building llama.cpp from a specific commit across multiple platforms.
  • Generating optimized binaries for different CPU instruction sets (AVX, AVX2, AVX512).
  • Compiling GPU-accelerated backends (CUDA, Vulkan, Metal).
  • Supporting diverse operating systems (Windows, Linux, macOS, Android, musl-based systems).
  • Consolidating binaries into a structured deps/ directory for NuGet packaging.

Sources: .github/workflows/compile.yml (lines 1-23)


Workflow Overview

The native binary compilation is managed by the compile.yml GitHub Actions workflow, which can be triggered manually via workflow_dispatch or through automated pushes to the cron_job branch. The workflow accepts a llama_cpp_commit parameter that specifies which branch, tag, or commit hash of llama.cpp to build against.

Compilation Workflow Logic


The workflow uses concurrency control to prevent multiple builds with the same parameters from running simultaneously. Each compilation job runs independently in parallel, and the build-deps job waits for all compilation jobs to complete using the needs keyword.
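The fan-out and fan-in shape described above can be sketched in Python as an analogy. This is not the workflow itself: the job names and the `run_compile_job` and `build_deps` helpers are hypothetical stand-ins, and the thread pool only imitates how independent jobs run in parallel before a dependent job starts.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job names; the real workflow defines its own set.
COMPILE_JOBS = ["compile-linux", "compile-windows", "compile-macos"]


def run_compile_job(name):
    """Stand-in for one compilation job; returns the artifact it would upload."""
    return f"{name}-artifact"


def build_deps(artifacts):
    """Stand-in for the build-deps job: maps each artifact into deps/."""
    return {artifact: f"deps/{artifact}" for artifact in artifacts}


def run_workflow():
    # The compilation jobs do not depend on each other, so they run in parallel.
    with ThreadPoolExecutor() as pool:
        artifacts = list(pool.map(run_compile_job, COMPILE_JOBS))
    # Every result has been collected at this point, which plays the role of
    # the `needs` keyword: build_deps starts only after all jobs have finished.
    return build_deps(artifacts)


if __name__ == "__main__":
    print(run_workflow())
```

The design point is that the aggregation step never sees a partial set of artifacts, which is what allows the deps/ directory to be assembled in one pass.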

Sources: .github/workflows/compile.yml (lines 3-16, 24-600)


Platform and Backend Matrix

The compilation system uses GitHub Actions matrix builds to generate binaries for 40+ platform/backend combinations.

CPU Backend Configurations

Platform | Matrix Configurations | Architectures | Notes
Linux | noavx, avx, avx2, avx512, aarch64 | x64, arm64 | Ubuntu 22.04/24.04
Linux musl | noavx, avx, avx2, avx512 | x64 | Alpine Linux container
Windows | noavx, avx, avx2, avx512 | x64 | MSVC compiled
Windows ARM64 | arm64 | arm64 | ClangCL toolchain
macOS | arm64, x64, x64-rosetta2 | arm64, x64 | Metal support on arm64
Android | arm64-v8a, x86_64 | arm64, x86_64 | NDK r26d
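To make the matrix idea concrete, the following Python sketch expands the CPU table into individual (platform, configuration) pairs. The data is copied from the table; the `expand_matrix` helper is a hypothetical illustration of how a matrix multiplies into separate build jobs, not the mechanism GitHub Actions uses internally.

```python
# CPU configurations per platform, taken from the table above.
CPU_MATRIX = {
    "Linux": ["noavx", "avx", "avx2", "avx512", "aarch64"],
    "Linux musl": ["noavx", "avx", "avx2", "avx512"],
    "Windows": ["noavx", "avx", "avx2", "avx512"],
    "Windows ARM64": ["arm64"],
    "macOS": ["arm64", "x64", "x64-rosetta2"],
    "Android": ["arm64-v8a", "x86_64"],
}


def expand_matrix(matrix):
    """Return one (platform, configuration) pair per build the matrix implies."""
    return [
        (platform, config)
        for platform, configs in matrix.items()
        for config in configs
    ]


if __name__ == "__main__":
    builds = expand_matrix(CPU_MATRIX)
    print(len(builds))  # 19 CPU builds listed in the table
```

The GPU backends add further builds on top of these CPU entries.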

The AVX configurations control which CPU instruction set extensions each build is compiled to use.
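As a rough sketch of what such a mapping could look like, the Python snippet below turns a configuration name into CMake defines. `GGML_AVX`, `GGML_AVX2`, and `GGML_AVX512` are ggml CMake options, but the exact ON/OFF combination per configuration shown here is an assumption for illustration; the workflow file is the authority on the real values.

```python
# ASSUMED mapping: each level enables the levels below it. The workflow's
# actual flag sets may differ (for example, it may also toggle other options).
AVX_FLAGS = {
    "noavx": {"GGML_AVX": "OFF", "GGML_AVX2": "OFF", "GGML_AVX512": "OFF"},
    "avx": {"GGML_AVX": "ON", "GGML_AVX2": "OFF", "GGML_AVX512": "OFF"},
    "avx2": {"GGML_AVX": "ON", "GGML_AVX2": "ON", "GGML_AVX512": "OFF"},
    "avx512": {"GGML_AVX": "ON", "GGML_AVX2": "ON", "GGML_AVX512": "ON"},
}


def cmake_defines(variant):
    """Render the assumed flag set for one configuration as -D arguments."""
    return [f"-D{name}={value}" for name, value in AVX_FLAGS[variant].items()]


if __name__ == "__main__":
    for variant in AVX_FLAGS:
        print(variant, " ".join(cmake_defines(variant)))
```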

Sources: .github/workflows/compile.yml (lines 24-154, 156-216)

GPU Backend Configurations

Backend | Platforms | SDK Version | Key Flags
CUDA | Windows, Linux | CUDA 12.4.0 | -DGGML_CUDA=ON
Vulkan | Windows, Linux | Latest | -DGGML_VULKAN=ON
Metal | macOS arm64 | Built-in | -DGGML_METAL=ON

Sources: .github/workflows/compile.yml (lines 18-21), LLama/runtimes/build/LLamaSharp.Backend.Cuda12.Windows.nuspec (lines 25-30), LLama/runtimes/build/LLamaSharp.Backend.Vulkan.Windows.nuspec (lines 25-30)


CMake Build Configuration

All compilation jobs use CMake with common defines and platform-specific settings.

Common Build Flags (COMMON_DEFINE)

The following flags are applied to all builds:

  • GGML_NATIVE=OFF: Disables native CPU optimization detection so the binaries do not depend on the build machine's CPU features, keeping builds reproducible.
  • LLAMA_BUILD_TESTS=OFF: Excludes test executables.
  • LLAMA_OPENSSL=OFF: Disables OpenSSL dependency.
  • BUILD_SHARED_LIBS=ON: Produces shared libraries (.dll, .so, .dylib).
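The flags listed above can be combined with backend-specific ones when invoking CMake. The Python sketch below assembles such a configure command; the four common flags come from the list, while the `configure_command` helper, the `build` directory name, and the source directory `.` are assumptions made for illustration.

```python
# Common flags from the list above, applied to every build.
COMMON_DEFINE = {
    "GGML_NATIVE": "OFF",
    "LLAMA_BUILD_TESTS": "OFF",
    "LLAMA_OPENSSL": "OFF",
    "BUILD_SHARED_LIBS": "ON",
}


def configure_command(extra_defines=(), build_dir="build"):
    """Build a `cmake` configure command as an argument list.

    extra_defines holds backend-specific arguments such as "-DGGML_CUDA=ON".
    The source directory "." and the build directory name are assumptions.
    """
    common = [f"-D{name}={value}" for name, value in COMMON_DEFINE.items()]
    return ["cmake", "-S", ".", "-B", build_dir, *common, *extra_defines]


if __name__ == "__main__":
    # Example: a CUDA build adds its backend flag on top of the common set.
    print(" ".join(configure_command(["-DGGML_CUDA=ON"])))
```

Keeping the common flags in one place means a change such as disabling tests applies uniformly to every platform job.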

Sources: .github/workflows/compile.yml (lines 18-21)

Platform-Specific RPATH

Linux builds include RPATH settings so that each library can find its dependencies (such as ggml-base) at runtime.
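The effect of an RPATH can be illustrated with a small Python model. It assumes the RPATH is set to `$ORIGIN`, the dynamic loader token that expands to the directory containing the library being loaded; the source does not state the exact value used, and the file paths below are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_rpath(library_path, rpath="$ORIGIN"):
    """Expand $ORIGIN to the directory that contains library_path.

    The "$ORIGIN" default is an assumption about the workflow's RPATH value.
    """
    origin = PurePosixPath(library_path).parent
    return PurePosixPath(rpath.replace("$ORIGIN", str(origin)))


def dependency_candidate(library_path, dependency):
    """Return where the loader would look for a dependency under this RPATH."""
    return resolve_rpath(library_path) / dependency


if __name__ == "__main__":
    # Hypothetical install location, for illustration only.
    lib = "/app/native/avx2/libllama.so"
    print(dependency_candidate(lib, "libggml-base.so"))
```

Under this assumption, a library and its dependencies only need to be shipped in the same directory, wherever that directory ends up on the user's machine.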

Sources: .github/workflows/compile.yml (line 21)


Build Process and Produced Entities

Each job follows a standard sequence to produce the native shared libraries required by the managed wrapper.


Library Roles

Library | Purpose
llama | Main llama.cpp API used by NativeApi.
ggml | Core GGML tensor operations.
ggml-base | Base GGML functionality.
ggml-cpu | CPU-specific implementations (AVX/AVX2/AVX512).
mtmd | Multimodal support (MTMD API).
ggml-cuda | CUDA acceleration (CUDA builds only).
ggml-vulkan | Vulkan acceleration (Vulkan builds only).

Sources: .github/workflows/compile.yml (lines 65-89, 187-216)


Integration with NuGet Packaging

The compiled binaries are integrated into the .NET ecosystem via .nuspec files and MSBuild targets.

Runtime Selection Logic

At runtime, the managed layer determines which binary to load. Dependencies like ggml-base and ggml-cpu are organized into subdirectories (e.g., avx2, cuda12) to support different hardware features without filename collisions.
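A selection step of this kind might look like the Python sketch below. It is not LLamaSharp's actual loading code: the `select_variant` function, the priority order, and every folder name other than avx2 and cuda12 are assumptions chosen to mirror the build matrix.

```python
def select_variant(cpu_features, cuda_available=False):
    """Pick the subdirectory whose binaries best match the host.

    cpu_features is a set of names such as {"avx", "avx2"}.
    Preferring CUDA over CPU variants is an assumed policy.
    """
    if cuda_available:
        return "cuda12"
    # Try the most capable instruction set first, then fall back.
    for level in ("avx512", "avx2", "avx"):
        if level in cpu_features:
            return level
    # Hypothetical folder name for the baseline build.
    return "noavx"


if __name__ == "__main__":
    print(select_variant({"avx", "avx2"}))
    print(select_variant({"avx", "avx2"}, cuda_available=True))
    print(select_variant(set()))
```

Because each variant lives in its own subdirectory, identically named files such as ggml-cpu can coexist in one package.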

MSBuild Targets (LLamaSharp.Runtime.targets)

The LLamaSharp.Runtime.targets file maps the compiled binaries to the standard .NET runtimes/ folder structure. It uses conditional <None> items to include binaries for Windows, Linux, and other platforms, ensuring they are copied to the output directory.


Sources: LLama/LLamaSharp.Runtime.targets (lines 1-142)

Backend Package Structure

Backend-specific packages use .nuspec files and MSBuild .props files to manage library inclusion and platform-specific logic.

Sources: LLama/runtimes/build/LLamaSharp.Backend.Cpu.nuspec (lines 1-131), LLama/runtimes/build/LLamaSharp.Backend.Cuda12.Windows.nuspec (lines 1-35), LLama/runtimes/build/LLamaSharp.Backend.Cpu.Android.nuspec (lines 1-36), LLama/runtimes/build/LLamaSharpBackend.props (lines 1-87)