Last indexed: 18 May 2026 (ecd184)

Native Binary Compilation

Purpose and Scope

This page documents the native binary compilation system that builds llama.cpp libraries for all supported platforms and backends. The compilation process is orchestrated through GitHub Actions and produces the native libraries distributed in LLamaSharp's NuGet packages. For information about the CI/CD release pipeline, see Release Process. For details on the testing infrastructure, see Testing Framework.

The compilation system handles:

Building llama.cpp from a specific commit across multiple platforms.
Generating optimized binaries for different CPU instruction sets (AVX, AVX2, AVX512).
Compiling GPU-accelerated backends (CUDA, Vulkan, Metal).
Supporting diverse operating systems (Windows, Linux, macOS, Android, musl-based systems).
Consolidating binaries into a structured deps/ directory for NuGet packaging.

Sources: .github/workflows/compile.yml:1-23

Workflow Overview

The native binary compilation is managed by the compile.yml GitHub Actions workflow, which can be triggered manually via workflow_dispatch or through automated pushes to the cron_job branch. The workflow accepts a llama_cpp_commit parameter that specifies which branch, tag, or commit hash of llama.cpp to build against.

Compilation Workflow Logic

The workflow uses concurrency control to prevent multiple builds with the same parameters from running simultaneously. Each compilation job runs independently in parallel, and the build-deps job waits for all compilation jobs to complete using the needs keyword.

Sources: .github/workflows/compile.yml:3-16, .github/workflows/compile.yml:24-600

Platform and Backend Matrix

The compilation system uses GitHub Actions matrix builds to generate binaries for 40+ platform/backend combinations.

CPU Backend Configurations

Platform	Matrix Configurations	Architectures	Notes
Linux	`noavx`, `avx`, `avx2`, `avx512`, `aarch64`	x64, arm64	Ubuntu 22.04/24.04
Linux musl	`noavx`, `avx`, `avx2`, `avx512`	x64	Alpine Linux container
Windows	`noavx`, `avx`, `avx2`, `avx512`	x64	MSVC compiled
Windows ARM64	`arm64`	arm64	ClangCL toolchain
macOS	`arm64`, `x64`, `x64-rosetta2`	arm64, x64	Metal support on arm64
Android	`arm64-v8a`, `x86_64`	arm64, x86_64	NDK r26d

The AVX configurations control CPU instruction set support:

noavx: -DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF (.github/workflows/compile.yml:30-31)
avx: -DGGML_AVX2=OFF (.github/workflows/compile.yml:38-39)
avx2: Standard build flags (.github/workflows/compile.yml:34-35)
avx512: -DGGML_AVX512=ON (.github/workflows/compile.yml:42-43)

Sources: .github/workflows/compile.yml:24-154, .github/workflows/compile.yml:156-216
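
As a rough illustration, the mapping from CPU variant to extra CMake defines can be written as a small lookup table. This is a minimal Python sketch, not code from the workflow: the flag values are copied from the list above, while the names `AVX_VARIANT_FLAGS` and `variant_flags` are hypothetical.

```python
# Extra CMake defines per CPU variant, copied from the list above.
# The dictionary and function names are illustrative only; they do not
# appear in compile.yml.
AVX_VARIANT_FLAGS = {
    "noavx": ["-DGGML_AVX=OFF", "-DGGML_AVX2=OFF", "-DGGML_FMA=OFF"],
    "avx": ["-DGGML_AVX2=OFF"],
    "avx2": [],  # standard build flags, nothing extra
    "avx512": ["-DGGML_AVX512=ON"],
}


def variant_flags(variant: str) -> list[str]:
    """Return a copy of the extra CMake defines for a CPU variant."""
    try:
        return list(AVX_VARIANT_FLAGS[variant])
    except KeyError:
        raise ValueError(f"unknown CPU variant: {variant!r}") from None


if __name__ == "__main__":
    for name in AVX_VARIANT_FLAGS:
        print(name, " ".join(variant_flags(name)) or "(no extra flags)")
```

Keeping the variants in one table makes it easy to see that each configuration differs from the standard build only by the defines it adds.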

GPU Backend Configurations

Backend	Platforms	SDK Version	Key Flags
CUDA	Windows, Linux	CUDA 12.4.0	`-DGGML_CUDA=ON`
Vulkan	Windows, Linux	Latest	`-DGGML_VULKAN=ON`
Metal	macOS arm64	Built-in	`-DGGML_METAL=ON`

Sources: .github/workflows/compile.yml:18-21, LLama/runtimes/build/LLamaSharp.Backend.Cuda12.Windows.nuspec:25-30, LLama/runtimes/build/LLamaSharp.Backend.Vulkan.Windows.nuspec:25-30

CMake Build Configuration

All compilation jobs use CMake with common defines and platform-specific settings.

Common Build Flags (`COMMON_DEFINE`)

The following flags are applied to all builds:

GGML_NATIVE=OFF: Disables detection of the build machine's CPU features, so binaries stay portable and instruction-set support is set only by the explicit flags.
LLAMA_BUILD_TESTS=OFF: Excludes test executables.
LLAMA_OPENSSL=OFF: Disables OpenSSL dependency.
BUILD_SHARED_LIBS=ON: Produces shared libraries (.dll, .so, .dylib).

Sources: .github/workflows/compile.yml:18-21

Platform-Specific RPATH

Linux builds include RPATH settings to ensure libraries can find their dependencies (like ggml-base) at runtime:

Linux: -DCMAKE_INSTALL_RPATH='$ORIGIN' -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON -DCMAKE_PLATFORM_NO_VERSIONED_SONAME=ON (.github/workflows/compile.yml:21)

Sources: .github/workflows/compile.yml:21
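
To show how the common defines and the Linux RPATH settings combine, here is a minimal Python sketch that assembles a CMake configure command. The flag values come from this section. The function name, the `platform` labels, and the source and build directories (`.` and `build`) are assumptions made for illustration, and the actual workflow steps may be laid out differently.

```python
import shlex

# Common defines listed in this section (COMMON_DEFINE in the workflow).
COMMON_DEFINES = [
    "-DGGML_NATIVE=OFF",
    "-DLLAMA_BUILD_TESTS=OFF",
    "-DLLAMA_OPENSSL=OFF",
    "-DBUILD_SHARED_LIBS=ON",
]

# RPATH settings that this section describes for Linux builds.
LINUX_RPATH_DEFINES = [
    "-DCMAKE_INSTALL_RPATH=$ORIGIN",
    "-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON",
    "-DCMAKE_PLATFORM_NO_VERSIONED_SONAME=ON",
]


def configure_command(platform, extra_defines=()):
    """Build the argument list for a CMake configure step.

    `platform` is a hypothetical label such as "linux" or "windows";
    the source and build directories are assumed, not taken from the workflow.
    """
    args = ["cmake", "-S", ".", "-B", "build", *COMMON_DEFINES]
    if platform == "linux":
        args += LINUX_RPATH_DEFINES
    args += list(extra_defines)
    return args


if __name__ == "__main__":
    cmd = configure_command("linux", ["-DGGML_AVX512=ON"])
    # shlex.join quotes the $ORIGIN argument so a shell would not expand it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell expansion of `$ORIGIN`, which must reach CMake literally so the dynamic loader resolves it at run time.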

Build Process and Produced Entities

Each job follows a standard sequence to produce the native shared libraries required by the managed wrapper.

Library Roles

Library	Purpose
`llama`	Main llama.cpp API used by `NativeApi`.
`ggml`	Core GGML tensor operations.
`ggml-base`	Base GGML functionality.
`ggml-cpu`	CPU-specific implementations (AVX/AVX2/AVX512).
`mtmd`	Multimodal support (MTMD API).
`ggml-cuda`	CUDA acceleration (CUDA builds only).
`ggml-vulkan`	Vulkan acceleration (Vulkan builds only).

Sources: .github/workflows/compile.yml:65-89, .github/workflows/compile.yml:187-216
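
The logical library names in the table correspond to different file names on each operating system. The following Python sketch assumes the usual shared-library naming conventions (no prefix and `.dll` on Windows, a `lib` prefix with `.so` on Linux and `.dylib` on macOS). The function name and the `os_name` labels are hypothetical.

```python
# Logical library names from the table above (CPU build).
LIBRARIES = ["llama", "ggml", "ggml-base", "ggml-cpu", "mtmd"]


def shared_library_filename(name: str, os_name: str) -> str:
    """Map a logical library name to a conventional shared-library file name.

    `os_name` is an illustrative label: "windows", "linux", or "macos".
    """
    if os_name == "windows":
        return f"{name}.dll"
    if os_name == "linux":
        return f"lib{name}.so"
    if os_name == "macos":
        return f"lib{name}.dylib"
    raise ValueError(f"unsupported OS label: {os_name!r}")


if __name__ == "__main__":
    for lib in LIBRARIES:
        print(lib, "->", shared_library_filename(lib, "linux"))
```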

Integration with NuGet Packaging

The compiled binaries are integrated into the .NET ecosystem via .nuspec files and MSBuild targets.

Runtime Selection Logic

At runtime, the managed layer determines which binary to load. Dependencies like ggml-base and ggml-cpu are organized into subdirectories (e.g., avx2, cuda12) to support different hardware features without filename collisions.
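
The selection idea can be sketched as a simple priority check over detected hardware features. This Python example is only an illustration of the concept: the `HardwareInfo` type, the function name, and the priority order (CUDA first, then the widest available AVX level) are assumptions, and the managed layer's actual logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareInfo:
    """Hypothetical summary of detected hardware features."""
    has_cuda12: bool = False
    has_avx: bool = False
    has_avx2: bool = False
    has_avx512: bool = False


def select_variant_dir(hw: HardwareInfo) -> str:
    """Pick the subdirectory to load native libraries from.

    The order below is an assumed priority, not the documented behavior.
    """
    if hw.has_cuda12:
        return "cuda12"
    if hw.has_avx512:
        return "avx512"
    if hw.has_avx2:
        return "avx2"
    if hw.has_avx:
        return "avx"
    return "noavx"


if __name__ == "__main__":
    print(select_variant_dir(HardwareInfo(has_avx=True, has_avx2=True)))
```

Because every variant ships the same file names in its own subdirectory, choosing a variant reduces to choosing a directory.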

MSBuild Targets (`LLamaSharp.Runtime.targets`)

The LLamaSharp.Runtime.targets file maps the compiled binaries to the standard .NET runtimes/ folder structure. It uses conditional <None> items to include binaries for Windows, Linux, and other platforms, ensuring they are copied to the output directory.

Sources: LLama/LLamaSharp.Runtime.targets:1-142
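
The mapping performed by the targets file can be pictured as computing an output path for each binary. The standard .NET layout places native assets under `runtimes/<rid>/native/`. The extra variant subdirectory in this Python sketch is an assumption based on the subdirectory scheme described under Runtime Selection Logic, and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def runtime_link_path(rid: str, variant: str, filename: str) -> str:
    """Compute an output-relative path for a native library.

    `rid` is a .NET runtime identifier such as "win-x64" or "linux-x64".
    The `variant` folder (for example "avx2") is an assumed layout detail.
    """
    return str(PurePosixPath("runtimes") / rid / "native" / variant / filename)


if __name__ == "__main__":
    print(runtime_link_path("win-x64", "avx2", "llama.dll"))
```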

Backend Package Structure

Backend-specific packages use .nuspec files and MSBuild .props files to manage library inclusion and platform-specific logic.

CPU Package: Includes all AVX variants for Windows and Linux (LLama/runtimes/build/LLamaSharp.Backend.Cpu.nuspec:21-74).
CUDA Package: Includes ggml-cuda.dll/so and depends on the CPU package (LLama/runtimes/build/LLamaSharp.Backend.Cuda12.Windows.nuspec:17-30).
Android Package: Maps libraries to specific Android ABIs (arm64-v8a, x86_64) using AndroidNativeLibrary items in the .props file (LLama/runtimes/build/LLamaSharpBackend.props:28-84).
Backend Props: Handles RuntimeIdentifier detection to copy only the relevant native libraries (LLama/runtimes/build/LLamaSharpBackend.props:9-15).

Sources: LLama/runtimes/build/LLamaSharp.Backend.Cpu.nuspec:1-131, LLama/runtimes/build/LLamaSharp.Backend.Cuda12.Windows.nuspec:1-35, LLama/runtimes/build/LLamaSharp.Backend.Cpu.Android.nuspec:1-36, LLama/runtimes/build/LLamaSharpBackend.props:1-87

URL: https://deepwiki.com/SciSharp/LLamaSharp/8.1-native-binary-compilation