Last indexed: 18 May 2026 (ecd184)

Installation and Setup

This page guides you through installing LLamaSharp and its dependencies, selecting the appropriate backend for your hardware, and configuring your project for first use. For information about using LLamaSharp APIs after installation, see the Quick Start Guide (1.3). For details on the package architecture, see Package Architecture (1.1).

Prerequisites

LLamaSharp requires one of the following .NET target frameworks:

- .NET Standard 2.0 or higher LLama/LLamaSharp.csproj3
- .NET 8.0 or higher (recommended for best performance) LLama/LLamaSharp.csproj3

Hardware Considerations:

- Minimum: x64 or ARM64 CPU LLama/LLamaSharp.csproj7
- Recommended: a GPU with CUDA 11/12 or Vulkan support for accelerated inference README.md101-103
- RAM: varies by model size (at least 4 GB for quantized 7B models, 16 GB or more for larger models).

Sources: LLama/LLamaSharp.csproj3-7 README.md86-106
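The RAM figure for quantized 7B models can be sanity-checked with simple arithmetic. The sketch below is illustrative only: ModelMemoryEstimate is a hypothetical helper, not part of LLamaSharp, and the value of roughly 4.5 bits per weight for a 4-bit quantization is an assumption. It covers the weights only; context buffers and runtime overhead come on top.

```csharp
using System;

// Hypothetical helper for illustration; LLamaSharp does not ship this type.
public static class ModelMemoryEstimate
{
    // Approximate size of the model weights in decimal gigabytes.
    // bitsPerWeight is an assumption supplied by the caller (e.g. about 4.5 for a 4-bit quantization).
    public static double WeightsGigabytes(double parameterCount, double bitsPerWeight)
        => parameterCount * bitsPerWeight / 8.0 / 1e9;
}
```

For example, ModelMemoryEstimate.WeightsGigabytes(7e9, 4.5) evaluates to 3.9375, which is in line with the 4 GB minimum quoted above.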

Package Installation Overview

LLamaSharp uses a modular package distribution strategy that separates the managed C# API from platform-specific native binaries. This design minimizes package size while supporting diverse hardware configurations.

Package Ecosystem Structure

Figure 1: LLamaSharp Package Dependencies

Sources: README.md5-10 LLama/LLamaSharp.csproj30 LLama.SemanticKernel/LLamaSharp.SemanticKernel.csproj31 LLama.KernelMemory/LLamaSharp.KernelMemory.csproj25

Step 1: Install Core Package

Install the LLamaSharp NuGet package, which contains the managed C# API:

Package Manager Console:

PM> Install-Package LLamaSharp

dotnet CLI:

dotnet add package LLamaSharp

.csproj Reference:

<PackageReference Include="LLamaSharp" Version="0.27.0" />

This package provides all managed types including LLamaWeights, LLamaContext, executors, and sampling APIs. The current version is 0.27.0, based on llama.cpp commit 3f7c29d318e317b63f54c558bc69803963d7d88c. LLama/LLamaSharp.csproj10-25

Sources: LLama/LLamaSharp.csproj10-30 README.md92-96

Step 2: Select and Install Backend Package

Backend packages contain compiled native libraries (.dll, .so, .dylib) for specific hardware configurations. Install at least one backend package that matches your target platform and acceleration requirements. README.md88-90

Backend Selection Matrix

Backend Package	Platforms	Hardware Acceleration	Use Case
`LLamaSharp.Backend.Cpu`	Windows x64/ARM64, Linux x64/ARM64, macOS x64/ARM64	CPU (AVX/AVX2/AVX512), Metal (macOS ARM64)	General CPU inference, macOS GPU
`LLamaSharp.Backend.Cuda11`	Windows x64, Linux x64	NVIDIA CUDA 11.x	NVIDIA GPUs with CUDA 11
`LLamaSharp.Backend.Cuda12`	Windows x64, Linux x64	NVIDIA CUDA 12.x	NVIDIA GPUs with CUDA 12
`LLamaSharp.Backend.Vulkan`	Windows x64, Linux x64	Vulkan	Cross-vendor GPU support

Sources: README.md98-103
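The matrix above can be read as a simple decision rule. The following sketch encodes it in C#; BackendAdvisor and GpuApi are hypothetical names introduced here for illustration and are not part of LLamaSharp. It assumes the caller already knows which GPU API, if any, is available on the target machine.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical types for illustration; LLamaSharp does not ship them.
public enum GpuApi { None, Cuda11, Cuda12, Vulkan }

public static class BackendAdvisor
{
    // Maps a platform and a desired GPU API to a backend package name from the matrix.
    public static string Suggest(OSPlatform os, Architecture arch, GpuApi gpu)
    {
        // The CUDA and Vulkan packages are listed for Windows x64 and Linux x64 only.
        bool gpuPlatform = arch == Architecture.X64
            && (os == OSPlatform.Windows || os == OSPlatform.Linux);

        if (gpuPlatform)
        {
            switch (gpu)
            {
                case GpuApi.Cuda11: return "LLamaSharp.Backend.Cuda11";
                case GpuApi.Cuda12: return "LLamaSharp.Backend.Cuda12";
                case GpuApi.Vulkan: return "LLamaSharp.Backend.Vulkan";
            }
        }

        // The CPU package covers every listed platform, including Metal on macOS ARM64.
        return "LLamaSharp.Backend.Cpu";
    }
}
```

Calling BackendAdvisor.Suggest(OSPlatform.Linux, Architecture.X64, GpuApi.Cuda12) returns "LLamaSharp.Backend.Cuda12", while any macOS or ARM64 combination falls through to "LLamaSharp.Backend.Cpu".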

Native Library Configuration

Automatic Selection and Loading

LLamaSharp manages native library loading via NativeLibraryConfig. By default, the library attempts to detect the best available backend based on system capabilities like AVX level and GPU availability. LLama/Native/Load/NativeLibraryConfig.cs10-15

Figure 2: Native Library Selection Logic

Sources: LLama/Native/Load/NativeLibraryUtils.cs15-36 LLama/Native/Load/NativeLibraryConfig.cs150-161
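To see what the automatic selection does on a given machine, it can help to surface the native log output during loading. The snippet below is a minimal sketch: it assumes the LLamaSharp package and a backend package are referenced, and that NativeLibraryConfig.All, WithLogCallback, and NativeApi.llama_empty_call are available with these shapes in the installed version. The exact log messages are not specified here.

```csharp
using System;
using LLama.Native;

// Register the callback before anything else touches the native API,
// so that messages emitted while the native library is located and loaded are visible.
NativeLibraryConfig.All.WithLogCallback((level, message) =>
    Console.Write($"[{level}] {message}"));

// Force the native library to load now rather than on first real use.
NativeApi.llama_empty_call();
```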

Manual Configuration

To override the automatic selection, call NativeLibraryConfig methods before any model is loaded. LLama/Native/Load/NativeLibraryConfig.cs10-15
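A manual override might look like the sketch below. It assumes the fluent methods WithCuda, WithAvx, and WithAutoFallback, the AvxLevel enum, and the NativeLibraryConfig.All entry point exist with these signatures in the installed LLamaSharp version; check the referenced source file if the code does not compile against your version.

```csharp
using LLama.Native;

// Must run before any model is loaded; configuration is expected to be
// rejected once the native library has already been loaded.
NativeLibraryConfig.All
    .WithCuda(false)          // skip CUDA builds even if a CUDA backend package is installed
    .WithAvx(AvxLevel.Avx2)   // ask for the AVX2 CPU build
    .WithAutoFallback(true);  // allow falling back to another build if the preferred one fails to load

// Optional: trigger loading immediately so configuration problems surface early.
NativeApi.llama_empty_call();
```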

Sources: LLama/Native/Load/NativeLibraryConfig.cs38-101

Manual Dependency Loading

Because the llama.cpp native binaries are split into several libraries, LLamaSharp explicitly loads dependencies such as ggml-base, ggml-cpu, and ggml-cuda itself, so that they resolve correctly across different runtime directories. LLama/Native/Load/NativeLibraryUtils.cs48-52

The loading sequence for dependencies is managed within NativeLibraryUtils.TryLoadLibrary: LLama/Native/Load/NativeLibraryUtils.cs15

1. ggml-base: Always loaded from the current runtime directory. LLama/Native/Load/NativeLibraryUtils.cs67
2. Platform Backends:
   - macOS: Loads ggml-cpu, ggml-metal (ARM64), and ggml-blas. LLama/Native/Load/NativeLibraryUtils.cs72-85
   - Windows/Linux: Loads ggml-cpu from the detected AVX level directory (e.g., avx2/ggml-cpu.dll). LLama/Native/Load/NativeLibraryUtils.cs98-102
   - Accelerators: Loads ggml-cuda or ggml-vulkan if configured. LLama/Native/Load/NativeLibraryUtils.cs105-110
3. ggml: The main GGML entry point. LLama/Native/Load/NativeLibraryUtils.cs117
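
The sequence above can be modeled as an ordered list of library names. The sketch below is a simplified illustration and not LLamaSharp's actual code: GgmlLoadOrder, HostOs, and Accelerator are hypothetical names, file extensions are omitted, and the AVX directory is passed in rather than detected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the described sequence; not taken from NativeLibraryUtils.
public enum HostOs { Windows, Linux, MacOs }
public enum Accelerator { None, Cuda, Vulkan }

public static class GgmlLoadOrder
{
    // Returns the dependency names in the order they would be loaded.
    public static List<string> Plan(HostOs os, bool isArm64, string avxDir, Accelerator accelerator)
    {
        // Step 1: ggml-base always comes first.
        var order = new List<string> { "ggml-base" };

        // Step 2: platform backends.
        if (os == HostOs.MacOs)
        {
            order.Add("ggml-cpu");
            if (isArm64) order.Add("ggml-metal");
            order.Add("ggml-blas");
        }
        else
        {
            // Windows/Linux: the CPU library comes from the AVX level directory.
            order.Add(avxDir + "/ggml-cpu");
        }

        // Accelerator libraries are added only when one is configured.
        if (accelerator == Accelerator.Cuda) order.Add("ggml-cuda");
        else if (accelerator == Accelerator.Vulkan) order.Add("ggml-vulkan");

        // Step 3: the main GGML entry point comes last.
        order.Add("ggml");
        return order;
    }
}
```

For a Linux x64 machine with AVX2 and CUDA configured, GgmlLoadOrder.Plan(HostOs.Linux, false, "avx2", Accelerator.Cuda) yields ggml-base, avx2/ggml-cpu, ggml-cuda, ggml, in that order.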