This page guides you through installing LLamaSharp and its dependencies, selecting the appropriate backend for your hardware, and configuring your project for first use. For information about using LLamaSharp APIs after installation, see the Quick Start Guide (1.3). For details on the package architecture, see Package Architecture (1.1).
LLamaSharp requires one of the following .NET target frameworks:
Hardware Considerations:
Sources: LLama/LLamaSharp.csproj3-7 README.md86-106
LLamaSharp uses a modular package distribution strategy that separates the managed C# API from platform-specific native binaries. This design minimizes package size while supporting diverse hardware configurations.
Figure 1: LLamaSharp Package Dependencies
Sources: README.md5-10 LLama/LLamaSharp.csproj30 LLama.SemanticKernel/LLamaSharp.SemanticKernel.csproj31 LLama.KernelMemory/LLamaSharp.KernelMemory.csproj25
Install the LLamaSharp NuGet package, which contains the managed C# API:
Package Manager Console:
PM> Install-Package LLamaSharp
dotnet CLI:
dotnet add package LLamaSharp
.csproj Reference:
<ItemGroup>
  <PackageReference Include="LLamaSharp" Version="0.27.0" />
</ItemGroup>
This package provides all managed types including LLamaWeights, LLamaContext, executors, and sampling APIs. The current version is 0.27.0, based on llama.cpp commit 3f7c29d318e317b63f54c558bc69803963d7d88c. LLama/LLamaSharp.csproj10-25
Sources: LLama/LLamaSharp.csproj10-30 README.md92-96
Backend packages contain compiled native libraries (.dll, .so, .dylib) for specific hardware configurations. You must install exactly one backend package that matches your target platform and acceleration requirements. README.md88-90
| Backend Package | Platforms | Hardware Acceleration | Use Case |
|---|---|---|---|
| LLamaSharp.Backend.Cpu | Windows x64/ARM64, Linux x64/ARM64, macOS x64/ARM64 | CPU (AVX/AVX2/AVX512), Metal (macOS ARM64) | General CPU inference, macOS GPU |
| LLamaSharp.Backend.Cuda11 | Windows x64, Linux x64 | NVIDIA CUDA 11.x | NVIDIA GPUs with CUDA 11 |
| LLamaSharp.Backend.Cuda12 | Windows x64, Linux x64 | NVIDIA CUDA 12.x | NVIDIA GPUs with CUDA 12 |
| LLamaSharp.Backend.Vulkan | Windows x64, Linux x64 | Vulkan | Cross-vendor GPU support |
Sources: README.md98-103
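To make the table easier to apply, the following sketch maps an operating system and CPU architecture to the backend packages the table lists for that combination. The BackendHint class and its methods are hypothetical helpers written for this page, not part of LLamaSharp; they only restate the table and do not probe for an actual GPU or driver.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not a LLamaSharp API): restates the backend table in code.
public static class BackendHint
{
    // Returns the backend packages that the table lists for the given OS and architecture.
    public static string[] CandidatePackages(OSPlatform os, Architecture arch)
    {
        bool isX64 = arch == Architecture.X64;
        bool isArm64 = arch == Architecture.Arm64;

        if (os == OSPlatform.OSX)
        {
            // Only the CPU package is listed for macOS (it carries Metal on ARM64).
            return (isX64 || isArm64)
                ? new[] { "LLamaSharp.Backend.Cpu" }
                : Array.Empty<string>();
        }

        if (os == OSPlatform.Windows || os == OSPlatform.Linux)
        {
            if (isX64)
            {
                // x64 is the only architecture listed for the GPU backends.
                return new[]
                {
                    "LLamaSharp.Backend.Cpu",
                    "LLamaSharp.Backend.Cuda11",
                    "LLamaSharp.Backend.Cuda12",
                    "LLamaSharp.Backend.Vulkan",
                };
            }
            if (isArm64)
            {
                return new[] { "LLamaSharp.Backend.Cpu" };
            }
        }

        // Combination not covered by the table.
        return Array.Empty<string>();
    }

    // Convenience wrapper that inspects the machine the code is running on.
    public static string[] ForCurrentMachine()
    {
        foreach (var os in new[] { OSPlatform.Windows, OSPlatform.Linux, OSPlatform.OSX })
        {
            if (RuntimeInformation.IsOSPlatform(os))
            {
                return CandidatePackages(os, RuntimeInformation.OSArchitecture);
            }
        }
        return Array.Empty<string>();
    }
}
```

Calling BackendHint.ForCurrentMachine() on a development machine gives a shortlist. Which single package to install from that list still depends on the GPU and driver actually present.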
LLamaSharp manages native library loading via NativeLibraryConfig. By default, the library attempts to detect the best available backend based on system capabilities like AVX level and GPU availability. LLama/Native/Load/NativeLibraryConfig.cs10-15
Figure 2: Native Library Selection Logic
Sources: LLama/Native/Load/NativeLibraryUtils.cs15-36 LLama/Native/Load/NativeLibraryConfig.cs150-161
You can override the automatic selection by calling NativeLibraryConfig methods before any model loading: LLama/Native/Load/NativeLibraryConfig.cs10-15
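A minimal sketch of such an override follows. It assumes the fluent methods exposed on NativeLibraryConfig.All in recent LLamaSharp releases (WithCuda, WithAutoFallback, WithLogCallback) and the NativeApi.llama_empty_call helper; check the names against the version you have installed, because this API has changed between releases.

```csharp
using System;
using LLama.Native;

// Run this before any other LLamaSharp call: the configuration is only
// consulted when the native library is loaded for the first time.
NativeLibraryConfig.All
    .WithCuda(false)         // do not select a CUDA binary, even if a GPU is present
    .WithAutoFallback(true)  // allow another binary to be tried if the preferred one fails
    .WithLogCallback((level, message) => Console.Write($"[{level}] {message}"));

// Trigger native library loading now, so selection problems surface at startup
// rather than at the first model load.
NativeApi.llama_empty_call();
```

The log callback is optional, but it reports which native library path was tried and selected, which helps when diagnosing a backend that fails to load.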
Sources: LLama/Native/Load/NativeLibraryConfig.cs38-101
Upstream llama.cpp now ships its native code as several separate libraries rather than a single binary. LLamaSharp therefore loads dependencies such as ggml-base, ggml-cpu, and ggml-cuda explicitly, so that they resolve correctly regardless of which runtime directory they are in. LLama/Native/Load/NativeLibraryUtils.cs48-52
The loading sequence for dependencies is managed within NativeLibraryUtils.TryLoadLibrary: LLama/Native/Load/NativeLibraryUtils.cs15
- ggml-cpu, ggml-metal (ARM64), and ggml-blas. LLama/Native/Load/NativeLibraryUtils.cs72-85
- ggml-cpu from the detected AVX level directory (e.g., avx2/ggml-cpu.dll). LLama/Native/Load/NativeLibraryUtils.cs98-102
- ggml-cuda or ggml-vulkan if configured. LLama/Native/Load/NativeLibraryUtils.cs105-110

Figure 3: Native Dependency Loading Order
Sources: LLama/Native/Load/NativeLibraryUtils.cs63-123
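The general technique of loading dependencies in a fixed order before the main library can be sketched as follows. This is an illustration only, not LLamaSharp's NativeLibraryUtils code: the DependencyOrderSketch class, its parameters, the directory layout, and the choice to load ggml-base first are all assumptions made for the example. It relies on System.Runtime.InteropServices.NativeLibrary, which is available on .NET Core 3.0 and later but not on netstandard2.0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Illustration only: a simplified stand-in for ordered native dependency loading.
public static class DependencyOrderSketch
{
    // Builds the ordered list of dependency paths.
    // runtimeDir: hypothetical directory holding the native binaries.
    // avxDir:     hypothetical subdirectory for the detected AVX level, e.g. "avx2".
    // gpuBackend: "cuda", "vulkan", or null/empty for CPU only.
    // prefix/ext: platform naming, e.g. ("lib", ".so") or ("", ".dll").
    public static List<string> BuildLoadOrder(
        string runtimeDir, string avxDir, string gpuBackend, string prefix, string ext)
    {
        var paths = new List<string>
        {
            // Assumed to come first because the other ggml libraries depend on it.
            Path.Combine(runtimeDir, $"{prefix}ggml-base{ext}"),
            // CPU backend taken from the AVX-specific directory.
            Path.Combine(runtimeDir, avxDir, $"{prefix}ggml-cpu{ext}"),
        };

        if (!string.IsNullOrEmpty(gpuBackend))
        {
            // Optional GPU backend, only when one has been configured.
            paths.Add(Path.Combine(runtimeDir, $"{prefix}ggml-{gpuBackend}{ext}"));
        }

        return paths;
    }

    // Tries to load each existing file in order and returns how many loaded.
    public static int LoadAll(IEnumerable<string> paths)
    {
        int loaded = 0;
        foreach (var path in paths)
        {
            if (File.Exists(path) && NativeLibrary.TryLoad(path, out _))
            {
                loaded++;
            }
        }
        return loaded;
    }
}
```

Keeping path construction separate from the actual load call makes the ordering easy to inspect or log before anything is loaded into the process.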
For applications using Microsoft Semantic Kernel, install the integration package, published on NuGet as LLamaSharp.semantic-kernel:
dotnet add package LLamaSharp.semantic-kernel
This package targets netstandard2.0 and net8.0. LLama.SemanticKernel/LLamaSharp.SemanticKernel.csproj4
Sources: LLama.SemanticKernel/LLamaSharp.SemanticKernel.csproj1-31
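For RAG (Retrieval-Augmented Generation) scenarios using Microsoft Kernel Memory, install the integration package, published on NuGet as LLamaSharp.kernel-memory:
dotnet add package LLamaSharp.kernel-memory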
This package targets net8.0. LLama.KernelMemory/LLamaSharp.KernelMemory.csproj4
Sources: LLama.KernelMemory/LLamaSharp.KernelMemory.csproj1-25
LLamaSharp requires models in GGUF format. README.md110
Search Hugging Face for {model-name} gguf. Popular models like Llama-3.2 and Mistral are widely available in this format.
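Before pointing LLamaSharp at a downloaded file, it can help to confirm that the file really is GGUF. GGUF files begin with the four ASCII bytes "GGUF", so a quick check only needs to read the header. The GgufCheck class below is a hypothetical helper written for this page, not a LLamaSharp API, and it verifies the magic bytes only, not the rest of the file.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a LLamaSharp API): checks the GGUF magic bytes.
public static class GgufCheck
{
    // ASCII "GGUF".
    private static readonly byte[] Magic = { 0x47, 0x47, 0x55, 0x46 };

    // Returns true if the stream starts with the GGUF magic bytes.
    public static bool HasGgufMagic(Stream stream)
    {
        var header = new byte[Magic.Length];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until
        // the header is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0)
            {
                return false; // stream is shorter than the magic
            }
            read += n;
        }

        for (int i = 0; i < Magic.Length; i++)
        {
            if (header[i] != Magic[i])
            {
                return false;
            }
        }
        return true;
    }

    // Convenience wrapper for a file path; returns false if the file is missing.
    public static bool LooksLikeGguf(string path)
    {
        if (!File.Exists(path))
        {
            return false;
        }
        using var stream = File.OpenRead(path);
        return HasGgufMagic(stream);
    }
}
```

A false result usually indicates an interrupted download, a model in an older or different format, or an HTML error page saved under the model's file name.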
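LLamaSharp's own project file enables AllowUnsafeBlocks for its native interop code. LLama/LLamaSharp.csproj8 A consuming project needs the same setting only if its own code uses the unsafe keyword, for example when working directly with native pointers.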
Example Configuration in .csproj:
<PropertyGroup>
  <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>
Sources: LLama/LLamaSharp.csproj1-33