Last indexed: 18 May 2026 (ecd184)

Core Architecture

This document describes the fundamental three-tier architecture of LLamaSharp, which provides a safe, managed C# wrapper around the native llama.cpp library. Understanding this architecture is essential for working with any part of the LLamaSharp codebase.

Scope: This page covers the layered architecture that bridges managed C# code to native llama.cpp through SafeHandles and P/Invoke. For component-specific details:

Native library loading and backend binary management → Native Library Integration
LLamaWeights lifecycle and GGUF model loading → Model Loading and LLamaWeights
LLamaContext inference operations and KV cache → Context Management and LLamaContext
SafeHandle patterns and reference counting → Memory Management and SafeHandles
Tokenization APIs and StreamingTokenDecoder → Tokenization and Vocabulary

Architecture Overview

LLamaSharp implements a strict layered architecture to safely bridge managed C# to native llama.cpp. Three managed tiers sit on top of the native library:

Layer	Key Classes	Responsibility
User-Facing API	`LLamaWeights`, `LLamaContext`, `LLamaEmbedder`	Idiomatic C# interfaces, parameter validation
SafeHandle Layer	`SafeLlamaModelHandle`, `SafeLLamaContextHandle`	Automatic resource cleanup, reference counting
P/Invoke Layer	`NativeApi`, `LLamaModelParams`, `LLamaContextParams`	Native method declarations, struct marshalling
Native Library	`libllama.so/dll/dylib`, `libggml-*.so/dll/dylib`	llama.cpp C API, ggml tensor operations

This separation ensures resource safety (SafeHandles prevent leaks), type safety (managed wrappers validate parameters), and platform abstraction (P/Invoke handles ABI details).

[Diagram: Layered Architecture Call Flow]

Sources: LLama/LLamaContext.cs18-98 LLama/Native/SafeLlamaModelHandle.cs15-186 LLama/Native/SafeLLamaContextHandle.cs13-139 LLama/Native/NativeApi.cs11-147

Tier 1: User-Facing API Layer

The user-facing API consists of high-level managed classes that developers use to interact with LLamaSharp. These classes provide idiomatic C# interfaces, hide complexity, and enforce correct usage patterns.

Key Classes

Class	Purpose	Lifecycle
`LLamaWeights`	Represents a loaded GGUF model file in memory	Created via `LoadFromFile()` (LLama/LLamaWeights.cs67); disposed to free memory
`LLamaContext`	Manages inference state for a model, including KV cache	Created from `LLamaWeights` (LLama/LLamaContext.cs84); disposed to free context
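
The lifecycle in the table can be sketched as follows. This is a minimal illustration, not the library's reference usage. It assumes the LLamaSharp package plus a backend package are installed, and that `ModelParams` (from `LLama.Common`) is used for both loading and context creation. The helper name `LoadAndReportContextSize`, the `modelPath` argument, the `File.Exists` guard, and the context size of 2048 are hypothetical choices made for this example.

```csharp
using System;
using System.IO;
using LLama;
using LLama.Common;

public static class Tier1Lifecycle
{
    // Loads a model, creates a context from it, and returns the context size.
    // modelPath is a caller-supplied path to a GGUF file (hypothetical here).
    public static uint LoadAndReportContextSize(string modelPath)
    {
        // Guard added for this sketch so a bad path fails before any native call.
        if (!File.Exists(modelPath))
            throw new FileNotFoundException("GGUF model file not found.", modelPath);

        var parameters = new ModelParams(modelPath)
        {
            ContextSize = 2048 // assumed value, chosen only for illustration
        };

        // 'using' declarations dispose in reverse order:
        // the context is released first, then the weights.
        using var weights = LLamaWeights.LoadFromFile(parameters);
        using var context = weights.CreateContext(parameters);

        return context.ContextSize;
    }
}
```

Declaring the weights before the context means the context is always disposed first, which mirrors the ownership relationship between the two classes.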

Design Principles

Encapsulation: User-facing classes expose native resources only as SafeHandle properties (for example LLamaContext.NativeHandle, LLama/LLamaContext.cs42), never as raw pointers. Properties such as ContextSize (LLama/LLamaContext.cs26) and EmbeddingSize (LLama/LLamaContext.cs31) delegate to the corresponding SafeHandle properties.

Disposal Pattern: All user-facing classes implement IDisposable. LLamaContext disposes its SafeLLamaContextHandle (LLama/LLamaContext.cs19), which triggers release of the underlying native handle (LLama/Native/SafeLLamaContextHandle.cs80-90).

Encoding Abstraction: LLamaContext stores an Encoding field (LLama/LLamaContext.cs47), initialized from IContextParams.Encoding (LLama/LLamaContext.cs92) and used for all tokenization operations (LLama/LLamaContext.cs109).

Sources: LLama/LLamaContext.cs18-98 LLama/LLamaWeights.cs17-155 LLama/Native/SafeLlamaModelHandle.cs123-127 LLama/Native/SafeLLamaContextHandle.cs80-90

Tier 2: Safe Handle Layer

The Safe Handle Layer wraps native pointers in classes derived from SafeHandle. This ensures that native resources are properly released even if exceptions occur, preventing memory leaks and use-after-free bugs.

SafeHandle Implementation Pattern

All native resource wrappers inherit from SafeLLamaHandleBase. Each wrapper overrides ReleaseHandle(), which the runtime invokes when the handle is disposed explicitly or, if it was never disposed, when the finalizer runs.
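
The general pattern can be sketched with a self-contained wrapper. This is an illustration of the .NET SafeHandle mechanism, not LLamaSharp's code: `SketchNativeHandle` is a hypothetical class that derives directly from `SafeHandle` rather than from SafeLLamaHandleBase, and `Marshal.AllocHGlobal` stands in for a native allocation such as a model or context.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical stand-in for a native resource wrapper.
public sealed class SketchNativeHandle : SafeHandle
{
    // Counts ReleaseHandle calls so the cleanup behaviour can be observed.
    public static int ReleaseCount;

    public SketchNativeHandle(int byteCount)
        : base(IntPtr.Zero, ownsHandle: true)
    {
        // Stand-in for a native "create" call that returns a pointer.
        SetHandle(Marshal.AllocHGlobal(byteCount));
    }

    // A zero pointer means there is nothing to release.
    public override bool IsInvalid => handle == IntPtr.Zero;

    // Invoked by the runtime at most once, from Dispose() or the finalizer.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Interlocked.Increment(ref ReleaseCount);
        return true;
    }
}
```

Calling `Dispose()` more than once is harmless in this sketch: the SafeHandle base class tracks the closed state and does not invoke `ReleaseHandle()` a second time.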

[Diagram: SafeHandle Hierarchy and Cleanup Methods]

Sources: LLama/Native/SafeLlamaModelHandle.cs15-128 LLama/Native/SafeLLamaContextHandle.cs13-146

Reference Counting

SafeLLamaContextHandle holds a reference-counted link to its parent SafeLlamaModelHandle via the _model field (LLama/Native/SafeLLamaContextHandle.cs75). This prevents the native model from being freed while any context created from it still exists.

[Diagram: Reference Count Lifecycle]

The ThrowIfDisposed() method LLama/Native/SafeLLamaContextHandle.cs92-100 validates both handles before any operation.
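
One way to get this behaviour in .NET is the reference count built into SafeHandle itself. The sketch below assumes that mechanism (`DangerousAddRef` and `DangerousRelease`); it is not taken from the LLamaSharp sources. `SketchModelHandle` and `SketchContextHandle` are hypothetical classes, and the fake pointer values 1 and 2 stand in for real native pointers.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical parent handle (plays the role of the model handle).
public sealed class SketchModelHandle : SafeHandle
{
    public bool Released { get; private set; }

    public SketchModelHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(new IntPtr(1)); // fake non-zero pointer
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Released = true; // a real wrapper would free the native model here
        return true;
    }
}

// Hypothetical child handle (plays the role of the context handle).
public sealed class SketchContextHandle : SafeHandle
{
    private readonly SketchModelHandle _model;

    public SketchContextHandle(SketchModelHandle model) : base(IntPtr.Zero, ownsHandle: true)
    {
        // Take a reference on the parent so it outlives this child.
        bool addedRef = false;
        model.DangerousAddRef(ref addedRef);
        if (!addedRef)
            throw new InvalidOperationException("Could not reference the model handle.");

        _model = model;
        SetHandle(new IntPtr(2)); // fake non-zero pointer
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Checks both handles, in the spirit of the validation described above.
    public void ThrowIfDisposed()
    {
        if (IsClosed) throw new ObjectDisposedException("context handle");
        if (_model.IsClosed) throw new ObjectDisposedException("model handle");
    }

    protected override bool ReleaseHandle()
    {
        // A real wrapper would free the native context first,
        // then give back the reference taken in the constructor.
        _model.DangerousRelease();
        return true;
    }
}
```

In this sketch, disposing the model while a context is still alive marks the model as closed but defers its `ReleaseHandle()` until the context is disposed and the last reference is returned.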

Sources: LLama/Native/SafeLLamaContextHandle.cs75-122

Tier 3: Native Interop Layer

The Native Interop Layer contains P/Invoke declarations that interface directly with the llama.cpp C API.

NativeApi Class Structure

NativeApi LLama/Native/NativeApi.cs11 is a public static partial class. All P/Invoke methods use [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)].
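
A declaration in this style might look like the sketch below. `SketchNativeApi` is a hypothetical class, the library name "llama" is an assumption, and the return types are inferred from the llama.cpp C header (`size_t` and `bool`) rather than copied from LLamaSharp, whose actual declarations may differ.

```csharp
using System.Runtime.InteropServices;

// Hypothetical class mirroring the declaration style described above.
public static partial class SketchNativeApi
{
    // Assumed base name; the runtime probes platform-specific file names.
    private const string libraryName = "llama";

    // C signature assumed: size_t llama_max_devices(void);
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern nuint llama_max_devices();

    // C signature assumed: bool llama_supports_mmap(void);
    // A C bool is one byte, so the return value is marshalled as U1.
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    [return: MarshalAs(UnmanagedType.U1)]
    public static extern bool llama_supports_mmap();
}
```

Declaring these methods does not load anything; the runtime resolves the native library only when one of them is first called.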

Method Organization by Category

Category	Methods	Lines	Called By
Backend lifecycle	`llama_backend_free()`, `llama_backend_init()`	26, 87	Application start/end
Device queries	`llama_max_devices()`, `llama_supports_mmap()`	34, 52	Environment validation
Model operations	`llama_model_load_from_file()`, `llama_model_free()`	186, 125	`SafeLlamaModelHandle`
State I/O	`llama_state_load_file()`, `llama_state_save_file()`	109, 121	`LLamaContext.SaveState()`

Sources: LLama/Native/NativeApi.cs26-121 LLama/Native/SafeLlamaModelHandle.cs186 LLama/Native/SafeLLamaContextHandle.cs146

Native Library Loading

LLamaSharp separates managed code from native binaries. Users must provide a native binary (e.g., through a backend NuGet package) matching their hardware.

Loading Mechanism

Initialization: Static constructors in the SafeHandle classes call NativeApi.llama_empty_call() (LLama/Native/SafeLLamaContextHandle.cs129) to ensure the native library is loaded before any handle is used. llama_empty_call() in turn calls llama_max_devices() (LLama/Native/NativeApi.cs17-20), which triggers the internal library load.
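
The trigger pattern can be illustrated without any native code. In the sketch below, all names are hypothetical: `SketchLoader` plays the role of the class that owns the load logic, `EmptyCall()` plays the role of the no-op entry point, and `SketchHandleUser` plays the role of a handle class. It relies only on the C# rule that a static constructor runs once, before the first use of the class.

```csharp
using System;

// Hypothetical stand-in for the class that loads the native library.
public static class SketchLoader
{
    public static int LoadCount { get; private set; }

    static SketchLoader()
    {
        // A real implementation would locate and load the native binary here.
        LoadCount++;
    }

    // Does nothing itself; calling it forces the static constructor to run first.
    public static void EmptyCall() { }
}

// Hypothetical stand-in for a handle class that needs the library loaded.
public static class SketchHandleUser
{
    static SketchHandleUser()
    {
        // Touching the loader guarantees the load happened before any member here is used.
        SketchLoader.EmptyCall();
    }

    public static bool Ready => SketchLoader.LoadCount == 1;
}
```

Because the runtime runs each static constructor only once, repeated calls to `EmptyCall()` do not repeat the load.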

Sources: LLama/Native/SafeLLamaContextHandle.cs126-130 LLama/Native/NativeApi.cs17-20 LLama/Native/SafeLlamaModelHandle.cs168-172

Summary

The three-tier architecture provides a robust foundation for LLamaSharp:

Tier 1 delivers a high-level, user-friendly API (LLamaWeights, LLamaContext)
Tier 2 ensures safe resource management through SafeHandle and reference counting
Tier 3 handles low-level native interop with NativeApi

For detailed information about specific components, refer to the child pages: Native Library Integration, Model Loading and LLamaWeights, Context Management and LLamaContext, Memory Management and SafeHandles, and Tokenization and Vocabulary.


URL: https://deepwiki.com/SciSharp/LLamaSharp/2-core-architecture