
URL: https://deepwiki.com/SciSharp/LLamaSharp/2-core-architecture

Last indexed: 18 May 2026 (ecd184)

Core Architecture

This document describes the fundamental three-tier architecture of LLamaSharp, which provides a safe, managed C# wrapper around the native llama.cpp library. Understanding this architecture is essential for working with any part of the LLamaSharp codebase.

Scope: This page covers the layered architecture that bridges managed C# code to native llama.cpp through SafeHandles and P/Invoke. For component-specific details, see the child pages listed in the Summary.


Architecture Overview

LLamaSharp implements a strict layered architecture to safely bridge managed C# to native llama.cpp:

| Layer | Key Classes | Responsibility |
|---|---|---|
| User-Facing API | LLamaWeights, LLamaContext, LLamaEmbedder | Idiomatic C# interfaces, parameter validation |
| SafeHandle Layer | SafeLlamaModelHandle, SafeLLamaContextHandle | Automatic resource cleanup, reference counting |
| P/Invoke Layer | NativeApi, LLamaModelParams, LLamaContextParams | Native method declarations, struct marshalling |
| Native Library | libllama.so/dll/dylib, libggml-*.so/dll/dylib | llama.cpp C API, ggml tensor operations |

This separation ensures resource safety (SafeHandles prevent leaks), type safety (managed wrappers validate parameters), and platform abstraction (P/Invoke handles ABI details).

[Figure: Layered Architecture Call Flow]


Sources: LLama/LLamaContext.cs:18-98, LLama/Native/SafeLlamaModelHandle.cs:15-186, LLama/Native/SafeLLamaContextHandle.cs:13-139, LLama/Native/NativeApi.cs:11-147


Tier 1: User-Facing API Layer

The user-facing API consists of high-level managed classes that developers use to interact with LLamaSharp. These classes provide idiomatic C# interfaces, hide complexity, and enforce correct usage patterns.

Key Classes

| Class | Purpose | Lifecycle |
|---|---|---|
| LLamaWeights | Represents a loaded GGUF model file in memory | Created via LoadFromFile() (LLama/LLamaWeights.cs:67); disposed to free memory |
| LLamaContext | Manages inference state for a model, including the KV cache | Created from LLamaWeights (LLama/LLamaContext.cs:84); disposed to free the context |

Design Principles

Encapsulation: User-facing classes expose only SafeHandles as public properties (LLamaContext.NativeHandle, LLama/LLamaContext.cs:42), never raw pointers. Properties such as ContextSize (LLama/LLamaContext.cs:26) and EmbeddingSize (LLama/LLamaContext.cs:31) delegate to the corresponding SafeHandle properties.

Disposal Pattern: All user-facing classes implement IDisposable. LLamaContext disposes its SafeLLamaContextHandle (LLama/LLamaContext.cs:19), which triggers the release of the underlying native handle (LLama/Native/SafeLLamaContextHandle.cs:80-90).

Encoding Abstraction: LLamaContext stores an Encoding field (LLama/LLamaContext.cs:47) that is initialized from IContextParams.Encoding (LLama/LLamaContext.cs:92) and used for all tokenization operations (LLama/LLamaContext.cs:109).

Sources: LLama/LLamaContext.cs:18-98, LLama/LLamaWeights.cs:17-155, LLama/Native/SafeLlamaModelHandle.cs:123-127, LLama/Native/SafeLLamaContextHandle.cs:80-90
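
The encapsulation and disposal principles above can be illustrated with a minimal sketch. It does not use LLamaSharp itself: SketchContext and SketchContextHandle are hypothetical stand-ins, and Marshal.AllocHGlobal plays the role of the native allocation. The only point is the shape: the user-facing class validates its input, exposes the handle rather than a raw pointer, delegates properties to the handle, and forwards Dispose().

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical handle type; it stands in for a SafeHandle-derived wrapper.
public sealed class SketchContextHandle : SafeHandle
{
    public SketchContextHandle(uint contextSize)
        : base(IntPtr.Zero, ownsHandle: true)
    {
        ContextSize = contextSize;
        // Stand-in for a native "create context" call (assumption for illustration).
        SetHandle(Marshal.AllocHGlobal(1));
    }

    public uint ContextSize { get; }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

// Hypothetical user-facing class; it never exposes an IntPtr.
public sealed class SketchContext : IDisposable
{
    public SketchContext(uint contextSize)
    {
        // Parameter validation happens in the managed layer.
        if (contextSize == 0)
            throw new ArgumentOutOfRangeException(nameof(contextSize));
        NativeHandle = new SketchContextHandle(contextSize);
    }

    // Only the SafeHandle is public.
    public SketchContextHandle NativeHandle { get; }

    // Delegates to the handle instead of caching its own copy.
    public uint ContextSize => NativeHandle.ContextSize;

    // Disposing the wrapper disposes the handle, which releases the native resource.
    public void Dispose() => NativeHandle.Dispose();
}
```

A caller would typically write `using var ctx = new SketchContext(512);` so that the handle is released when the scope ends.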


Tier 2: Safe Handle Layer

The Safe Handle Layer wraps native pointers in classes derived from SafeHandle. This ensures that native resources are properly released even if exceptions occur, preventing memory leaks and use-after-free bugs.

SafeHandle Implementation Pattern

All native resource wrappers inherit from SafeLLamaHandleBase. The ReleaseHandle() override is called automatically, either from an explicit Dispose() or from the finalizer when the garbage collector reclaims an undisposed handle.

[Figure: SafeHandle Hierarchy and Cleanup Methods]


Sources: LLama/Native/SafeLlamaModelHandle.cs:15-128, LLama/Native/SafeLLamaContextHandle.cs:13-146
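
As a rough illustration of the SafeHandle pattern described in this section, the sketch below derives directly from System.Runtime.InteropServices.SafeHandle. DemoNativeBufferHandle is a hypothetical name, and unmanaged memory from Marshal.AllocHGlobal stands in for a llama.cpp resource; LLamaSharp's own wrappers inherit from SafeLLamaHandleBase and call native free functions instead.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical wrapper, not LLamaSharp's actual code.
public sealed class DemoNativeBufferHandle : SafeHandle
{
    // Counts releases so the effect of Dispose() can be observed.
    public static int ReleaseCount;

    public DemoNativeBufferHandle(int byteCount)
        : base(IntPtr.Zero, ownsHandle: true)
    {
        // Stand-in for a native "create" call (assumption for illustration).
        SetHandle(Marshal.AllocHGlobal(byteCount));
    }

    // A zero pointer means there is nothing to release.
    public override bool IsInvalid => handle == IntPtr.Zero;

    // Invoked by the runtime at most once, from Dispose() or from the finalizer.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Interlocked.Increment(ref ReleaseCount);
        return true;
    }
}
```

Calling Dispose() more than once is harmless in this sketch, because SafeHandle itself guarantees that ReleaseHandle() runs only once.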

Reference Counting

SafeLLamaContextHandle holds a reference-counted link to its parent SafeLlamaModelHandle via the _model field (LLama/Native/SafeLLamaContextHandle.cs:75). This prevents the model from being freed while contexts exist.

[Figure: Reference Count Lifecycle]


The ThrowIfDisposed() method (LLama/Native/SafeLLamaContextHandle.cs:92-100) validates both handles before any operation.

Sources: LLama/Native/SafeLLamaContextHandle.cs:75-122
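
One way to build such a parent-child link is the reference count built into SafeHandle, via DangerousAddRef() and DangerousRelease(). The sketch below assumes that mechanism; DemoModelHandle and DemoContextHandle are hypothetical stand-ins, and whether LLamaSharp's handles are written exactly this way is not confirmed by this page.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical parent handle standing in for a model handle.
public sealed class DemoModelHandle : SafeHandle
{
    public bool Released { get; private set; }

    public DemoModelHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(8)); // stand-in for loading a model
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Released = true;
        return true;
    }
}

// Hypothetical child handle standing in for a context handle.
public sealed class DemoContextHandle : SafeHandle
{
    private readonly DemoModelHandle _model;

    public DemoContextHandle(DemoModelHandle model)
        : base(IntPtr.Zero, ownsHandle: true)
    {
        // Take a reference on the parent before creating the child resource.
        bool added = false;
        model.DangerousAddRef(ref added);
        if (!added)
            throw new ObjectDisposedException(nameof(model));
        _model = model;
        SetHandle(Marshal.AllocHGlobal(8)); // stand-in for creating a context
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        // Return the parent reference; the parent is released once its count reaches zero.
        _model.DangerousRelease();
        return true;
    }
}
```

In this sketch, disposing the model while a context is still alive only marks the model as disposed. Its ReleaseHandle() is deferred until the last context releases its reference.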


Tier 3: Native Interop Layer

The Native Interop Layer contains P/Invoke declarations that interface directly with the llama.cpp C API.

NativeApi Class Structure

NativeApi (LLama/Native/NativeApi.cs:11) is a public static partial class. All P/Invoke methods use [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)].

Method Organization by Category

| Category | Methods | Lines | Called By |
|---|---|---|---|
| Backend lifecycle | llama_backend_free(), llama_backend_init() | 26, 87 | Application start/end |
| Device queries | llama_max_devices(), llama_supports_mmap() | 34, 52 | Environment validation |
| Model operations | llama_model_load_from_file(), llama_model_free() | 186, 125 | SafeLlamaModelHandle |
| State I/O | llama_state_load_file(), llama_state_save_file() | 109, 121 | LLamaContext.SaveState() |

Sources: LLama/Native/NativeApi.cs:26-121, LLama/Native/SafeLlamaModelHandle.cs:186, LLama/Native/SafeLLamaContextHandle.cs:146
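
The declaration style described above might look roughly like the following sketch. DemoNativeApi, the library name "llama", and the exact C# signatures (UIntPtr for the C size_t return value, a one-byte bool) are assumptions made for illustration, not copies of LLamaSharp's declarations. TrySupportsMmap() is a hypothetical helper that shows P/Invoke binding is lazy: the declarations compile without the native binary, and a missing library only surfaces on the first call.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical declarations; names and signatures are assumptions for illustration.
public static class DemoNativeApi
{
    private const string LibraryName = "llama";

    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llama_backend_init();

    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llama_backend_free();

    // size_t in C is pointer-sized, so UIntPtr is used here.
    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern UIntPtr llama_max_devices();

    // The C bool is marshalled as a single byte.
    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    [return: MarshalAs(UnmanagedType.U1)]
    public static extern bool llama_supports_mmap();

    // Returns null when the native library or the export cannot be resolved.
    public static bool? TrySupportsMmap()
    {
        try
        {
            return llama_supports_mmap();
        }
        catch (DllNotFoundException)
        {
            return null;
        }
        catch (EntryPointNotFoundException)
        {
            return null;
        }
    }
}
```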


Native Library Loading

LLamaSharp separates managed code from native binaries. Users must provide a native binary (e.g., through a backend NuGet package) matching their hardware.

Loading Mechanism

Initialization: Static constructors in the SafeHandle classes ensure the library is loaded by calling NativeApi.llama_empty_call() (LLama/Native/SafeLLamaContextHandle.cs:129), which triggers the internal load by calling llama_max_devices() (LLama/Native/NativeApi.cs:17-20).

Sources: LLama/Native/SafeLLamaContextHandle.cs:126-130, LLama/Native/NativeApi.cs:17-20, LLama/Native/SafeLlamaModelHandle.cs:168-172
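
The static-constructor trigger can be sketched without any native code. In the example below, DemoNativeLoader and DemoResourceHandle are hypothetical names and the "load" is simulated with a counter. A real loader would locate and load the native binary inside the static constructor.

```csharp
using System;

// Hypothetical loader; the counter stands in for the actual library load.
public static class DemoNativeLoader
{
    public static int LoadCount;

    // The runtime runs this exactly once, before the first member access.
    static DemoNativeLoader()
    {
        // A real implementation would resolve and load the native library here.
        LoadCount++;
    }

    // Does no work of its own; calling it forces the static constructor above to run.
    public static void EmptyCall()
    {
    }
}

// Hypothetical handle type whose static constructor guarantees the loader has run.
public sealed class DemoResourceHandle
{
    static DemoResourceHandle()
    {
        DemoNativeLoader.EmptyCall();
    }
}
```

Because static constructors run once per type, creating any number of DemoResourceHandle instances performs the simulated load a single time.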


Summary

The three-tier architecture provides a robust foundation for LLamaSharp:

  • Tier 1 delivers a high-level, user-friendly API (LLamaWeights, LLamaContext)
  • Tier 2 ensures safe resource management through SafeHandle and reference counting
  • Tier 3 handles low-level native interop with NativeApi

For detailed information about specific components, refer to the child pages: Native Library Integration, Model Loading and LLamaWeights, Context Management and LLamaContext, Memory Management and SafeHandles, and Tokenization and Vocabulary.