URL: https://deepwiki.com/SciSharp/LLamaSharp/2.4-memory-management-and-safehandles

Last indexed: 18 May 2026 (ecd184)

Memory Management and SafeHandles

This page documents LLamaSharp's memory management architecture, focusing on the SafeHandle pattern used to manage unmanaged resources from the native llama.cpp library. It covers resource lifetime management, reference counting, memory pinning, and the ownership hierarchy that prevents use-after-free errors.

Overview

LLamaSharp manages unmanaged memory allocated by llama.cpp using .NET's SafeHandle infrastructure. Disposing a handle frees its native resource (model weights, inference contexts, samplers) deterministically, including when an exception unwinds a using block, and the finalizer acts as a fallback if Dispose is never called. The architecture also uses manual reference counting to prevent premature disposal when multiple managed objects share a single native resource.

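Key responsibilities of the memory management system:

  • Resource lifetime: free native resources through Dispose, with the finalizer as a fallback.
  • Reference counting: keep a parent resource alive while child resources still depend on it.
  • Memory pinning: keep managed buffers at a fixed address while native code reads or writes them.
  • Handle validation: reject calls on handles that have already been closed.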

SafeHandle Hierarchy

Each high-level managed class wraps an internal SafeHandle implementation, which in turn represents a native llama.cpp entity.

[Diagram: Code Entity Mapping: Managed to Native]

Sources: LLama/LLamaWeights.cs:17-24, LLama/LLamaContext.cs:18-42, LLama/Native/SafeLlamaModelHandle.cs:15-17, LLama/Native/SafeLLamaContextHandle.cs:13-15, LLama/Native/SafeLLamaSamplerHandle.cs:13-15

All handles inherit from SafeLLamaHandleBase, which derives from System.Runtime.InteropServices.SafeHandle. It passes ownsHandle: true, so ReleaseHandle runs when the handle is disposed, or during finalization if the object was never disposed manually. (LLama/Native/SafeLLamaHandleBase.cs:8-20)
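As a rough illustration of this pattern, the sketch below defines a SafeHandle subclass that passes ownsHandle: true and frees its resource in ReleaseHandle. NativeHandleBase and DemoBufferHandle are hypothetical names, and Marshal.AllocHGlobal stands in for a llama.cpp allocation; this is not the LLamaSharp source.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical base class, loosely modelled on the description of SafeLLamaHandleBase.
public abstract class NativeHandleBase : SafeHandle
{
    protected NativeHandleBase()
        : base(IntPtr.Zero, ownsHandle: true) // the runtime will call ReleaseHandle for us
    {
    }

    // A zero pointer means "no native resource"; ReleaseHandle is skipped in that case.
    public override bool IsInvalid => handle == IntPtr.Zero;
}

// Hypothetical concrete handle; the unmanaged buffer stands in for a native object.
public sealed class DemoBufferHandle : NativeHandleBase
{
    // Counts how often the native resource was actually freed (for demonstration only).
    public static int ReleaseCount;

    public DemoBufferHandle(int bytes)
    {
        SetHandle(Marshal.AllocHGlobal(bytes));
    }

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        handle = IntPtr.Zero;
        ReleaseCount++;
        return true;
    }
}
```

Calling Dispose() twice on such a handle is harmless: SafeHandle tracks its own closed state, so ReleaseHandle runs at most once.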


SafeHandle Reference Counting

To prevent a parent resource (like model weights) from being freed while a child resource (like an inference context) is still active, LLamaSharp uses manual reference counting: the child calls DangerousAddRef on the parent's handle when it is created and DangerousRelease when it is released.

Context-Model Relationship

When a SafeLLamaContextHandle is created via llama_init_from_model, it increments the reference count of the SafeLlamaModelHandle. This ensures the model weights remain in memory as long as the context exists. When the context handle is released, it calls DangerousRelease on the associated model. (LLama/Native/SafeLLamaContextHandle.cs:79-122)


Sources: LLama/Native/SafeLLamaContextHandle.cs:79-122, LLama/Native/SafeLlamaModelHandle.cs:122-127
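The context-model relationship can be sketched with two plain SafeHandle subclasses. ParentHandle and ChildHandle are hypothetical stand-ins for the model and context handles, and the unmanaged buffers are placeholders for native objects; only DangerousAddRef and DangerousRelease are the real SafeHandle API.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a model handle.
public sealed class ParentHandle : SafeHandle
{
    // True once the native resource has actually been freed.
    public bool Freed { get; private set; }

    public ParentHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(64));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Freed = true;
        return true;
    }
}

// Hypothetical stand-in for a context handle that depends on the parent.
public sealed class ChildHandle : SafeHandle
{
    private readonly ParentHandle _parent;

    public ChildHandle(ParentHandle parent) : base(IntPtr.Zero, ownsHandle: true)
    {
        // Take a reference first; this throws ObjectDisposedException
        // if the parent has already been closed.
        bool added = false;
        parent.DangerousAddRef(ref added);
        _parent = parent;
        SetHandle(Marshal.AllocHGlobal(16));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        // Give the reference back; if the parent was disposed in the
        // meantime, this is the point where it is actually freed.
        _parent.DangerousRelease();
        return true;
    }
}
```

With this arrangement, disposing the parent while a child exists only marks the parent as closed. Its ReleaseHandle runs when the last child releases its reference.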


Memory Pinning and Native Interop

LLamaSharp pins managed memory so that native code can read and write it in place, without an intermediate copy. This matters most for model loading and state persistence.

Data Flow: Model Parameter Conversion

When loading a model, managed IModelParams must be converted to a native LLamaModelParams struct. This requires pinning arrays (like TensorSplits) so the native code can access them via pointers without the GC moving them. LLamaSharp uses GroupDisposable to track these pins and release them once the native call is complete. (LLama/LLamaWeights.cs:67-69, LLama/Extensions/IModelParamsExtensions.cs:23-48)


Sources: LLama/LLamaWeights.cs:67-71, LLama/Extensions/IModelParamsExtensions.cs:30-103, LLama/Native/SafeLlamaModelHandle.cs:136-151
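The idea of grouping pins and releasing them together can be sketched as follows. PinGroup is a hypothetical helper written for this example, not LLamaSharp's GroupDisposable, and it uses GCHandle for pinning, which may differ from what the library does. SumNative is a managed stand-in for a native function that reads through a raw pointer.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper: collects pins and frees all of them in Dispose.
public sealed class PinGroup : IDisposable
{
    private readonly List<GCHandle> _pins = new List<GCHandle>();

    // Pins the array so the GC cannot move it, and returns its address.
    public IntPtr Pin(float[] array)
    {
        var pin = GCHandle.Alloc(array, GCHandleType.Pinned);
        _pins.Add(pin);
        return pin.AddrOfPinnedObject();
    }

    public void Dispose()
    {
        foreach (var pin in _pins)
            pin.Free();
        _pins.Clear();
    }
}

public static class PinningDemo
{
    // Stand-in for a native function that reads floats through a pointer.
    private static float SumNative(IntPtr data, int count)
    {
        var copy = new float[count];
        Marshal.Copy(data, copy, 0, count);
        float sum = 0;
        foreach (var value in copy)
            sum += value;
        return sum;
    }

    public static float Run()
    {
        var tensorSplits = new float[] { 0.5f, 0.25f, 0.25f };
        using (var pins = new PinGroup())
        {
            // The address stays valid until the group is disposed.
            IntPtr ptr = pins.Pin(tensorSplits);
            return SumNative(ptr, tensorSplits.Length);
        }
    }
}
```

The using block bounds the lifetime of every pin to the duration of the call, so the arrays become movable again as soon as the native side is done with them.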

State Serialization (Zero-Copy)

In LLamaContext.SaveState, the library maps the destination with a MemoryMappedFile and passes a raw pointer (byte*) into the mapped view directly to the native GetState function. The native code writes the state straight into the mapping, which avoids allocating a large managed byte array and copying the data more than once. (LLama/LLamaContext.cs:140-163)

| Operation | Code Entity | Implementation | Source |
| --- | --- | --- | --- |
| Pointer Acquisition | SafeMemoryMappedViewHandle | AcquirePointer(ref ptr) | LLama/LLamaContext.cs:149-150 |
| Native Call | SafeLLamaContextHandle | GetState(ptr, stateSize) | LLama/LLamaContext.cs:153 |
| Cleanup | SafeMemoryMappedViewHandle | ReleasePointer() | LLama/LLamaContext.cs:157 |
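The three steps in the table can be sketched with an in-memory mapping. FillState is a hypothetical managed stand-in for the native GetState call, and the project is assumed to be compiled with AllowUnsafeBlocks enabled. The final ReadArray exists only so the result can be inspected; the copy it makes is not part of the technique.

```csharp
using System;
using System.IO.MemoryMappedFiles;

public static class ZeroCopyDemo
{
    // Hypothetical stand-in for a native function that writes state into a buffer.
    private static unsafe void FillState(byte* dest, long size)
    {
        for (long i = 0; i < size; i++)
            dest[i] = (byte)(i % 251);
    }

    public static unsafe byte[] Run(int stateSize)
    {
        using (var file = MemoryMappedFile.CreateNew(null, stateSize))
        using (var view = file.CreateViewAccessor(0, stateSize))
        {
            byte* ptr = null;
            try
            {
                // 1. Pointer acquisition: take a raw pointer into the mapped view.
                view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);

                // 2. "Native" call: write directly into the mapping.
                //    PointerOffset accounts for page alignment of the view.
                FillState(ptr + view.PointerOffset, stateSize);
            }
            finally
            {
                // 3. Cleanup: release the pointer only if it was acquired.
                if (ptr != null)
                    view.SafeMemoryMappedViewHandle.ReleasePointer();
            }

            // Read the bytes back for inspection only.
            var result = new byte[stateSize];
            view.ReadArray(0, result, 0, stateSize);
            return result;
        }
    }
}
```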

Resource Disposal Patterns

Deterministic Cleanup

Managed classes like LLamaWeights and LLamaContext implement IDisposable. Their Dispose method propagates the call to the underlying SafeHandle. (LLama/LLamaWeights.cs:141-144, LLama/LLamaContext.cs:337-340)
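A minimal sketch of this propagation, assuming a wrapper that exposes its handle through a NativeHandle property. DemoWeights and DemoWeightsHandle are hypothetical names, and the unmanaged buffer is a placeholder for the native model.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical handle that owns a placeholder native allocation.
public sealed class DemoWeightsHandle : SafeHandle
{
    public DemoWeightsHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(32));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

// Hypothetical high-level wrapper: it has no finalizer of its own and
// simply forwards Dispose to the handle, which owns the native resource.
public sealed class DemoWeights : IDisposable
{
    public DemoWeightsHandle NativeHandle { get; }

    public DemoWeights()
    {
        NativeHandle = new DemoWeightsHandle();
    }

    public void Dispose()
    {
        NativeHandle.Dispose();
    }
}
```

Because the handle carries the finalizer, the wrapper class can stay a plain IDisposable without implementing the full dispose pattern.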

Native Memory Safety

SafeLLamaHandleBase ensures that even if a developer forgets to call Dispose(), the native memory will eventually be freed by the finalizer thread calling ReleaseHandle. (LLama/Native/SafeLLamaHandleBase.cs:8-27)

Sampler Management

SafeLLamaSamplerChainHandle manages a collection of native samplers. When the chain handle is disposed, it calls llama_sampler_free, which cleans up all samplers contained within that chain. (LLama/Native/SafeLLamaSamplerHandle.cs:31-36)
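The ownership rule (the chain owns its samplers, so freeing the chain frees them all) can be modelled in managed code. In llama.cpp the cascade happens on the native side; the sketch below only imitates it. DemoChainHandle is a hypothetical class, and each AllocHGlobal block is a placeholder for a native sampler.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical chain handle that takes ownership of every sampler added to it.
public sealed class DemoChainHandle : SafeHandle
{
    private readonly List<IntPtr> _samplers = new List<IntPtr>();

    // Number of child samplers freed when the chain was released.
    public int FreedSamplers { get; private set; }

    public DemoChainHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(8));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Creates a placeholder sampler; from here on the chain owns it.
    public void AddSampler()
    {
        if (IsClosed)
            throw new ObjectDisposedException(nameof(DemoChainHandle));
        _samplers.Add(Marshal.AllocHGlobal(8));
    }

    protected override bool ReleaseHandle()
    {
        // Free every owned sampler, then the chain itself.
        foreach (var sampler in _samplers)
        {
            Marshal.FreeHGlobal(sampler);
            FreedSamplers++;
        }
        _samplers.Clear();
        Marshal.FreeHGlobal(handle);
        return true;
    }
}
```

A consequence of this design is that individual samplers must not be freed separately once they belong to a chain, or they would be released twice.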


Thread Safety and Global State

While memory management handles resource lifetimes, llama.cpp often requires serialized access to certain backend functions.

  • Initialization: NativeApi.llama_backend_init() is called automatically to initialize the GGML backend. The method is private so that only library internals can invoke it, and only once. (LLama/Native/NativeApi.cs:81-87)
  • Handle Validation: The ThrowIfDisposed() method in SafeLLamaContextHandle prevents calling native functions with a handle that has already been closed or whose parent model has been disposed, avoiding low-level access violations. (LLama/Native/SafeLLamaContextHandle.cs:92-100)
  • Empty Calls: NativeApi.llama_empty_call() is used in static constructors of handles (e.g., SafeLlamaModelHandle and SafeLLamaContextHandle) to force the runtime to load native dependencies and initialize the backend before any instance methods are called. (LLama/Native/SafeLlamaModelHandle.cs:168-172, LLama/Native/SafeLLamaContextHandle.cs:126-130)

Sources: LLama/Native/NativeApi.cs:17-20, LLama/Native/SafeLLamaContextHandle.cs:92-100, LLama/Native/SafeLlamaModelHandle.cs:168-172
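The initialization and validation points above can be sketched in pure managed code. DemoNativeApi and DemoHandle are hypothetical; the counter stands in for the one-time backend initialization, and EmptyCall plays the role the text describes for llama_empty_call. The sketch relies on the .NET guarantee that a static constructor runs at most once per type.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the native API class.
public static class DemoNativeApi
{
    private static int _initCount;

    // How many times the "backend" was initialized.
    public static int InitCount => _initCount;

    static DemoNativeApi()
    {
        // Stand-in for loading native libraries and initializing the backend.
        Interlocked.Increment(ref _initCount);
    }

    // Does nothing by itself; calling it forces the static constructor to run.
    public static void EmptyCall()
    {
    }
}

// Hypothetical handle type that triggers initialization and validates its state.
public sealed class DemoHandle
{
    static DemoHandle()
    {
        // Make sure the backend is ready before any instance is used.
        DemoNativeApi.EmptyCall();
    }

    private bool _closed;

    public void Close()
    {
        _closed = true;
    }

    // Stand-in for a method that would call into native code.
    public int Use()
    {
        ThrowIfDisposed();
        return 42;
    }

    private void ThrowIfDisposed()
    {
        if (_closed)
            throw new ObjectDisposedException(nameof(DemoHandle));
    }
}
```

Failing with ObjectDisposedException in managed code is far easier to diagnose than the access violation that a call with a stale native pointer could cause.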