URL: https://deepwiki.com/SciSharp/LLamaSharp/2.2-model-loading-and-llamaweights

Last indexed: 18 May 2026 (ecd184)

Model Loading and LLamaWeights

This document explains the LLamaWeights class, which represents a loaded LLM model in memory. LLamaWeights is the entry point for working with model files in GGUF format and provides methods for loading models, accessing model metadata, creating inference contexts, and tokenizing text.

Scope: This page covers model loading and the LLamaWeights API. For context creation and inference state management, see page 2.3. For low-level native interop details, see page 2.1. For safe handle resource management, see page 2.4. For tokenization details, see page 2.5.


Overview

LLamaWeights is the high-level managed wrapper around a loaded GGUF model file. It encapsulates a SafeLlamaModelHandle, which wraps the native llama_model* pointer from llama.cpp. The class provides model loading, metadata access, inference context creation, and text tokenization.

LLamaWeights Architecture and Flow



Sources: LLama/LLamaWeights.cs17-60 LLama/Native/SafeLlamaModelHandle.cs15-121 LLama/Native/NativeApi.cs186-187


Loading Models

Synchronous Loading

The primary method for loading a model is LoadFromFile, which accepts an IModelParams object specifying the model path and configuration.


The loading process:

  1. Convert IModelParams to the native LLamaModelParams structure via ToLlamaModelParams() (LLama/LLamaWeights.cs69).
  2. Call SafeLlamaModelHandle.LoadFromFile() with the model path and parameters (LLama/Native/SafeLlamaModelHandle.cs136-151).
  3. Native llama_model_load_from_file() loads the GGUF file and allocates memory (LLama/Native/SafeLlamaModelHandle.cs186).
  4. Wrap the returned SafeLlamaModelHandle in a LLamaWeights instance (LLama/LLamaWeights.cs71).
  5. The private constructor reads and caches model metadata via ReadMetadata() (LLama/LLamaWeights.cs59).

Sources: LLama/LLamaWeights.cs56-72 LLama/Native/SafeLlamaModelHandle.cs136-151
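
From user code, the loading steps above collapse into a single call. The following is a minimal sketch, assuming the LLamaSharp package and a native backend are installed and that ModelParams (from LLama.Common) is the IModelParams implementation in use. The model path is hypothetical.

```csharp
using System;
using LLama;
using LLama.Common;

// Hypothetical path: point this at any GGUF file available locally.
var parameters = new ModelParams("models/example.gguf");

// LoadFromFile blocks the calling thread until the model is fully loaded.
// The using declaration disposes the weights when they go out of scope.
using var weights = LLamaWeights.LoadFromFile(parameters);

Console.WriteLine($"Loaded {weights.ParameterCount} parameters ({weights.SizeInBytes} bytes)");
```

Because the call is synchronous, it suits console tools and startup code better than UI threads.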


Asynchronous Loading with Progress Reporting

For long-running model loads, LoadFromFileAsync provides progress reporting and cancellation support via a CancellationToken.

Asynchronous Loading Flow



Sources: LLama/LLamaWeights.cs83-138 LLama/Native/SafeLlamaModelHandle.cs136-151
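
As a rough illustration of the asynchronous path, the sketch below assumes that LoadFromFileAsync takes the model parameters, a CancellationToken, and an IProgress<float> in that order, and that progress is reported as a fraction between 0 and 1. The path and the two-minute timeout are arbitrary choices for the example.

```csharp
using System;
using System.Threading;
using LLama;
using LLama.Common;

var parameters = new ModelParams("models/example.gguf"); // hypothetical path

// Cancel automatically if loading takes longer than two minutes.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

// Assumption: the reported value is a fraction in the range 0..1.
var progress = new Progress<float>(p => Console.WriteLine($"Loading: {p:P0}"));

// Throws OperationCanceledException if the token is cancelled first.
using var weights = await LLamaWeights.LoadFromFileAsync(parameters, cts.Token, progress);

Console.WriteLine("Model ready.");
```

In a console application Progress<T> invokes its callback on the thread pool, so progress lines may print slightly out of order.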


Error Handling

Model loading can fail for several reasons:

| Exception Type | Cause | Source |
| --- | --- | --- |
| FileNotFoundException | Model file does not exist (handled by FileStream check) | LLama/Native/SafeLlamaModelHandle.cs142 |
| InvalidOperationException | File is not readable | LLama/Native/SafeLlamaModelHandle.cs144 |
| LoadWeightsFailedException | Native loading returned an invalid handle | LLama/Native/SafeLlamaModelHandle.cs148 |
| OperationCanceledException | Loading cancelled via CancellationToken | LLama/LLamaWeights.cs129 |

Sources: LLama/Native/SafeLlamaModelHandle.cs136-151 LLama/LLamaWeights.cs127-133
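
A caller might map the failure modes listed above onto separate catch blocks. This sketch assumes LoadWeightsFailedException lives in the LLama.Exceptions namespace. The path and messages are illustrative only.

```csharp
using System;
using System.IO;
using LLama;
using LLama.Common;
using LLama.Exceptions;

var parameters = new ModelParams("models/example.gguf"); // hypothetical path

try
{
    using var weights = LLamaWeights.LoadFromFile(parameters);
    Console.WriteLine("Model loaded.");
}
catch (FileNotFoundException ex)
{
    // The model file does not exist at the given path.
    Console.Error.WriteLine($"Model file not found: {ex.Message}");
}
catch (InvalidOperationException ex)
{
    // The file exists but cannot be read.
    Console.Error.WriteLine($"Model file is not readable: {ex.Message}");
}
catch (LoadWeightsFailedException ex)
{
    // The native loader returned an invalid handle, for example for a corrupt file.
    Console.Error.WriteLine($"Native loading failed: {ex.Message}");
}
```

When loading through LoadFromFileAsync, an additional catch for OperationCanceledException covers cancellation.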


Model Properties

LLamaWeights exposes read-only properties that describe the loaded model:

| Property | Type | Native API Called | Description | Source |
| --- | --- | --- | --- | --- |
| NativeHandle | SafeLlamaModelHandle | N/A | The underlying safe handle wrapping the native llama_model* | LLama/LLamaWeights.cs24 |
| ContextSize | int | llama_model_n_ctx_train | Training context size | LLama/Native/SafeLlamaModelHandle.cs26 |
| SizeInBytes | ulong | llama_model_size | Total size of model weights in bytes | LLama/Native/SafeLlamaModelHandle.cs41 |
| ParameterCount | ulong | llama_model_n_params | Number of model parameters | LLama/Native/SafeLlamaModelHandle.cs46 |
| EmbeddingSize | int | llama_model_n_embd | Dimension of embedding vectors | LLama/Native/SafeLlamaModelHandle.cs36 |
| Vocab | Vocabulary | _vocab field | Vocabulary access | LLama/Native/SafeLlamaModelHandle.cs120 |
| Metadata | IReadOnlyDictionary | ReadMetadata() | Cached key-value metadata | LLama/LLamaWeights.cs54 |

Property Delegation Chain



Sources: LLama/LLamaWeights.cs24-54 LLama/Native/SafeLlamaModelHandle.cs18-120
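
Reading the properties in the table above might look like the following sketch. The model path is hypothetical and the output formatting is just one possible choice.

```csharp
using System;
using LLama;
using LLama.Common;

using var weights = LLamaWeights.LoadFromFile(new ModelParams("models/example.gguf")); // hypothetical path

// Each property forwards to the underlying SafeLlamaModelHandle.
Console.WriteLine($"Training context size: {weights.ContextSize}");
Console.WriteLine($"Embedding size:        {weights.EmbeddingSize}");
Console.WriteLine($"Parameter count:       {weights.ParameterCount:N0}");
Console.WriteLine($"Weight size:           {weights.SizeInBytes / (1024.0 * 1024.0):F1} MiB");
```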


Model Metadata and Templates

The Metadata property provides access to all key-value pairs embedded in the GGUF file. Metadata is read once during construction via weights.ReadMetadata() (LLama/LLamaWeights.cs59).
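
As a small sketch of working with the cached metadata, the code below assumes string keys and values. The key general.architecture is a standard GGUF metadata key, but whether a particular model file carries it is not guaranteed, hence the TryGetValue lookup.

```csharp
using System;
using LLama;
using LLama.Common;

using var weights = LLamaWeights.LoadFromFile(new ModelParams("models/example.gguf")); // hypothetical path

// Enumerate every key-value pair cached at construction time.
foreach (var pair in weights.Metadata)
    Console.WriteLine($"{pair.Key} = {pair.Value}");

// Look up one key defensively, since a given file may omit it.
if (weights.Metadata.TryGetValue("general.architecture", out var architecture))
    Console.WriteLine($"Architecture: {architecture}");
```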

Chat Templates

LLamaWeights also provides access to model chat templates. These can be retrieved via the underlying SafeLlamaModelHandle using llama_model_chat_template (LLama/Native/SafeLlamaModelHandle.cs175).

Metadata and Template Flow



Sources: LLama/LLamaWeights.cs54-60 LLama/Native/SafeLlamaModelHandle.cs113 LLama/Native/SafeLlamaModelHandle.cs175


Creating Contexts

LLamaWeights creates LLamaContext instances, which manage inference state.


When a context is created, it maintains a reference to the model weights. The SafeLLamaContextHandle increments the reference count of the SafeLlamaModelHandle to ensure the model isn't freed while contexts are active (LLama/Native/SafeLLamaContextHandle.cs117-119).

Sources: LLama/LLamaWeights.cs152-155 LLama/LLamaContext.cs84-98 LLama/Native/SafeLLamaContextHandle.cs109-122
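
Creating a context from loaded weights can be sketched as follows. This assumes a CreateContext method on LLamaWeights that accepts context parameters, and that ModelParams can serve as both model and context parameters. The path and the context size of 2048 are arbitrary example values.

```csharp
using System;
using LLama;
using LLama.Common;

// Hypothetical path and context size.
var parameters = new ModelParams("models/example.gguf") { ContextSize = 2048 };

using var weights = LLamaWeights.LoadFromFile(parameters);

// Several contexts can share one set of weights; each holds its own inference state.
using var context = weights.CreateContext(parameters);

Console.WriteLine($"Context window: {context.ContextSize} tokens");
```

The using declarations dispose in reverse order, so the context is released before the weights it depends on.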


Tokenization

LLamaWeights provides a high-level tokenization method. It delegates to the underlying SafeLlamaModelHandle.Tokenize, which interfaces with the native vocabulary (LLama/LLamaWeights.cs167).

Sources: LLama/LLamaWeights.cs165-168 LLama/LLamaContext.cs107-110
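
A minimal tokenization sketch follows. It assumes the Tokenize method takes the text, a flag for adding the beginning-of-sequence token, a flag for parsing special tokens, and a text encoding, in that order. Check the signature in the installed version before relying on it.

```csharp
using System;
using System.Text;
using LLama;
using LLama.Common;

using var weights = LLamaWeights.LoadFromFile(new ModelParams("models/example.gguf")); // hypothetical path

// Arguments (assumed order): text, add BOS token, parse special tokens, encoding.
var tokens = weights.Tokenize("Hello, world!", true, false, Encoding.UTF8);

Console.WriteLine($"Token count: {tokens.Length}");
```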


Model Quantization

LLamaSharp supports quantizing model files to different formats (e.g., Q4_K_M, Q8_0) via the LLamaQuantizer class (LLama/LLamaQuantizer.cs10-11). This process uses LLamaModelQuantizeParams to configure the quantization operation (LLama/LLamaQuantizer.cs32-36).

| Feature | Method | Source |
| --- | --- | --- |
| Quantize File | LLamaQuantizer.Quantize(...) | LLama/LLamaQuantizer.cs23-43 |
| Supported Types | LLamaFtype enum | LLama/Native/LLamaFtype.cs7-214 |
| Params Struct | LLamaModelQuantizeParams | LLama/Native/LLamaModelQuantizeParams.cs10-113 |

Quantization Process



Sources: LLama/LLamaQuantizer.cs10-43 LLama/Native/LLamaModelQuantizeParams.cs9-113 LLama/Native/LLamaFtype.cs7-214 LLama/Native/NativeApi.Quantize.cs12-13


LoRA Adapters

LLamaSharp supports applying LoRA (Low-Rank Adaptation) adapters to models via the LoraAdapter class (LLama/Native/LoraAdapter.cs8-9). Adapters are loaded for a specific SafeLlamaModelHandle and can be manually freed or automatically cleaned up when the model is unloaded (LLama/Native/LoraAdapter.cs41-62).

Sources: LLama/Native/LoraAdapter.cs8-63


Resource Management

LLamaWeights implements IDisposable to release native resources.


The disposal chain:

  1. LLamaWeights.Dispose() is called (LLama/LLamaWeights.cs141).
  2. NativeHandle.Dispose() triggers the safe handle cleanup (LLama/LLamaWeights.cs143).
  3. SafeLlamaModelHandle.ReleaseHandle() calls native llama_model_free(handle) (LLama/Native/SafeLlamaModelHandle.cs123-127).

Sources: LLama/LLamaWeights.cs141-144 LLama/Native/SafeLlamaModelHandle.cs123-127
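
The disposal chain interacts with the reference counting described under Creating Contexts: the native free should only run once no context still holds the model. The sketch below illustrates that general .NET SafeHandle mechanism with toy classes. ToyModelHandle and ToyContext are hypothetical stand-ins that touch no native memory, and they are not LLamaSharp's actual implementation.

```csharp
using System;
using System.Runtime.InteropServices;

var model = new ToyModelHandle();
var context = new ToyContext(model);

model.Dispose();                    // the owner lets go first
Console.WriteLine(model.Released);  // False: the context still holds a reference

context.Dispose();                  // last reference dropped, ReleaseHandle runs
Console.WriteLine(model.Released);  // True

// Hypothetical stand-in for a model handle.
sealed class ToyModelHandle : SafeHandle
{
    public bool Released { get; private set; }

    public ToyModelHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(new IntPtr(1)); // pretend this is a native model pointer
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Released = true; // a real handle would free native memory here
        return true;
    }
}

// Hypothetical stand-in for a context that keeps its model alive.
sealed class ToyContext : IDisposable
{
    private readonly ToyModelHandle _model;
    private bool _disposed;

    public ToyContext(ToyModelHandle model)
    {
        var success = false;
        model.DangerousAddRef(ref success); // increment the handle's reference count
        if (!success) throw new ObjectDisposedException(nameof(model));
        _model = model;
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        _model.DangerousRelease(); // decrement; frees the handle at zero
    }
}
```

The design point is that disposing the owner does not invalidate dependents: SafeHandle defers ReleaseHandle until every added reference has been released.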