Last indexed: 18 May 2026 (ecd184)

Core API Classes

This document describes the core API classes that provide the foundational layer for model operations in LLamaSharp. These classes directly wrap native llama.cpp functionality and provide the building blocks used by higher-level abstractions like executors and chat sessions.

Overview

The core API consists of the following managed classes and configuration types:

LLamaWeights: Manages loaded model weights and vocabulary.
LLamaContext: Provides inference capabilities and context management.
ModelParams: Configuration for model loading.
IContextParams: Interface for context creation parameters.
LLamaEmbedder: High-level utility for generating text embeddings.
LLamaReranker: Utility for computing relevance scores between queries and documents.
LLamaQuantizer: Static utility for model quantization.

These classes wrap native handles (SafeLlamaModelHandle, SafeLLamaContextHandle) and provide managed, type-safe access to llama.cpp functionality.

Core API Class Relationships

Sources: LLama/LLamaWeights.cs:17-18, LLama/LLamaContext.cs:18-42, LLama/Native/SafeLlamaModelHandle.cs:15-16, LLama/Native/SafeLLamaContextHandle.cs:13-14, LLama/LLamaEmbedder.cs:15-26, LLama/LLamaReranker.cs:15-26

LLamaWeights

LLamaWeights represents a set of model weights loaded into memory. It is the prerequisite for creating an inference context.

Properties

Property	Type	Description
`NativeHandle`	`SafeLlamaModelHandle`	The underlying native handle (LLama/LLamaWeights.cs:24)
`ContextSize`	`int`	Context length, in tokens, that the model was trained with (LLama/LLamaWeights.cs:29)
`SizeInBytes`	`ulong`	Size of the model in bytes (LLama/LLamaWeights.cs:34)
`ParameterCount`	`ulong`	Total number of parameters (LLama/LLamaWeights.cs:39)
`EmbeddingSize`	`int`	Dimension of embedding vectors (LLama/LLamaWeights.cs:44)
`Vocab`	`SafeLlamaModelHandle.Vocabulary`	Vocabulary and special tokens (LLama/LLamaWeights.cs:49)
`Metadata`	`IReadOnlyDictionary<string, string>`	Model metadata key-value pairs (LLama/LLamaWeights.cs:54)

Key Methods

Loading Models

Weights are loaded from an IModelParams instance. The asynchronous loader supports progress reporting and cancellation through a wrapper around the native progress callback (LLama/LLamaWeights.cs:95-111).

Creating Contexts

A single LLamaWeights instance can create multiple contexts, each with its own KV cache and state.
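
As a rough sketch of loading and context creation, assuming the LLamaSharp package is referenced and a GGUF model file exists at the path passed in, the calls might look like the following. The class name `LoadingSketch` and the `ContextSize` value are illustrative only, and exact signatures may differ between LLamaSharp versions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

public static class LoadingSketch
{
    public static async Task RunAsync(string modelPath, CancellationToken cancellationToken)
    {
        // ModelParams is used here both for loading the model and for creating contexts.
        var parameters = new ModelParams(modelPath)
        {
            ContextSize = 2048, // illustrative value, not a recommendation
        };

        // Assumption: progress is reported as a fraction between 0 and 1.
        var progress = new Progress<float>(p => Console.WriteLine($"Loading: {p:P0}"));

        // Cancelling the token aborts the load.
        using var weights = await LLamaWeights.LoadFromFileAsync(parameters, cancellationToken, progress);

        // Two contexts over the same weights; each has its own KV cache and state.
        using var first = weights.CreateContext(parameters);
        using var second = weights.CreateContext(parameters);

        Console.WriteLine($"Parameter count: {weights.ParameterCount}");
    }
}
```

Because the `using` declarations dispose in reverse order, both contexts are released before the weights they were created from.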

Sources: LLama/LLamaWeights.cs:17-170, LLama/Native/SafeLlamaModelHandle.cs:136-151

LLamaContext

LLamaContext holds the state required for interaction with a model. It wraps SafeLLamaContextHandle and provides methods for tokenization, decoding, and state management.

Key Features

Thread Management: Gets or sets the number of threads used for generation and for batch processing (LLama/LLamaContext.cs:52-65).
Tokenization: Managed wrappers around llama_tokenize, called through the native handle (LLama/LLamaContext.cs:107-110).
Vocabulary Access: Exposes special tokens and vocabulary metadata (LLama/LLamaContext.cs:75).

State Management

Context state can be saved to and loaded from files. When saving, the implementation uses MemoryMappedFile to write bytes directly from the native pointer to disk (LLama/LLamaContext.cs:144-160).
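
The memory-mapped approach can be illustrated with a small, self-contained sketch. This is not the library's code: the `StateFileSketch` class is hypothetical, and a managed `byte[]` stands in for the native state buffer, whereas the text above describes writing from a native pointer.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class StateFileSketch
{
    // Writes `state` to `path` through a memory-mapped view of the file.
    public static void Save(string path, byte[] state)
    {
        if (state.Length == 0)
            throw new ArgumentException("State must not be empty.", nameof(state));

        // Start from a fresh file so stale bytes from an earlier save cannot remain.
        if (File.Exists(path)) File.Delete(path);

        // The capacity argument fixes the file size up front.
        using var file = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, state.Length);
        using var view = file.CreateViewAccessor(0, state.Length);
        view.WriteArray(0, state, 0, state.Length);
    }

    // Reads the whole file back through a read-only view.
    public static byte[] Load(string path)
    {
        long length = new FileInfo(path).Length;
        using var file = MemoryMappedFile.CreateFromFile(path, FileMode.Open);
        using var view = file.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read);
        var state = new byte[length];
        view.ReadArray(0, state, 0, state.Length);
        return state;
    }
}
```

Mapping the file avoids building an intermediate stream; in the native case it also avoids copying the state into a managed array first.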

Sources: LLama/LLamaContext.cs:18-126, LLama/Native/SafeLLamaContextHandle.cs:13-68

LLamaEmbedder and LLamaReranker

These classes provide specialized pipelines for non-generative tasks.

LLamaEmbedder

Generates embedding vectors for input text. It automatically switches the context into embedding mode using llama_set_embeddings (LLama/LLamaEmbedder.cs:79).

Normalization: Applies Euclidean normalization to the resulting vectors (LLama/LLamaEmbedder.cs:147).
Model Support: Throws NotSupportedException for encoder-decoder models (LLama/LLamaEmbedder.cs:42-43).
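
Euclidean (L2) normalization scales a vector so that its length is 1, which makes dot products between embeddings equal to cosine similarity. A minimal sketch follows; the `NormalizationSketch` class is hypothetical and the zero-vector handling is an assumption, not taken from the library.

```csharp
using System;

public static class NormalizationSketch
{
    // Scales `vector` in place so that its Euclidean (L2) norm is 1.
    public static void EuclideanNormalize(Span<float> vector)
    {
        // Accumulate in double to limit rounding error on long vectors.
        double sumOfSquares = 0;
        foreach (var x in vector)
            sumOfSquares += (double)x * x;

        // Assumption: an all-zero vector is left unchanged rather than divided by zero.
        if (sumOfSquares == 0) return;

        var scale = (float)(1.0 / Math.Sqrt(sumOfSquares));
        for (var i = 0; i < vector.Length; i++)
            vector[i] *= scale;
    }
}
```

For example, normalizing the vector (3, 4), whose norm is 5, yields approximately (0.6, 0.8).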

LLamaReranker

Computes relevance scores between a query and multiple documents.

Requirement: PoolingType must be set to LLamaPoolingType.Rank (LLama/LLamaReranker.cs:40-41).
Logic: Concatenates the query tokens and the document tokens before evaluating them (LLama/LLamaReranker.cs:71-72).

Sources: LLama/LLamaEmbedder.cs:15-154, LLama/LLamaReranker.cs:15-181

Model Quantization

LLamaQuantizer provides a static interface to the native llama_model_quantize API (LLama/LLamaQuantizer.cs:10-11).

It supports "relaxed" string-to-enum parsing for LLamaFtype, so a partial name such as "Q5_K_M" is accepted in place of the full enum member name (LLama/LLamaQuantizer.cs:126-151).
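
One way such relaxed parsing can work is a case-insensitive substring match that succeeds only when exactly one enum member matches. The sketch below illustrates that idea under stated assumptions: `ExampleFtype` is a hypothetical stand-in for LLamaFtype with made-up members, and the matching rule is not confirmed to be the one LLamaQuantizer uses.

```csharp
using System;

// Hypothetical stand-in for LLamaFtype; the real enum has different members.
public enum ExampleFtype
{
    MOSTLY_Q4_0,
    MOSTLY_Q4_K_M,
    MOSTLY_Q5_K_S,
    MOSTLY_Q5_K_M,
}

public static class RelaxedParseSketch
{
    // Returns the single enum member whose name contains `text`, ignoring case.
    public static ExampleFtype Parse(string text)
    {
        ExampleFtype match = default;
        var matches = 0;

        foreach (ExampleFtype value in Enum.GetValues(typeof(ExampleFtype)))
        {
            if (value.ToString().IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                match = value;
                matches++;
            }
        }

        // Reject both "no match" and "ambiguous" inputs.
        if (matches == 1) return match;
        throw new ArgumentException($"'{text}' matched {matches} values; expected exactly one.", nameof(text));
    }
}
```

With these members, "Q5_K_M" and "q4_0" each resolve to one value, while "Q5_K" is rejected as ambiguous because it matches two.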

Sources: LLama/LLamaQuantizer.cs:10-153, LLama/Native/NativeApi.Quantize.cs:10-15

Fundamental Types

Type	Implementation	Description
`LLamaToken`	`struct`	Integer ID representing a piece of text (LLama/Native/NativeApi.cs:109)
`LLamaSeqId`	`struct`	Identifier for a sequence within the KV cache (LLama/Native/LLamaSeqId.cs:10)
`LLamaBatch`	`class`	Managed structure for submitting multiple tokens for evaluation (LLama/Native/LLamaNativeBatch.cs:10)

Data Flow: Text to Inference

Sources: LLama/LLamaContext.cs:107-110, LLama/Native/SafeLLamaContextHandle.cs:180-185, LLama/Native/LLamaSeqId.cs:10

Native Handle Management

LLamaSharp uses a hierarchy of SafeHandle types to prevent memory leaks and use-after-free errors.

SafeLLamaHandleBase: Base class for all native handles (LLama/Native/SafeLlamaModelHandle.cs:16).
Reference Counting: SafeLLamaContextHandle increments the reference count of its parent SafeLlamaModelHandle when it is created (LLama/Native/SafeLLamaContextHandle.cs:119) and decrements it when it is disposed (LLama/Native/SafeLLamaContextHandle.cs:86), so the model cannot be freed while a context still uses it.
Initialization: Static constructors call NativeApi.llama_empty_call() to force the native library to load before any handle is created (LLama/Native/SafeLlamaModelHandle.cs:168-172, LLama/Native/SafeLLamaContextHandle.cs:126-130).
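
The parent/child reference counting can be sketched with the standard SafeHandle API. The classes below are hypothetical stand-ins that use fake handle values and no native library; they show the general pattern (DangerousAddRef on creation, DangerousRelease on release), not LLamaSharp's actual implementation.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a model handle.
public sealed class ParentHandle : SafeHandle
{
    public bool Released { get; private set; }

    public ParentHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(new IntPtr(1)); // fake non-zero value; nothing native is allocated
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Released = true; // a real handle would free native memory here
        return true;
    }
}

// Hypothetical stand-in for a context handle that depends on its parent.
public sealed class ChildHandle : SafeHandle
{
    private readonly ParentHandle _parent;

    private ChildHandle(ParentHandle parent) : base(IntPtr.Zero, ownsHandle: true)
    {
        _parent = parent;
        SetHandle(new IntPtr(2)); // fake non-zero value
    }

    public static ChildHandle Create(ParentHandle parent)
    {
        // Keep the parent alive for as long as this child exists.
        var success = false;
        parent.DangerousAddRef(ref success);
        if (!success)
            throw new InvalidOperationException("Failed to add a reference to the parent handle.");
        return new ChildHandle(parent);
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        // Drop the reference taken in Create; the parent is released once no references remain.
        _parent.DangerousRelease();
        return true;
    }
}
```

In this sketch, disposing the parent while a child is alive only marks it closed; its `ReleaseHandle` runs after the last child has been disposed.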

Sources: LLama/Native/SafeLlamaModelHandle.cs:15-127, LLama/Native/SafeLLamaContextHandle.cs:79-122


URL: https://deepwiki.com/SciSharp/LLamaSharp/9.1-core-api-classes
