
URL: https://deepwiki.com/SciSharp/LLamaSharp/9.1-core-api-classes

Core API Classes | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

Core API Classes

This document describes the core API classes that provide the foundational layer for model operations in LLamaSharp. These classes directly wrap native llama.cpp functionality and provide the building blocks used by higher-level abstractions like executors and chat sessions.


Overview

The core API consists of the following managed classes and configuration types:

  • LLamaWeights: Manages loaded model weights and vocabulary.
  • LLamaContext: Provides inference capabilities and context management.
  • ModelParams: Configuration for model loading.
  • IContextParams: Interface for context creation parameters.
  • LLamaEmbedder: High-level utility for generating text embeddings.
  • LLamaReranker: Utility for computing relevance scores between queries and documents.
  • LLamaQuantizer: Static utility for model quantization.

LLamaWeights and LLamaContext wrap the native handles SafeLlamaModelHandle and SafeLLamaContextHandle respectively, providing managed, type-safe access to llama.cpp functionality.

Core API Class Relationships


Sources: LLama/LLamaWeights.cs:17-18, LLama/LLamaContext.cs:18-42, LLama/Native/SafeLlamaModelHandle.cs:15-16, LLama/Native/SafeLLamaContextHandle.cs:13-14, LLama/LLamaEmbedder.cs:15-26, LLama/LLamaReranker.cs:15-26


LLamaWeights

LLamaWeights represents a set of model weights loaded into memory. It is the prerequisite for creating an inference context.

Properties

| Property | Type | Description | Source |
| --- | --- | --- | --- |
| NativeHandle | SafeLlamaModelHandle | The underlying native model handle | LLama/LLamaWeights.cs:24 |
| ContextSize | int | Context length (in tokens) the model was trained with | LLama/LLamaWeights.cs:29 |
| SizeInBytes | ulong | Size of the model in bytes | LLama/LLamaWeights.cs:34 |
| ParameterCount | ulong | Total number of parameters | LLama/LLamaWeights.cs:39 |
| EmbeddingSize | int | Dimension of embedding vectors | LLama/LLamaWeights.cs:44 |
| Vocab | SafeLlamaModelHandle.Vocabulary | Vocabulary and special tokens | LLama/LLamaWeights.cs:49 |
| Metadata | IReadOnlyDictionary<string, string> | Model metadata key-value pairs | LLama/LLamaWeights.cs:54 |
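Two of these properties combine into a quick sanity check on a quantized file: converting SizeInBytes to bits and dividing by ParameterCount gives the average storage per weight. The Python sketch below is illustrative only (the library itself is C#), and bits_per_weight is a hypothetical helper. Because the average covers all tensors, including any kept at higher precision, it only approximates the nominal quantization level.

```python
def bits_per_weight(size_in_bytes: int, parameter_count: int) -> float:
    """Average storage per parameter in bits, from SizeInBytes and ParameterCount."""
    if parameter_count <= 0:
        raise ValueError("parameter_count must be positive")
    return 8 * size_in_bytes / parameter_count


# Hypothetical model: 7 billion parameters stored in 4,000,000,000 bytes.
print(f"{bits_per_weight(4_000_000_000, 7_000_000_000):.2f} bits per weight")  # prints 4.57 bits per weight
```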

Key Methods

Loading Models

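Weights are loaded from an IModelParams instance (typically ModelParams) with LLamaWeights.LoadFromFile. The asynchronous LoadFromFileAsync additionally supports progress reporting and cancellation through a wrapper around the native progress callback (LLama/LLamaWeights.cs:95-111).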
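llama.cpp's model-loading progress callback receives a fraction between 0 and 1 and returns a boolean: true to keep loading, false to abort. A wrapper can therefore turn a cancellation request into a false return. The Python sketch below models that idea, with a threading.Event standing in for a CancellationToken and a plain callable standing in for IProgress<float>. The names make_progress_callback, fake_load and LoadCancelled are hypothetical, and the simulated loader is not llama.cpp.

```python
import threading
from typing import Callable, Optional


class LoadCancelled(Exception):
    """Raised by the simulated loader when the callback asks it to stop."""


def make_progress_callback(
    report: Optional[Callable[[float], None]],
    cancel: threading.Event,
) -> Callable[[float], bool]:
    """Wrap a progress reporter and a cancellation flag into a
    llama.cpp-style callback: True means keep loading, False means abort."""

    def on_progress(fraction: float) -> bool:
        if report is not None:
            report(fraction)          # forward progress to the caller
        return not cancel.is_set()    # abort once cancellation has been requested

    return on_progress


def fake_load(n_tensors: int, callback: Callable[[float], bool]) -> int:
    """Hypothetical loader: calls the callback after each tensor and
    stops as soon as the callback returns False."""
    for i in range(1, n_tensors + 1):
        # ... a real loader would read tensor i here ...
        if not callback(i / n_tensors):
            raise LoadCancelled("model load was cancelled")
    return n_tensors
```

Keeping the cancellation check inside the callback means the native loader needs no knowledge of the managed cancellation type; it only has to honor the boolean return value.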


Creating Contexts

A single LLamaWeights instance can create multiple contexts through CreateContext, each with its own KV cache and state.
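The split matters because the weights are read-only and shareable, while everything that changes during inference lives in the context. The toy Python model below (all class and function names are hypothetical, not LLamaSharp's API) creates two contexts from one weights object. They share the same weights but keep independent KV caches and can use different context sizes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Weights:
    """Hypothetical stand-in for LLamaWeights: read-only, shareable model data."""
    name: str
    vocab: Tuple[str, ...]


@dataclass
class Context:
    """Hypothetical stand-in for LLamaContext: owns mutable per-context state."""
    weights: Weights
    context_size: int
    kv_cache: List[int] = field(default_factory=list)

    def evaluate(self, tokens: List[int]) -> None:
        # Each context checks its own window and appends to its own cache.
        if len(self.kv_cache) + len(tokens) > self.context_size:
            raise OverflowError("context window exceeded")
        self.kv_cache.extend(tokens)


def create_context(weights: Weights, context_size: int) -> Context:
    """Shares the weights; every new context starts with an empty cache."""
    return Context(weights=weights, context_size=context_size)


shared = Weights(name="demo-model", vocab=("<s>", "hello", "world"))
chat = create_context(shared, context_size=8)
summary = create_context(shared, context_size=4)
chat.evaluate([0, 1, 2])
print(chat.kv_cache, summary.kv_cache)  # [0, 1, 2] []
```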


Sources: LLama/LLamaWeights.cs:17-170, LLama/Native/SafeLlamaModelHandle.cs:136-151


LLamaContext

LLamaContext holds the state required for interaction with a model. It wraps SafeLLamaContextHandle and provides methods for tokenization, decoding, and state management.

Key Features

State Management

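Context state can be saved to and loaded from files (SaveState and LoadState). When saving, the implementation maps the file with MemoryMappedFile so that the native side writes the state bytes directly into the mapped memory, avoiding an intermediate managed buffer (LLama/LLamaContext.cs:144-160).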
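The save path can be sketched with Python's mmap module. This sketch assumes that the native side can only report an upper bound on the state size before writing, which is an assumption about the C# implementation rather than something stated above. Under that assumption, the file is first sized to the estimate, the state bytes are written straight into the mapped view, and the file is then truncated to the number of bytes actually written. save_state and load_state are hypothetical names, and the slice assignment stands in for the native call that fills the buffer.

```python
import mmap


def save_state(filename: str, state: bytes, estimated_size: int) -> int:
    """Write `state` through a memory-mapped file sized to an upper-bound estimate,
    then shrink the file to the bytes actually written. Returns that byte count."""
    if estimated_size <= 0 or len(state) > estimated_size:
        raise ValueError("estimated_size must be a positive upper bound on len(state)")
    with open(filename, "w+b") as f:
        f.truncate(estimated_size)  # pre-size the file so the whole estimate can be mapped
        with mmap.mmap(f.fileno(), estimated_size) as view:
            # Stand-in for the native call that copies context state into the buffer.
            view[: len(state)] = state
            view.flush()
        # The mapping is closed at this point, so the file can be truncated safely.
        f.truncate(len(state))
    return len(state)


def load_state(filename: str) -> bytes:
    """Read the saved state back as raw bytes."""
    with open(filename, "rb") as f:
        return f.read()
```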


Sources: LLama/LLamaContext.cs:18-126, LLama/Native/SafeLLamaContextHandle.cs:13-68


LLamaEmbedder and LLamaReranker

These classes provide specialized pipelines for non-generative tasks.

LLamaEmbedder

Generates embedding vectors for input text; each vector has the model's EmbeddingSize dimensions. It automatically switches the context into embedding mode by calling llama_set_embeddings (LLama/LLamaEmbedder.cs:79).
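Embedding vectors are usually compared by cosine similarity, for example to find the stored passage closest to a query. The Python sketch below assumes two vectors of equal length (the model's EmbeddingSize) given as plain float lists. The helper names are hypothetical and not part of LLamaEmbedder.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings: 1 means same direction."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    na, nb = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(na, nb))
```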

LLamaReranker

Computes relevance scores between a query and multiple documents.
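A caller typically uses the per-document scores to order the documents. The Python sketch below sorts documents by score and, optionally, maps raw scores into (0, 1) with a logistic sigmoid. The sigmoid step is an assumption about how normalized scores might be produced, not a statement about LLamaReranker's exact behavior, and rank_documents and sigmoid are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_documents(
    documents: Sequence[str],
    raw_scores: Sequence[float],
    normalize: bool = False,
) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort from most to least relevant."""
    if len(documents) != len(raw_scores):
        raise ValueError("one score per document is required")
    scores = [sigmoid(s) if normalize else s for s in raw_scores]
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
```

Because the sigmoid is monotonic, normalizing changes the score values but never the ranking.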

Sources: LLama/LLamaEmbedder.cs:15-154, LLama/LLamaReranker.cs:15-181


Model Quantization

LLamaQuantizer provides a static interface to the native llama_model_quantize API (LLama/LLamaQuantizer.cs:10-11).


It parses type names into LLamaFtype with "relaxed" string matching, so a partial name such as "Q5_K_M" is accepted (LLama/LLamaQuantizer.cs:126-151).
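The relaxed parsing can be sketched as a case-insensitive substring search over the enum member names that succeeds only when exactly one member matches. That matching rule is an assumption about LLama/LLamaQuantizer.cs:126-151. The FType enum below is a hypothetical subset written in a MOSTLY_* naming style, not the full LLamaFtype.

```python
from enum import Enum, auto


class FType(Enum):
    """Hypothetical subset of quantization types, named in a MOSTLY_* style."""
    MOSTLY_F16 = auto()
    MOSTLY_Q4_0 = auto()
    MOSTLY_Q5_K_S = auto()
    MOSTLY_Q5_K_M = auto()
    MOSTLY_Q8_0 = auto()


def parse_ftype_relaxed(text: str) -> FType:
    """Return the single member whose name contains `text` (case-insensitive)."""
    needle = text.upper()
    matches = [member for member in FType if needle in member.name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f'unknown ftype "{text}"')
    names = ", ".join(m.name for m in matches)
    raise ValueError(f'"{text}" matches multiple ftypes: {names}')
```

With these names, "Q5_K_M" selects MOSTLY_Q5_K_M, while "Q5_K" is rejected because it matches both the _S and _M variants.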

Sources: LLama/LLamaQuantizer.cs:10-153, LLama/Native/NativeApi.Quantize.cs:10-15


Fundamental Types

| Type | Implementation | Description | Source |
| --- | --- | --- | --- |
| LLamaToken | struct | Integer ID representing a piece of text | LLama/Native/NativeApi.cs:109 |
| LLamaSeqId | struct | Identifier for a sequence within the KV cache | LLama/Native/LLamaSeqId.cs:10 |
| LLamaBatch | class | Managed structure for submitting multiple tokens for evaluation | LLama/Native/LLamaNativeBatch.cs:10 |

Data Flow: Text to Inference


Sources: LLama/LLamaContext.cs:107-110, LLama/Native/SafeLLamaContextHandle.cs:180-185, LLama/Native/LLamaSeqId.cs:10
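On the way from text to inference, the token IDs produced by tokenization are packed into a batch. In that batch, every token carries its position, the sequence (LLamaSeqId) it belongs to, and a flag saying whether logits should be computed for it. The Python sketch below mirrors those per-token fields of llama.cpp's llama_batch in simplified form: it allows one sequence per token, whereas llama.cpp allows several. Batch and build_prompt_batch are hypothetical names. For a prompt, usually only the last token needs logits, because that is where the next token is sampled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Batch:
    """Toy batch: parallel per-token lists, one entry per submitted token."""
    tokens: List[int] = field(default_factory=list)
    positions: List[int] = field(default_factory=list)
    seq_ids: List[int] = field(default_factory=list)
    logits: List[bool] = field(default_factory=list)

    def add(self, token: int, pos: int, seq_id: int, want_logits: bool) -> None:
        self.tokens.append(token)
        self.positions.append(pos)
        self.seq_ids.append(seq_id)
        self.logits.append(want_logits)


def build_prompt_batch(tokens: List[int], seq_id: int = 0, start_pos: int = 0) -> Batch:
    """Place prompt tokens at consecutive positions; request logits only for the last one."""
    batch = Batch()
    for i, tok in enumerate(tokens):
        batch.add(tok, start_pos + i, seq_id, want_logits=(i == len(tokens) - 1))
    return batch
```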


Native Handle Management

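LLamaSharp wraps native pointers in a hierarchy of SafeHandle subclasses (such as SafeLlamaModelHandle and SafeLLamaContextHandle) to prevent memory leaks and use-after-free errors. Each handle releases its native resource exactly once, either on Dispose or, as a fallback, during finalization.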
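One common way for such a hierarchy to rule out use-after-free is reference counting between handles. A context handle takes a reference on its model handle when created and drops it when freed, so the model cannot be freed while any context still uses it. .NET's SafeHandle exposes this pattern through DangerousAddRef and DangerousRelease. The Python sketch below models the idea with hypothetical classes. Whether SafeLLamaContextHandle follows exactly this scheme is an assumption to verify against LLama/Native/SafeLLamaContextHandle.cs.

```python
class RefCountedHandle:
    """Toy analogue of a SafeHandle: frees its resource when the last reference is released."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._refs = 1        # the creator's own reference
        self.freed = False

    def add_ref(self) -> None:
        if self.freed:
            raise RuntimeError(f"{self.name}: use after free")
        self._refs += 1

    def release(self) -> None:
        if self.freed:
            raise RuntimeError(f"{self.name}: double free")
        self._refs -= 1
        if self._refs == 0:
            self._free()

    def _free(self) -> None:
        self.freed = True     # a real handle would call the native free function here


class ModelHandle(RefCountedHandle):
    pass


class ContextHandle(RefCountedHandle):
    def __init__(self, model: ModelHandle) -> None:
        model.add_ref()       # keep the model alive while this context exists
        super().__init__("context")
        self.model = model

    def _free(self) -> None:
        super()._free()
        self.model.release()  # drop this context's reference on the model
```

With this scheme, disposing the weights before their contexts is safe: the model is freed only after the last context referring to it has been released.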

Sources: LLama/Native/SafeLlamaModelHandle.cs:15-127, LLama/Native/SafeLLamaContextHandle.cs:79-122