Last indexed: 18 May 2026 (ecd184)

Configuration Reference

This document provides a comprehensive reference for the configuration system in LLamaSharp. Configuration parameters control model loading, context initialization, and inference behavior. For detailed parameter listings, see child pages: Model Parameters (IModelParams), Context Parameters (IContextParams), and Inference Parameters.

Configuration Architecture

LLamaSharp's configuration system is organized into three distinct parameter categories, each controlling a different aspect of the inference pipeline:

Parameter Type	Interface	Primary Implementation	Controls
Model Parameters	`IModelParams`	`ModelParams`	Model loading, GPU offloading, memory management
Context Parameters	`IContextParams`	`ModelParams` (via `ILLamaParams`)	Context initialization, batch sizes, RoPE configuration
Inference Parameters	`IInferenceParams`	`InferenceParams`	Token selection strategy, max tokens, antiprompts, overflow strategy

The configuration flow proceeds through distinct stages, bridging managed C# objects to native C++ structures:

Figure: Configuration Flow: Managed to Native

Sources: LLama/Common/ModelParams.cs:13-158, LLama/Abstractions/IModelParams.cs:15-91, LLama/Abstractions/IContextParams.cs:9-146, LLama/Extensions/IContextParamsExtensions.cs:21-71

Parameter Inheritance and Usage

The ModelParams class implements ILLamaParams, which acts as a unified container for both model-loading and context-initialization settings (LLama/Common/ModelParams.cs:13-15). High-level APIs such as LLama.Web frequently use this interface to keep configuration consistent across the lifecycle of a model.
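As a minimal sketch of this dual role, the snippet below passes one ModelParams instance first as the model-loading configuration and then as the context configuration. The model path and the numeric values are placeholders chosen for illustration, not recommended settings.

```csharp
using LLama;
using LLama.Common;

// Placeholder path: point this at a GGUF file on your machine.
var parameters = new ModelParams("models/example.gguf")
{
    GpuLayerCount = 20,  // model-loading setting (IModelParams side)
    ContextSize = 4096   // context setting (IContextParams side)
};

// The same object is used in both roles:
// first to load the weights, then to create a context over them.
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
```

Because both calls read from the same object, the model and its context cannot drift apart in configuration, which is presumably why higher-level layers favor the combined interface.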

Figure: Code Entity Relationship Diagram

Sources: LLama/Common/ModelParams.cs:13-14, LLama.Web/Common/ModelOptions.cs:7-9, LLama/Abstractions/IModelParams.cs:15-16, LLama/Abstractions/IContextParams.cs:9-10, LLama/Common/InferenceParams.cs:12-13, LLama/Abstractions/IInferenceParams.cs:10-11

Model Parameters Overview

Model parameters control how GGUF model files are loaded into memory and distributed across hardware.

GPU Offloading: GpuLayerCount determines how many layers are offloaded to VRAM (LLama/Common/ModelParams.cs:29). MainGpu and SplitMode control multi-GPU distribution (LLama/Common/ModelParams.cs:20-23).
Memory Strategy: UseMemorymap enables mmap for faster loads (LLama/Common/ModelParams.cs:35), while UseMemoryLock (mlock) prevents the model from being swapped to disk (LLama/Common/ModelParams.cs:41).
Specialized Loading: VocabOnly loads only the vocabulary, without weights (LLama/Common/ModelParams.cs:117). MetadataOverrides allows specific GGUF metadata keys to be overridden at load time (LLama/Common/ModelParams.cs:68).
Advanced Tensors: TensorSplits allows fine-grained control over work distribution across multiple GPUs (LLama/Abstractions/IModelParams.cs:75-117). TensorBufferOverrides allows specifying hardware devices for individual tensors (LLama/Abstractions/IModelParams.cs:45).
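
The properties above can be set together in an object initializer. This is a sketch under assumptions: the path is a placeholder, the values are arbitrary, and the GPUSplitMode.Layer member is assumed to be available in the LLama.Native namespace of the installed LLamaSharp version.

```csharp
using LLama.Common;
using LLama.Native;

// Placeholder path and illustrative values only.
var parameters = new ModelParams("models/example.gguf")
{
    GpuLayerCount = 32,             // number of layers to offload to VRAM
    MainGpu = 0,                    // index of the primary GPU
    SplitMode = GPUSplitMode.Layer, // assumed enum member: split by layer across GPUs
    UseMemorymap = true,            // mmap the model file for faster loads
    UseMemoryLock = false,          // set to true to keep the model from being swapped out
    VocabOnly = false               // set to true to load the vocabulary without weights
};
```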

For details, see Model Parameters (IModelParams).

Sources: LLama/Common/ModelParams.cs:1-158, LLama/Abstractions/IModelParams.cs:15-91

Context Parameters Overview

Context parameters configure the runtime environment for a loaded model, specifically the KV cache and processing limits.

Sizing: ContextSize (n_ctx) sets the total token capacity for the context (LLama/Abstractions/IContextParams.cs:14).
Batching: BatchSize (logical batch) and UBatchSize (physical batch) control throughput and memory consumption during evaluation (LLama/Abstractions/IContextParams.cs:19-24).
Performance: FlashAttention enables optimized attention kernels where supported (LLama/Abstractions/IContextParams.cs:109). Threads and BatchThreads control CPU parallelism (LLama/Abstractions/IContextParams.cs:54-59).
KV Cache: TypeK and TypeV override the data type (e.g., F16, Q4_0) of the Key/Value cache (LLama/Abstractions/IContextParams.cs:94-99). KVUnified and SwaFull provide control over attention buffer management (LLama/Abstractions/IContextParams.cs:137-145).
Scaling: RoPE and YaRN scaling parameters (e.g., RopeFrequencyBase, YarnExtrapolationFactor) allow context lengths to be extended beyond model defaults (LLama/Abstractions/IContextParams.cs:39-89).
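
A hedged sketch of the sizing, batching, threading, and KV cache settings follows. It sets them on ModelParams, which also carries the context parameters. The path and numbers are placeholders, and the GGMLType.GGML_TYPE_F16 member name is an assumption about the LLama.Native enum in the installed version.

```csharp
using LLama.Common;
using LLama.Native;

// Placeholder path and illustrative values only.
var parameters = new ModelParams("models/example.gguf")
{
    ContextSize = 8192,  // n_ctx: total token capacity of the context
    BatchSize = 512,     // logical batch size
    UBatchSize = 512,    // physical batch size
    Threads = 8,         // CPU threads for generation
    BatchThreads = 8,    // CPU threads for batch (prompt) processing
    TypeK = GGMLType.GGML_TYPE_F16, // assumed enum member: K cache data type
    TypeV = GGMLType.GGML_TYPE_F16  // assumed enum member: V cache data type
};
```

F16 is used for both cache types here because quantized cache types may depend on other settings in the native backend.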

For details, see Context Parameters (IContextParams).

Sources: LLama/Abstractions/IContextParams.cs:1-146, LLama/Extensions/IContextParamsExtensions.cs:21-71

Inference Parameters Overview

Inference parameters are passed during generation to control token selection, stopping criteria, and context window management.

Limits: MaxTokens (n_predict) sets the maximum number of tokens to generate (LLama/Common/InferenceParams.cs:24).
Stopping: AntiPrompts lists the sequences that stop generation when they appear in the output (LLama/Common/InferenceParams.cs:29).
Sampling: SamplingPipeline manages the chain of samplers (LLama/Common/InferenceParams.cs:32).
Context Management:
- TokensKeep specifies how many tokens to preserve from the initial prompt when context shifting occurs (LLama/Common/InferenceParams.cs:18).
- OverflowStrategy defines the behavior (e.g., ThrowException or TruncateAndReprefill) when the context window is full (LLama/Common/InferenceParams.cs:43).
Special Tokens: DecodeSpecialTokens determines whether special tokens (such as BOS/EOS) are visible in the decoded output (LLama/Common/InferenceParams.cs:35).
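
The following sketch builds an InferenceParams instance from the properties listed above. The values are illustrative, and DefaultSamplingPipeline with its Temperature property is assumed to be the sampler implementation shipped in the LLama.Sampling namespace. OverflowStrategy is left at its default because its exact type is not shown on this page.

```csharp
using System.Collections.Generic;
using LLama.Common;
using LLama.Sampling;

// Illustrative values only.
var inferenceParams = new InferenceParams
{
    MaxTokens = 256,                            // upper bound on generated tokens
    AntiPrompts = new List<string> { "User:" }, // stop when this text appears
    TokensKeep = 0,                             // prompt tokens preserved on context shift
    DecodeSpecialTokens = false,                // hide special tokens such as BOS/EOS
    SamplingPipeline = new DefaultSamplingPipeline
    {
        Temperature = 0.7f                      // assumed property on the default pipeline
    }
};
```

An object like this is typically passed to an executor's inference call on each request, so different requests against the same context can use different limits and samplers.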

For details, see Inference Parameters.

Sources: LLama/Common/InferenceParams.cs:12-51, LLama/Abstractions/IInferenceParams.cs:10-54

Configuration Serialization

LLamaSharp supports JSON serialization for ModelParams to allow persistence of configuration states.

Encoding: Since System.Text.Encoding cannot be directly serialized, LLamaSharp stores the EncodingName string and uses it to restore the Encoding object (LLama/Common/ModelParams.cs:128-141).
Complex Types: TensorSplitsCollection and MetadataOverride use custom JSON converters to handle data structures that map to native llama.cpp requirements (LLama/Abstractions/IModelParams.cs:96-192).
Round-tripping: Unit tests verify that parameters like BatchSize, ContextSize, and MetadataOverrides are correctly preserved through serialization cycles using System.Text.Json (LLama.Unittest/ModelsParamsTests.cs:10-55).
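
A round trip of this kind might look like the sketch below. It assumes the default System.Text.Json options are sufficient for the properties being set; the project's own tests may configure additional converters. The path and values are placeholders.

```csharp
using System;
using System.Text.Json;
using LLama.Common;

// Placeholder path and illustrative values only.
var original = new ModelParams("models/example.gguf")
{
    ContextSize = 4096,
    BatchSize = 256
};

// Serialize to JSON, then rebuild a ModelParams instance from that JSON.
string json = JsonSerializer.Serialize(original);
ModelParams restored = JsonSerializer.Deserialize<ModelParams>(json)!;

// Compare a few fields to check that they survived the round trip.
Console.WriteLine(restored.ContextSize == original.ContextSize);
Console.WriteLine(restored.BatchSize == original.BatchSize);
```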

Sources: LLama/Common/ModelParams.cs:128-141, LLama/Abstractions/IModelParams.cs:170-185, LLama.Unittest/ModelsParamsTests.cs:10-55


URL: https://deepwiki.com/SciSharp/LLamaSharp/6-configuration-reference