Last indexed: 18 May 2026 (ecd184)

Sampling API

This page is a reference for the sampling API in LLamaSharp. The sampling API controls how a token is selected from the model's output logits during text generation. It turns raw logits into a chosen token by applying filters (such as top-k, top-p, and min-p), penalties, and transformations such as temperature scaling.

Core Interfaces and Base Classes

ISamplingPipeline Interface

ISamplingPipeline is the primary abstraction for token sampling. It defines the contract that all sampling implementations must fulfill to be used by executors.

Interface Definition:

LLama/Sampling/ISamplingPipeline.cs9-37

Methods:

Method	Parameters	Returns	Description
`Sample`	`SafeLLamaContextHandle ctx, int index`	`LLamaToken`	Sample a single token from the context at the given index.
`Apply`	`SafeLLamaContextHandle ctx, LLamaTokenDataArray data`	`void`	Apply the sampling pipeline to a token data array to modify probabilities.
`Reset`	None	`void`	Reset all internal state (e.g., grammar state, penalty history).
`Accept`	`LLamaToken token`	`void`	Update the pipeline with knowledge that a specific token was accepted.
`Dispose`	None	`void`	Free native resources held by the pipeline.

Sources: LLama/Sampling/ISamplingPipeline.cs9-37

BaseSamplingPipeline Abstract Class

BaseSamplingPipeline provides a base implementation of ISamplingPipeline that manages the lifecycle of a SafeLLamaSamplerChainHandle. It handles the lazy initialization and disposal of the native sampler chain.

Class Structure:

[Diagram: Sampling Pipeline Hierarchy]

Key Implementation Details:

Lazy Creation: The native chain is created only when first needed via CreateChain(ctx) LLama/Sampling/BaseSamplingPipeline.cs37
State Management: Accept and Reset calls are forwarded directly to the native handle. LLama/Sampling/BaseSamplingPipeline.cs68-71 LLama/Sampling/BaseSamplingPipeline.cs62-65
Memory Safety: Implements Dispose to ensure the native llama_sampler resources are freed LLama/Sampling/BaseSamplingPipeline.cs26-32

Sources: LLama/Sampling/BaseSamplingPipeline.cs7-72
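
The lifecycle described above (create on first use, forward state calls, free on dispose) can be sketched in a few lines. This is a language-neutral illustration in Python, not LLamaSharp code: `FakeNativeChain`, `LazyPipeline`, and `chain_for` are hypothetical names, and treating `accept`/`reset` as no-ops before the chain exists is an assumption.

```python
class FakeNativeChain:
    """Hypothetical stand-in for a native sampler chain."""

    def __init__(self):
        self.accepted = []
        self.freed = False

    def accept(self, token):
        self.accepted.append(token)

    def reset(self):
        self.accepted.clear()

    def free(self):
        self.freed = True


class LazyPipeline:
    """Builds its chain on first use and frees it on dispose."""

    def __init__(self, create_chain):
        self._create_chain = create_chain  # factory, playing the role of CreateChain(ctx)
        self._chain = None

    def chain_for(self, ctx):
        if self._chain is None:  # lazy: build only when first needed
            self._chain = self._create_chain(ctx)
        return self._chain

    def accept(self, token):
        # Assumption: nothing to update if the chain was never created.
        if self._chain is not None:
            self._chain.accept(token)

    def reset(self):
        if self._chain is not None:
            self._chain.reset()

    def dispose(self):
        if self._chain is not None:
            self._chain.free()
            self._chain = None
```

The factory is passed in so that a subclass-like caller decides which stages the chain contains, while the wrapper owns only creation timing and cleanup.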

SafeLLamaSamplerChainHandle Class

SafeLLamaSamplerChainHandle is a SafeHandle wrapping a native llama_sampler (specifically a chain initialized via llama_sampler_chain_init).

Core Logic Flow:

[Diagram: Native Sampler Interaction Flow]

Core Methods:

Sample: A shorthand that retrieves logits from the context, applies the chain, and accepts the resulting token in one call LLama/Native/SafeLLamaSamplerHandle.cs66-89
Apply: Applies the sampler logic to a LLamaTokenDataArrayNative without automatically picking or accepting a token LLama/Native/SafeLLamaSamplerHandle.cs42-49
AddClone: Allows copying a stage from one sampler chain into another LLama/Native/SafeLLamaSamplerHandle.cs167-181

Sources: LLama/Native/SafeLLamaSamplerHandle.cs13-197
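
The difference between Apply and Sample can be shown with a toy chain. The Python below is a sketch under stated assumptions, not the native implementation: `ToyChain` and its stage callables are hypothetical, and the final pick is a plain argmax so the result is deterministic (a real chain ends with whichever selection stage was added).

```python
class ToyChain:
    """Ordered list of stages; each stage maps candidates to candidates."""

    def __init__(self):
        self.stages = []    # callables: list[(token, logit)] -> list[(token, logit)]
        self.accepted = []  # tokens the chain has been told about

    def add(self, stage):
        self.stages.append(stage)

    def apply(self, candidates):
        # Runs every stage in order; does not pick or accept a token.
        for stage in self.stages:
            candidates = stage(candidates)
        return candidates

    def accept(self, token):
        self.accepted.append(token)

    def sample(self, logits):
        # Shorthand: build candidates from logits, apply, pick, accept.
        candidates = list(enumerate(logits))
        candidates = self.apply(candidates)
        token = max(candidates, key=lambda c: c[1])[0]
        self.accept(token)
        return token


# Usage: a stage that removes token 2 from the candidates.
chain = ToyChain()
chain.add(lambda cands: [c for c in cands if c[0] != 2])
picked = chain.sample([0.1, 0.5, 0.9])
```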

Built-in Sampling Pipeline Implementations

DefaultSamplingPipeline

DefaultSamplingPipeline mimics the standard sampling logic found in llama.cpp's main example. It is highly configurable and supports complex features like grammar-constrained generation.

Configuration Properties:

Property	Default	Source
`Temperature`	0.75f	LLama/Sampling/DefaultSamplingPipeline.cs80
`TopK`	40	LLama/Sampling/DefaultSamplingPipeline.cs85
`TopP`	0.9f	LLama/Sampling/DefaultSamplingPipeline.cs95
`MinP`	0.1f	LLama/Sampling/DefaultSamplingPipeline.cs100
`RepeatPenalty`	1.0f	LLama/Sampling/DefaultSamplingPipeline.cs22
`FrequencyPenalty`	0.0f	LLama/Sampling/DefaultSamplingPipeline.cs29-41
`PresencePenalty`	0.0f	LLama/Sampling/DefaultSamplingPipeline.cs48-60
`LogitBias`	Empty Dictionary	LLama/Sampling/DefaultSamplingPipeline.cs18
`PenaltyCount`	64	LLama/Sampling/DefaultSamplingPipeline.cs65
`PenalizeNewline`	false	LLama/Sampling/DefaultSamplingPipeline.cs70
`PreventEOS`	false	LLama/Sampling/DefaultSamplingPipeline.cs75
`TypicalP`	1	LLama/Sampling/DefaultSamplingPipeline.cs89
`Grammar`	null	LLama/Sampling/DefaultSamplingPipeline.cs105
`MinKeep`	1	LLama/Sampling/DefaultSamplingPipeline.cs110
`Seed`	Random	LLama/Sampling/DefaultSamplingPipeline.cs115
`GrammarOptimization`	`Extended`	LLama/Sampling/DefaultSamplingPipeline.cs120

Sampler Chain Construction Order: The pipeline adds samplers in a specific sequence to SafeLLamaSamplerChainHandle LLama/Sampling/DefaultSamplingPipeline.cs171-206:

Logit Bias: AddLogitBias LLama/Sampling/DefaultSamplingPipeline.cs191
Penalties: AddPenalties (Repeat, Frequency, Presence) LLama/Sampling/DefaultSamplingPipeline.cs195
Filtering: AddTopK, AddTypical, AddTopP, AddMinP LLama/Sampling/DefaultSamplingPipeline.cs197-200
Transformation: AddTemperature LLama/Sampling/DefaultSamplingPipeline.cs201
Selection: AddDistributionSampler LLama/Sampling/DefaultSamplingPipeline.cs203
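
The order matters because each stage sees the output of the previous one. The Python sketch below reproduces a reduced version of the sequence (bias, top-k, temperature, distribution sampling) and leaves out penalties, typical, top-p, and min-p for brevity. All function names are hypothetical and the code does not call LLamaSharp.

```python
import math
import random


def logit_bias(logits, bias):
    # Add a fixed offset to selected tokens.
    return {t: l + bias.get(t, 0.0) for t, l in logits.items()}


def top_k(logits, k):
    # Keep the k tokens with the highest logits.
    keep = sorted(logits, key=logits.get, reverse=True)[:k]
    return {t: logits[t] for t in keep}


def temperature(logits, temp):
    # Divide logits by the temperature (temp > 0 assumed).
    return {t: l / temp for t, l in logits.items()}


def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(l - m) for t, l in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def sample_dist(logits, rng):
    # Draw one token according to the softmax probabilities.
    probs = softmax(logits)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


def run_pipeline(logits, bias, k, temp, rng):
    logits = logit_bias(logits, bias)  # 1. bias first, so it can affect filtering
    logits = top_k(logits, k)          # 2. filter
    logits = temperature(logits, temp) # 3. transform
    return sample_dist(logits, rng)    # 4. select
```

Because the bias runs before top-k, a strongly biased token survives the filter even if its raw logit was the lowest; reversing the two stages would drop it before the bias could apply.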

Grammar Optimization: The pipeline supports GrammarOptimizationMode LLama/Sampling/DefaultSamplingPipeline.cs120. When enabled, it first attempts to sample a token using the fast base chain, without grammar constraints. If the sampled token violates the grammar, it falls back to the slower but guaranteed grammar-constrained sampling LLama/Sampling/DefaultSamplingPipeline.cs209-296.
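
The fast-path-then-fallback idea can be sketched as follows. This Python is a simplified illustration, not the LLamaSharp implementation: `sample_with_grammar`, `is_allowed`, and `pick` are hypothetical, and the grammar is reduced to a predicate over tokens.

```python
def sample_with_grammar(logits, is_allowed, pick):
    """Try an unconstrained pick first; constrain only if it is rejected.

    logits:     dict mapping token -> logit
    is_allowed: predicate standing in for the grammar check
    pick:       selection function over a logits dict
    """
    # Fast path: sample without consulting the grammar for every token.
    token = pick(logits)
    if is_allowed(token):
        return token, "fast"

    # Slow path: mask every rejected token, then sample again.
    constrained = {t: l for t, l in logits.items() if is_allowed(t)}
    if not constrained:
        raise ValueError("grammar rejects every candidate")
    return pick(constrained), "fallback"


def greedy(logits):
    return max(logits, key=logits.get)
```

The saving comes from checking one token on the fast path instead of evaluating the grammar against the whole vocabulary on every step.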

Sources: LLama/Sampling/DefaultSamplingPipeline.cs11-318

GreedySamplingPipeline

A minimal pipeline that always selects the token with the highest logit.

Chain Construction:

Optional Grammar Sampler LLama/Sampling/GreedySamplingPipeline.cs21-22
AddGreedySampler LLama/Sampling/GreedySamplingPipeline.cs24

Sources: LLama/Sampling/GreedySamplingPipeline.cs1-28

MirostatSamplingPipeline

This pipeline implements the Mirostat 1.0 algorithm for controlling perplexity.

Configuration Properties:

Property	Default	Source
`Tau`	5.0f	LLama/Sampling/MirostatSamplingPipeline.cs16
`Eta`	0.1f	LLama/Sampling/MirostatSamplingPipeline.cs21
`M`	100	LLama/Sampling/MirostatSamplingPipeline.cs26
`Temperature`	0.75f	LLama/Sampling/MirostatSamplingPipeline.cs31
`LogitBias`	Empty Dictionary	LLama/Sampling/MirostatSamplingPipeline.cs36
`RepeatPenalty`	1.0f	LLama/Sampling/MirostatSamplingPipeline.cs39
`FrequencyPenalty`	0.0f	LLama/Sampling/MirostatSamplingPipeline.cs42-54
`PresencePenalty`	0.0f	LLama/Sampling/MirostatSamplingPipeline.cs57-69
`PenaltyCount`	64	LLama/Sampling/MirostatSamplingPipeline.cs72
`PenalizeNewline`	false	LLama/Sampling/MirostatSamplingPipeline.cs75
`PreventEOS`	false	LLama/Sampling/MirostatSamplingPipeline.cs78
`Grammar`	null	LLama/Sampling/MirostatSamplingPipeline.cs81
`GrammarOptimization`	`Extended`	LLama/Sampling/MirostatSamplingPipeline.cs86

Chain Construction:

Logit Bias LLama/Sampling/MirostatSamplingPipeline.cs109-124
Penalties LLama/Sampling/MirostatSamplingPipeline.cs126
AddMirostat1Sampler LLama/Sampling/MirostatSamplingPipeline.cs128
AddTemperature LLama/Sampling/MirostatSamplingPipeline.cs129

Sources: LLama/Sampling/MirostatSamplingPipeline.cs1-190

Mirostat2SamplingPipeline

This pipeline implements the Mirostat 2.0 algorithm for controlling perplexity.

Configuration Properties:

Property	Default	Source
`Tau`	5.0f	LLama/Sampling/Mirostat2SamplingPipeline.cs16
`Eta`	0.1f	LLama/Sampling/Mirostat2SamplingPipeline.cs21
`LogitBias`	Empty Dictionary	LLama/Sampling/Mirostat2SamplingPipeline.cs26
`RepeatPenalty`	1.0f	LLama/Sampling/Mirostat2SamplingPipeline.cs29
`FrequencyPenalty`	0.0f	LLama/Sampling/Mirostat2SamplingPipeline.cs32-44
`PresencePenalty`	0.0f	LLama/Sampling/Mirostat2SamplingPipeline.cs47-59
`PenaltyCount`	64	LLama/Sampling/Mirostat2SamplingPipeline.cs62
`PenalizeNewline`	false	LLama/Sampling/Mirostat2SamplingPipeline.cs65
`PreventEOS`	false	LLama/Sampling/Mirostat2SamplingPipeline.cs68
`Grammar`	null	LLama/Sampling/Mirostat2SamplingPipeline.cs71
`GrammarOptimization`	`Extended`	LLama/Sampling/Mirostat2SamplingPipeline.cs76

Chain Construction:

Logit Bias LLama/Sampling/Mirostat2SamplingPipeline.cs99-114
Penalties LLama/Sampling/Mirostat2SamplingPipeline.cs116
AddMirostat2Sampler LLama/Sampling/Mirostat2SamplingPipeline.cs118

Sources: LLama/Sampling/Mirostat2SamplingPipeline.cs1-179
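
For intuition, the Mirostat 2.0 update rule as commonly implemented can be sketched in Python. This is an assumption about the native sampler's behavior, not a copy of it: the class name `Mirostat2` is hypothetical, and the initial value `mu = 2 * tau` follows the usual convention. `tau` and `eta` correspond to the `Tau` and `Eta` properties above.

```python
import math
import random


class Mirostat2:
    def __init__(self, tau=5.0, eta=0.1, seed=0):
        self.tau = tau            # target surprise in bits
        self.eta = eta            # learning rate for mu
        self.mu = 2.0 * tau       # current surprise ceiling (assumed initial value)
        self.rng = random.Random(seed)

    def sample(self, logits):
        # Softmax over all candidates, sorted by descending probability.
        m = max(logits.values())
        exps = {t: math.exp(l - m) for t, l in logits.items()}
        total = sum(exps.values())
        probs = sorted(((t, e / total) for t, e in exps.items()),
                       key=lambda tp: tp[1], reverse=True)

        # Drop tokens whose surprise exceeds mu; always keep at least one.
        kept = [(t, p) for t, p in probs
                if p > 0.0 and -math.log2(p) <= self.mu] or probs[:1]

        # Renormalize and draw one token.
        norm = sum(p for _, p in kept)
        weights = [p / norm for _, p in kept]
        idx = self.rng.choices(range(len(kept)), weights=weights, k=1)[0]

        # Move mu toward the target: lower it after surprising picks,
        # raise it after predictable ones.
        observed = -math.log2(weights[idx])
        self.mu -= self.eta * (observed - self.tau)
        return kept[idx][0]
```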

Token Data Structures

LLamaTokenDataArray

A managed structure containing an array of LLamaTokenData. It is used to pass candidates between managed code and native samplers.

Key Features:

Softmax(): Sorts candidates by logits in descending order and calculates probabilities using TensorPrimitives.SoftMax LLama/Native/LLamaTokenDataArray.cs95-118
OverwriteLogits(): Allows manual modification of specific token scores before sampling LLama/Native/LLamaTokenDataArray.cs71-90

Sources: LLama/Native/LLamaTokenDataArray.cs12-129
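
A minimal Python sketch of the two operations, assuming a candidate record with a token id, a logit, and a probability. `TokenData`, `overwrite_logits`, and `softmax` here are illustrative stand-ins, not the LLamaSharp types, and the sketch uses the standard library instead of TensorPrimitives.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenData:
    token: int
    logit: float
    p: float = 0.0


def overwrite_logits(candidates, overrides):
    # Replace the logit of every candidate listed in `overrides`.
    for c in candidates:
        if c.token in overrides:
            c.logit = overrides[c.token]


def softmax(candidates):
    # Sort by logit, highest first, then fill in probabilities.
    candidates.sort(key=lambda c: c.logit, reverse=True)
    max_logit = candidates[0].logit  # subtract the max for numerical stability
    exps = [math.exp(c.logit - max_logit) for c in candidates]
    total = sum(exps)
    for c, e in zip(candidates, exps):
        c.p = e / total


cands = [TokenData(0, 1.0), TokenData(1, 3.0), TokenData(2, 2.0)]
overwrite_logits(cands, {0: 5.0})  # token 0 now has the highest logit
softmax(cands)
```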

LLamaTokenDataArrayNative

The C# equivalent of the native llama_token_data_array struct. It is designed for zero-copy interop with the native library.

Memory Management:

Must be pinned in memory during use LLama/Native/LLamaTokenDataArray.cs141
The Create static method provides a MemoryHandle for pinning LLama/Native/LLamaTokenDataArray.cs208-223

Sources: LLama/Native/LLamaTokenDataArray.cs135-224

Sampler Implementation Reference

SafeLLamaSamplerChainHandle exposes numerous native samplers. Below is a categorized reference:

Penalty and Bias Samplers

AddPenalties: Combines repetition, frequency, and presence penalties LLama/Native/SafeLLamaSamplerHandle.cs499-520
AddLogitBias: Adds fixed values to specific tokens LLama/Native/SafeLLamaSamplerHandle.cs629-647
AddDry: "Don't Repeat Yourself" sampler LLama/Native/SafeLLamaSamplerHandle.cs532-578
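
The combined penalty is usually computed per token from how often it appears in the recent history. The Python below sketches the llama.cpp-style formula as an assumption about what the native sampler does; `apply_penalties` and its parameter names are hypothetical, with `last_n` playing the role of `PenaltyCount`.

```python
from collections import Counter


def apply_penalties(logits, history, repeat=1.0, frequency=0.0,
                    presence=0.0, last_n=64):
    """Return a penalized copy of `logits` (dict token -> logit)."""
    recent = history[-last_n:] if last_n > 0 else []
    counts = Counter(recent)
    out = dict(logits)
    for token, count in counts.items():
        if token not in out:
            continue
        logit = out[token]
        # Repetition penalty: push the logit away from selection in
        # either sign (multiply negatives, divide positives).
        logit = logit * repeat if logit <= 0 else logit / repeat
        # Frequency penalty scales with the count; presence is a flat cost.
        logit -= count * frequency + presence
        out[token] = logit
    return out
```

With the defaults (`repeat=1.0`, `frequency=0.0`, `presence=0.0`) the function returns the logits unchanged, which matches the default property values listed for the built-in pipelines.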

Filtering Samplers

AddTopK: Keeps only the top $K$ tokens LLama/Native/SafeLLamaSamplerHandle.cs277-283
AddTopP: Nucleus sampling (cumulative probability) LLama/Native/SafeLLamaSamplerHandle.cs301-309
AddMinP: Filters tokens based on a percentage of the maximum probability LLama/Native/SafeLLamaSamplerHandle.cs314-322
AddTypical: Typical sampling based on information theory LLama/Native/SafeLLamaSamplerHandle.cs327-335
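
Two of these filters are easy to state precisely. The Python sketch below operates on a dict of already-normalized probabilities and does not renormalize the survivors; the function names and the `min_keep` handling are illustrative assumptions, not the native code.

```python
def top_p(probs, p, min_keep=1):
    """Keep the smallest high-probability prefix whose mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p and len(kept) >= min_keep:
            break
    return dict(kept)


def min_p(probs, p, min_keep=1):
    """Keep tokens whose probability is at least p times the maximum."""
    threshold = p * max(probs.values())
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = [(t, pr) for t, pr in ranked if pr >= threshold]
    if len(kept) < min_keep:
        kept = ranked[:min_keep]
    return dict(kept)


probs = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
nucleus = top_p(probs, 0.9)    # keeps tokens 0, 1, 2
relative = min_p(probs, 0.2)   # threshold is 0.2 * 0.5, keeps tokens 0, 1, 2
```

Top-p adapts the cutoff to the cumulative mass, while min-p adapts it to the confidence of the single most likely token, so min-p keeps fewer candidates when the distribution is sharply peaked.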

Advanced Samplers

AddMirostat1Sampler / AddMirostat2Sampler: Active control of perplexity LLama/Native/SafeLLamaSamplerHandle.cs249-270
AddXTC: "Exclude Top Choices" sampler LLama/Native/SafeLLamaSamplerHandle.cs372-378
AddGrammar: Constrains output to a GBNF grammar LLama/Native/SafeLLamaSamplerHandle.cs420-447

Sources: LLama/Native/SafeLLamaSamplerHandle.cs218-647

Implementation Bridge: Native to Managed

The following diagram illustrates how native llama.cpp sampling structures are represented in LLamaSharp code.

[Diagram: Logit Processing Entity Mapping]

Sources: LLama/Native/LLamaTokenDataArray.cs135-155 LLama/Native/SafeLLamaSamplerHandle.cs13-15 LLama/Sampling/DefaultSamplingPipeline.cs11-12

URL: https://deepwiki.com/SciSharp/LLamaSharp/9.3-sampling-api