Last indexed: 18 May 2026 (ecd184)

DefaultSamplingPipeline

DefaultSamplingPipeline is the primary implementation of ISamplingPipeline. It replicates llama.cpp's default sampling behavior and exposes parameters that control token generation: temperature, top-k, top-p, and min-p filtering, penalties, and grammar-constrained generation.

For greedy sampling (always selecting the most likely token), see GreedySamplingPipeline LLama/Sampling/GreedySamplingPipeline.cs8-9. For implementing custom sampling logic, see 4.3 Custom Samplers. For an overview of the sampling architecture, see 4.1 Sampling Pipeline Overview.

Sources: LLama/Sampling/DefaultSamplingPipeline.cs11-13

Pipeline Architecture

DefaultSamplingPipeline extends BaseSamplingPipeline LLama/Sampling/DefaultSamplingPipeline.cs11-12 and creates a chain of native llama.cpp samplers applied sequentially to transform logits into a selected token. The pipeline is constructed lazily on first use LLama/Sampling/BaseSamplingPipeline.cs37-39 and reused across sampling calls.

System Architecture to Code Entity Mapping

The following table maps natural language sampling concepts to the entities that implement them in the LLamaSharp codebase.

Natural Language Concept	Code Entity	File Reference
Sampling Pipeline	`DefaultSamplingPipeline`	LLama/Sampling/DefaultSamplingPipeline.cs11
Sampler Chain	`SafeLLamaSamplerChainHandle`	LLama/Native/SafeLLamaSamplerHandle.cs13
Logit Representation	`LLamaTokenDataArray`	LLama/Native/LLamaTokenDataArray.cs12
Native Logit Buffer	`LLamaTokenDataArrayNative`	LLama/Native/LLamaTokenDataArray.cs136
Grammar Constraint	`Grammar`	LLama/Sampling/DefaultSamplingPipeline.cs105

Pipeline Flow Diagram

Sources: LLama/Sampling/DefaultSamplingPipeline.cs171-206 LLama/Sampling/BaseSamplingPipeline.cs10-72 LLama/Native/SafeLLamaSamplerHandle.cs13-112

Configuration Parameters

DefaultSamplingPipeline exposes properties that configure each stage of the sampling chain. All properties use init-only setters, requiring configuration at construction time.

Token Transformations

Parameter	Type	Default	Range	Description
`LogitBias`	`IReadOnlyDictionary<LLamaToken, float>`	Empty	Any float	Bias values added to specific token logits LLama/Sampling/DefaultSamplingPipeline.cs17
`RepeatPenalty`	`float`	1.0	>0	Repetition penalty from arxiv:1909.05858 LLama/Sampling/DefaultSamplingPipeline.cs22
`FrequencyPenalty`	`float`	0.0	-2.0 to 2.0	OpenAI-style penalty based on token frequency LLama/Sampling/DefaultSamplingPipeline.cs29-41
`PresencePenalty`	`float`	0.0	-2.0 to 2.0	OpenAI-style penalty based on token presence LLama/Sampling/DefaultSamplingPipeline.cs48-60
`PenaltyCount`	`int`	64	>0	Number of previous tokens considered for penalties LLama/Sampling/DefaultSamplingPipeline.cs65
`PenalizeNewline`	`bool`	false	-	Whether newline token is affected by penalties LLama/Sampling/DefaultSamplingPipeline.cs70
`PreventEOS`	`bool`	false	-	Suppress end-of-sequence token from being sampled LLama/Sampling/DefaultSamplingPipeline.cs75
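
To make the penalty parameters concrete, here is a minimal Python sketch of how a logit bias and the three penalties could act on a list of logits. It is not LLamaSharp code: the function name and argument names are hypothetical, and the arithmetic follows the commonly described llama.cpp-style rule (divide positive logits by the repeat penalty, multiply negative ones, then subtract frequency and presence terms), which is an assumption rather than something this page states.

```python
from collections import Counter

def apply_bias_and_penalties(logits, recent_tokens, logit_bias=None,
                             repeat_penalty=1.0, frequency_penalty=0.0,
                             presence_penalty=0.0, penalty_count=64):
    """Return a new logit list after bias and penalties (illustrative only)."""
    out = list(logits)
    # Logit bias: add a fixed offset to the chosen token ids.
    for token, bias in (logit_bias or {}).items():
        out[token] += bias
    # Only the most recent `penalty_count` tokens are considered.
    window = recent_tokens[-penalty_count:] if penalty_count > 0 else []
    for token, count in Counter(window).items():
        # Repetition penalty: shrink positive logits, push negative ones lower.
        if out[token] > 0:
            out[token] /= repeat_penalty
        else:
            out[token] *= repeat_penalty
        # Frequency cost grows with each occurrence; presence is a flat cost.
        out[token] -= count * frequency_penalty + presence_penalty
    return out

result = apply_bias_and_penalties(
    [2.0, -1.0, 0.5, 3.0], recent_tokens=[0, 0, 1], logit_bias={3: 1.0},
    repeat_penalty=2.0, frequency_penalty=0.5, presence_penalty=0.25)
print(result)  # [-0.25, -2.75, 0.5, 4.0]
```

Token 0 appeared twice, so it pays the frequency cost twice and the presence cost once; token 2 never appeared and is left untouched.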

Filtering and Selection

Parameter	Type	Default	Range	Description
`Temperature`	`float`	0.75	>0	Higher values increase randomness/creativity LLama/Sampling/DefaultSamplingPipeline.cs80
`TopK`	`int`	40	≥0	Keep only top K tokens by probability LLama/Sampling/DefaultSamplingPipeline.cs85
`TypicalP`	`float`	1.0	0-1	Locally typical sampling P threshold LLama/Sampling/DefaultSamplingPipeline.cs90
`TopP`	`float`	0.9	0-1	Nucleus sampling cumulative probability threshold LLama/Sampling/DefaultSamplingPipeline.cs95
`MinP`	`float`	0.1	0-1	Minimum probability threshold relative to max LLama/Sampling/DefaultSamplingPipeline.cs100
`MinKeep`	`int`	1	≥1	Minimum tokens preserved by filtering samplers LLama/Sampling/DefaultSamplingPipeline.cs110
`Seed`	`uint`	Random	Any	Random seed for distribution sampler LLama/Sampling/DefaultSamplingPipeline.cs115
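
The three most commonly tuned filters can be sketched in a few lines of Python. This is an illustration of the general techniques, not the native implementation: the function names are hypothetical, candidates are assumed to be already sorted by descending probability, probabilities are not renormalized between filters, and treating `k <= 0` as "disabled" is an assumption. Typical-P is omitted for brevity.

```python
def top_k(cands, k):
    """Keep the k most probable candidates; k <= 0 is treated here as disabled."""
    return cands[:k] if k > 0 else list(cands)

def top_p(cands, p, min_keep=1):
    """Keep the smallest prefix whose cumulative probability reaches p."""
    kept, mass = [], 0.0
    for tok, prob in cands:
        kept.append((tok, prob))
        mass += prob
        if mass >= p and len(kept) >= min_keep:
            break
    return kept

def min_p(cands, p, min_keep=1):
    """Keep candidates whose probability is at least p times the top probability."""
    floor = cands[0][1] * p
    kept = [(tok, prob) for tok, prob in cands if prob >= floor]
    # min_keep guarantees the filter never leaves fewer than that many tokens.
    return kept if len(kept) >= min_keep else cands[:min_keep]

cands = [("the", 0.5), ("a", 0.3), ("an", 0.15), ("zebra", 0.05)]
print([t for t, _ in top_k(cands, 2)])       # ['the', 'a']
print([t for t, _ in top_p(cands, 0.9)])     # ['the', 'a', 'an']
print([t for t, _ in min_p(cands, 0.2)])     # ['the', 'a', 'an']
print([t for t, _ in min_p(cands, 0.7, 2)])  # ['the', 'a']
```

The last call shows the role of `MinKeep`: a strict threshold would leave one candidate, so the filter falls back to the top two.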

Sources: LLama/Sampling/DefaultSamplingPipeline.cs16-115

Sampler Chain Construction

The CreateChain method builds the native sampler chain by adding samplers in a specific order that matches llama.cpp defaults.

The order is critical: transformations (bias, penalties) apply first, then filters reduce the candidate set (TopK, Typical, TopP, MinP), temperature adjusts probabilities, and finally the distribution sampler selects from the resulting distribution.
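
Why the order matters can be shown with a small Python sketch that runs the same two stages in both orders. The stage functions and `run_chain` are hypothetical stand-ins for the native sampler chain, reduced to a top-p filter and a temperature stage; the point is only that filtering before scaling keeps a different candidate set than scaling before filtering.

```python
import math

def softmax(logits):
    """Convert a {token: logit} dict into {token: probability}."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_p_stage(p):
    """Filter stage: keep the most probable tokens until their mass reaches p."""
    def stage(logits):
        probs = softmax(logits)
        kept, mass = {}, 0.0
        for tok in sorted(probs, key=probs.get, reverse=True):
            kept[tok] = logits[tok]
            mass += probs[tok]
            if mass >= p:
                break
        return kept
    return stage

def temperature_stage(t):
    """Scaling stage: divide every logit by the temperature."""
    def stage(logits):
        return {tok: v / t for tok, v in logits.items()}
    return stage

def run_chain(stages, logits):
    """Apply each stage in order, feeding its output to the next."""
    for stage in stages:
        logits = stage(logits)
    return logits

logits = {"a": 2.0, "b": 1.0, "c": 0.0}
filter_first = run_chain([top_p_stage(0.9), temperature_stage(2.0)], logits)
scale_first = run_chain([temperature_stage(2.0), top_p_stage(0.9)], logits)
print(sorted(filter_first))  # ['a', 'b']
print(sorted(scale_first))   # ['a', 'b', 'c']
```

With the filter first, the unlikely token "c" is removed before temperature flattens the distribution. With temperature first, the flattened distribution lets "c" survive the same top-p threshold.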

Sources: LLama/Sampling/DefaultSamplingPipeline.cs171-206 LLama/Native/SafeLLamaSamplerHandle.cs153-243

Grammar Optimization

When Grammar is configured, DefaultSamplingPipeline maintains a separate grammar-only sampler chain, _grammarChain LLama/Sampling/DefaultSamplingPipeline.cs125, and applies it using one of three optimization strategies defined in GrammarOptimizationMode.

Optimization Strategy Mapping

Mode	Code Entity	Logic Description
None	`GrammarOptimizationMode.None`	Applies grammar to entire vocabulary first. Slowest. LLama/Sampling/DefaultSamplingPipeline.cs120
Basic	`GrammarOptimizationMode.Basic`	Tests if the single selected token from the main chain passes the grammar.
Extended	`GrammarOptimizationMode.Extended`	Tests selected token, then tests top K candidates if the first is rejected. Default. LLama/Sampling/DefaultSamplingPipeline.cs120

Grammar Execution Flow
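
The three modes in the table above can be approximated by the following Python sketch. It is a simplified model, not the LLamaSharp implementation: `sample_with_grammar`, `greedy`, and `count_checks` are hypothetical names, the grammar is reduced to a yes/no predicate per token, and the fallback to a full-vocabulary grammar pass when the cheaper checks fail is an assumption. The sketch counts grammar checks to show where the savings come from.

```python
def sample_with_grammar(logits, is_allowed, pick, mode="extended", top_k=2):
    """logits: {token: logit}; is_allowed: grammar predicate; pick: token chooser."""
    if mode != "none":
        # Basic: sample from the unconstrained logits, then test that one token.
        candidate = pick(logits)
        if is_allowed(candidate):
            return candidate
        if mode == "extended":
            # Extended: on rejection, test only the top-K candidates.
            ranked = sorted(logits, key=logits.get, reverse=True)[:top_k]
            allowed = {t: logits[t] for t in ranked if is_allowed(t)}
            if allowed:
                return pick(allowed)
    # None (and assumed fallback): apply the grammar to the whole vocabulary.
    allowed = {t: v for t, v in logits.items() if is_allowed(t)}
    return pick(allowed)

def greedy(logits):
    """Stand-in for the main chain: choose the highest logit."""
    return max(logits, key=logits.get)

def count_checks(mode):
    """Run one sampling step and report (token, number of grammar checks)."""
    checks = []
    def is_allowed(tok):
        checks.append(tok)
        return tok in {"{", "["}
    vocab = {"{": 1.0, "hello": 3.0, "[": 2.0, "x": 0.5}
    token = sample_with_grammar(vocab, is_allowed, greedy, mode=mode, top_k=2)
    return token, len(checks)

print(count_checks("none"))      # ('[', 4)
print(count_checks("basic"))     # ('[', 5)
print(count_checks("extended"))  # ('[', 3)
```

In this example the first sampled token is rejected, so Basic ends up costing one check more than None. The optimization pays off when the sampled token usually satisfies the grammar, in which case a single check suffices.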

Sources: LLama/Sampling/DefaultSamplingPipeline.cs209-296

Token Data Processing

DefaultSamplingPipeline works with LLamaTokenDataArray and LLamaTokenDataArrayNative structures to efficiently process logits.

The pipeline uses MemoryOwner<LLamaTokenData> and SpanOwner<LLamaLogitBias> for efficient buffer allocation and reuse, avoiding repeated heap allocations during sampling loops.

Sources: LLama/Native/LLamaTokenDataArray.cs12-136 LLama/Sampling/DefaultSamplingPipeline.cs177-218

State Management

DefaultSamplingPipeline maintains state across sampling calls through its base class and grammar chain.

Lifecycle Methods

Method	Purpose	Implementation
`Reset()`	Clears internal state (grammar parser state)	Calls `base.Reset()` and `_grammarChain?.Reset()` LLama/Sampling/DefaultSamplingPipeline.cs145-150
`Accept(token)`	Informs samplers a token was accepted	Calls `base.Accept(token)` and `_grammarChain?.Accept(token)` LLama/Sampling/DefaultSamplingPipeline.cs153-158
`Dispose()`	Releases native resources	Disposes chains via `base.Dispose()` and `_grammarChain` LLama/Sampling/DefaultSamplingPipeline.cs136-142
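
The forwarding pattern in the table can be sketched in Python as follows. Both classes are hypothetical: `RecordingChain` stands in for a native sampler chain, and `PipelineSketch` shows only the delegation logic, with the grammar chain treated as optional in the same way the `?.` calls above imply.

```python
class RecordingChain:
    """Hypothetical stand-in for a native sampler chain."""
    def __init__(self):
        self.history = []
        self.disposed = False

    def accept(self, token):
        self.history.append(token)

    def reset(self):
        self.history.clear()

    def dispose(self):
        self.disposed = True

class PipelineSketch:
    """Forwards lifecycle calls to the main chain and, if present, the grammar chain."""
    def __init__(self, chain, grammar_chain=None):
        self.chain = chain
        self.grammar_chain = grammar_chain

    def _chains(self):
        return [c for c in (self.chain, self.grammar_chain) if c is not None]

    def accept(self, token):
        for chain in self._chains():
            chain.accept(token)

    def reset(self):
        for chain in self._chains():
            chain.reset()

    def dispose(self):
        for chain in self._chains():
            chain.dispose()

    # Context-manager support plays the role of a `using` block.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.dispose()

main, grammar = RecordingChain(), RecordingChain()
with PipelineSketch(main, grammar) as pipeline:
    pipeline.accept(42)
    print(main.history, grammar.history)  # [42] [42]
    pipeline.reset()
print(main.disposed, grammar.disposed)    # True True
```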

Sources: LLama/Sampling/DefaultSamplingPipeline.cs135-158 LLama/Sampling/BaseSamplingPipeline.cs26-71

Performance and Thread Safety

Memory Optimization

The pipeline reduces GC pressure by renting pooled buffers instead of allocating new arrays on each call:

SpanOwner<LLamaLogitBias> for logit biases LLama/Sampling/DefaultSamplingPipeline.cs177
MemoryOwner<LLamaTokenData> for grammar candidate testing LLama/Sampling/DefaultSamplingPipeline.cs218
SpanOwner<float> for internal Softmax calculations LLama/Native/LLamaTokenDataArray.cs105

Thread Safety

DefaultSamplingPipeline instances are not thread-safe. Each instance maintains mutable state in _chain LLama/Sampling/BaseSamplingPipeline.cs10 and _grammarChain LLama/Sampling/DefaultSamplingPipeline.cs125. For concurrent sampling, use a separate pipeline instance per conversation or execution thread.

The random seed generator uses a lock to ensure thread-safe seed initialization.
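
The original listing is not reproduced on this page. As a rough illustration of the pattern only, the Python sketch below guards a shared random generator with a lock so that pipelines created on different threads each obtain a default seed safely. The names `_seed_lock`, `_seed_source`, and `next_default_seed` are hypothetical, and the assumption that the lock protects a single shared generator is inferred from the description rather than taken from the source file.

```python
import random
import threading

# Shared state: one generator for default seeds, guarded by one lock.
_seed_lock = threading.Lock()
_seed_source = random.Random()

def next_default_seed():
    """Return a fresh 32-bit seed; the lock serializes access to the shared generator."""
    with _seed_lock:
        return _seed_source.getrandbits(32)

seed = next_default_seed()
print(0 <= seed < 2**32)  # True
```

The lock covers only seed generation. It does not make a pipeline instance safe to share between threads.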

Sources: LLama/Sampling/DefaultSamplingPipeline.cs128-133


URL: https://deepwiki.com/SciSharp/LLamaSharp/4.2-defaultsamplingpipeline