
URL: https://deepwiki.com/SciSharp/LLamaSharp/4.2-defaultsamplingpipeline

DefaultSamplingPipeline | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

DefaultSamplingPipeline

DefaultSamplingPipeline is the primary implementation of ISamplingPipeline that replicates llama.cpp's default sampling behavior with extensive configurability. It provides a comprehensive set of parameters for controlling token generation including temperature, top-k/p/min-p filtering, penalties, and grammar-constrained generation.

For greedy sampling (always selecting the most likely token), see GreedySamplingPipeline (LLama/Sampling/GreedySamplingPipeline.cs8-9). For implementing custom sampling logic, see 4.3 Custom Samplers. For an overview of the sampling architecture, see 4.1 Sampling Pipeline Overview.

Sources: LLama/Sampling/DefaultSamplingPipeline.cs11-13


Pipeline Architecture

DefaultSamplingPipeline extends BaseSamplingPipeline (LLama/Sampling/DefaultSamplingPipeline.cs11-12) and creates a chain of native llama.cpp samplers that are applied in sequence to turn logits into a selected token. The chain is constructed lazily on first use (LLama/Sampling/BaseSamplingPipeline.cs37-39) and reused across sampling calls.

System Architecture to Code Entity Mapping

The following table maps sampling concepts to the entities that implement them in the LLamaSharp codebase.

| Natural Language Concept | Code Entity | File Reference |
|---|---|---|
| Sampling Pipeline | DefaultSamplingPipeline | LLama/Sampling/DefaultSamplingPipeline.cs11 |
| Sampler Chain | SafeLLamaSamplerChainHandle | LLama/Native/SafeLLamaSamplerHandle.cs13 |
| Logit Representation | LLamaTokenDataArray | LLama/Native/LLamaTokenDataArray.cs12 |
| Native Logit Buffer | LLamaTokenDataArrayNative | LLama/Native/LLamaTokenDataArray.cs136 |
| Grammar Constraint | Grammar | LLama/Sampling/DefaultSamplingPipeline.cs105 |

Pipeline Flow Diagram


Sources: LLama/Sampling/DefaultSamplingPipeline.cs171-206 LLama/Sampling/BaseSamplingPipeline.cs10-72 LLama/Native/SafeLLamaSamplerHandle.cs13-112


Configuration Parameters

DefaultSamplingPipeline exposes properties that configure each stage of the sampling chain. All properties use init-only setters, requiring configuration at construction time.

Token Transformations

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| LogitBias | IReadOnlyDictionary<LLamaToken, float> | Empty | Any float | Bias values added to specific token logits (LLama/Sampling/DefaultSamplingPipeline.cs17) |
| RepeatPenalty | float | 1.0 | >0 | Repetition penalty from arxiv:1909.05858 (LLama/Sampling/DefaultSamplingPipeline.cs22) |
| FrequencyPenalty | float | 0.0 | -2.0 to 2.0 | OpenAI-style penalty based on token frequency (LLama/Sampling/DefaultSamplingPipeline.cs29-41) |
| PresencePenalty | float | 0.0 | -2.0 to 2.0 | OpenAI-style penalty based on token presence (LLama/Sampling/DefaultSamplingPipeline.cs48-60) |
| PenaltyCount | int | 64 | >0 | Number of previous tokens considered for penalties (LLama/Sampling/DefaultSamplingPipeline.cs65) |
| PenalizeNewline | bool | false | - | Whether the newline token is affected by penalties (LLama/Sampling/DefaultSamplingPipeline.cs70) |
| PreventEOS | bool | false | - | Suppresses the end-of-sequence token so it cannot be sampled (LLama/Sampling/DefaultSamplingPipeline.cs75) |
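
To make the three penalty parameters concrete, here is a minimal, language-neutral sketch in Python of how a repetition penalty and OpenAI-style frequency and presence penalties are commonly combined over a window of recent tokens. It is an illustration under stated assumptions, not the LLamaSharp or llama.cpp source: the function name `apply_penalties` and its arguments are hypothetical, and `last_n` plays the role of PenaltyCount.

```python
from collections import Counter

def apply_penalties(logits, history, *, repeat=1.0, frequency=0.0,
                    presence=0.0, last_n=64):
    """Return a penalized copy of `logits` (token -> logit).

    Hypothetical helper: only tokens seen in the last `last_n` entries
    of `history` are penalized.
    """
    window = history[-last_n:] if last_n > 0 else []
    counts = Counter(window)
    out = dict(logits)
    for tok, count in counts.items():
        if tok not in out:
            continue
        logit = out[tok]
        # Repetition penalty: dividing a negative logit would make the token
        # more likely, so negative logits are multiplied instead.
        logit = logit * repeat if logit <= 0 else logit / repeat
        # Frequency penalty scales with the occurrence count; the presence
        # penalty is applied once for any token that appeared at all.
        logit -= count * frequency + presence
        out[tok] = logit
    return out

penalized = apply_penalties(
    {"a": 2.0, "b": -1.0, "c": 0.5},
    history=["a", "a", "b"],
    repeat=2.0, frequency=0.5, presence=0.25,
)
```

With the defaults from the table (RepeatPenalty 1.0, FrequencyPenalty 0.0, PresencePenalty 0.0) this function leaves the logits unchanged, which is why the penalties are effectively off until configured.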

Filtering and Selection

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| Temperature | float | 0.75 | >0 | Higher values increase randomness/creativity (LLama/Sampling/DefaultSamplingPipeline.cs80) |
| TopK | int | 40 | ≥0 | Keep only the top K tokens by probability (LLama/Sampling/DefaultSamplingPipeline.cs85) |
| TypicalP | float | 1.0 | 0-1 | Locally typical sampling P threshold (LLama/Sampling/DefaultSamplingPipeline.cs90) |
| TopP | float | 0.9 | 0-1 | Nucleus sampling cumulative probability threshold (LLama/Sampling/DefaultSamplingPipeline.cs95) |
| MinP | float | 0.1 | 0-1 | Minimum probability threshold relative to the most likely token (LLama/Sampling/DefaultSamplingPipeline.cs100) |
| MinKeep | int | 1 | ≥1 | Minimum number of tokens preserved by filtering samplers (LLama/Sampling/DefaultSamplingPipeline.cs110) |
| Seed | uint | Random | Any | Random seed for the distribution sampler (LLama/Sampling/DefaultSamplingPipeline.cs115) |

Sources: LLama/Sampling/DefaultSamplingPipeline.cs16-115


Sampler Chain Construction

The CreateChain method builds the native sampler chain by adding samplers in a specific order that matches llama.cpp defaults.


The order is critical: transformations (bias, penalties) apply first, then filters reduce the candidate set (TopK, Typical, TopP, MinP), temperature adjusts probabilities, and finally the distribution sampler selects from the resulting distribution.
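
The ordering described above can be illustrated with a small, language-neutral sketch in Python. It is a simplified model rather than the CreateChain implementation: every function name here is hypothetical, logits are a plain dict, and the penalty and typical-sampling stages are omitted for brevity.

```python
import math
import random

def softmax(logits):
    """Convert token -> logit into token -> probability."""
    peak = max(logits.values())
    exps = {t: math.exp(l - peak) for t, l in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def apply_logit_bias(logits, bias):
    """Transformation stage: add a per-token bias to the raw logits."""
    return {t: l + bias.get(t, 0.0) for t, l in logits.items()}

def top_k(logits, k):
    """Keep the k highest-logit tokens; k <= 0 is treated here as disabled."""
    if k <= 0 or k >= len(logits):
        return dict(logits)
    return dict(sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k])

def top_p(logits, p, min_keep=1):
    """Nucleus filter: keep the smallest ranked prefix whose mass reaches p."""
    ranked = sorted(softmax(logits).items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append(tok)
        cumulative += prob
        if cumulative >= p and len(kept) >= min_keep:
            break
    return {t: logits[t] for t in kept}

def min_p(logits, p, min_keep=1):
    """Drop tokens whose probability is below p times the top probability."""
    probs = softmax(logits)
    threshold = p * max(probs.values())
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = [t for i, (t, pr) in enumerate(ranked) if pr >= threshold or i < min_keep]
    return {t: logits[t] for t in kept}

def temperature(logits, temp):
    """Rescale the surviving logits; assumes temp > 0."""
    return {t: l / temp for t, l in logits.items()}

def sample(logits, rng):
    """Distribution stage: draw one token from the final probabilities."""
    probs = softmax(logits)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

def run_chain(logits, *, bias, k, p_top, p_min, temp, seed):
    """Apply the stages in order: bias, filters, temperature, then sampling."""
    stage = apply_logit_bias(logits, bias)
    stage = top_k(stage, k)
    stage = top_p(stage, p_top)
    stage = min_p(stage, p_min)
    stage = temperature(stage, temp)
    return sample(stage, random.Random(seed)), stage
```

Because each stage only sees what the previous one left behind, swapping two stages changes the result. For example, a bias applied after top-k could no longer rescue a token that the filter had already removed.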

Sources: LLama/Sampling/DefaultSamplingPipeline.cs171-206 LLama/Native/SafeLLamaSamplerHandle.cs153-243


Grammar Optimization

When Grammar is configured, DefaultSamplingPipeline maintains a separate grammar-only sampler chain, _grammarChain (LLama/Sampling/DefaultSamplingPipeline.cs125), and applies it using one of three optimization strategies defined in GrammarOptimizationMode.

Optimization Strategy Mapping

| Mode | Code Entity | Logic Description |
|---|---|---|
| None | GrammarOptimizationMode.None | Applies grammar to entire vocabulary first. Slowest. (LLama/Sampling/DefaultSamplingPipeline.cs120) |
| Basic | GrammarOptimizationMode.Basic | Tests if the single selected token from the main chain passes the grammar. |
| Extended | GrammarOptimizationMode.Extended | Tests selected token, then tests top K candidates if the first is rejected. Default. (LLama/Sampling/DefaultSamplingPipeline.cs120) |
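
The difference between the three modes is mainly how many tokens get checked against the grammar. The Python sketch below models that idea under explicit assumptions: `pick_best` stands in for the main sampler chain, `accepts` stands in for the grammar check, and the fallback to a full-vocabulary pass when the fast checks fail is an assumption of this sketch rather than something stated above. All names are hypothetical.

```python
def pick_best(logits):
    """Stand-in for the main sampler chain: choose the highest-logit token."""
    return max(logits, key=logits.get)

def sample_with_grammar(logits, accepts, mode="extended", top_k=3):
    """Pick a grammar-valid token while limiting grammar checks.

    mode is "none", "basic" or "extended", mirroring the table above.
    """
    if mode != "none":
        # Fast path: sample first, then check only that single token.
        token = pick_best(logits)
        if accepts(token):
            return token
        if mode == "extended":
            # Second chance: check only the top_k highest-ranked candidates.
            ranked = sorted(logits, key=logits.get, reverse=True)[:top_k]
            allowed = {t: logits[t] for t in ranked if accepts(t)}
            if allowed:
                return pick_best(allowed)
    # Slow path (assumed fallback): check the entire vocabulary.
    allowed = {t: l for t, l in logits.items() if accepts(t)}
    if not allowed:
        raise ValueError("grammar rejects every candidate token")
    return pick_best(allowed)
```

When the model's preferred token is usually valid, the fast path needs a single grammar check per sampled token, whereas mode "none" checks every entry in the vocabulary each time.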

Grammar Execution Flow


Sources: LLama/Sampling/DefaultSamplingPipeline.cs209-296


Token Data Processing

DefaultSamplingPipeline works with LLamaTokenDataArray and LLamaTokenDataArrayNative structures to efficiently process logits.


The pipeline uses MemoryOwner<LLamaTokenData> and SpanOwner<LLamaLogitBias> for efficient buffer allocation and reuse, avoiding repeated heap allocations during sampling loops.

Sources: LLama/Native/LLamaTokenDataArray.cs12-136 LLama/Sampling/DefaultSamplingPipeline.cs177-218


State Management

DefaultSamplingPipeline maintains state across sampling calls through its base class and grammar chain.

Lifecycle Methods

| Method | Purpose | Implementation |
|---|---|---|
| Reset() | Clears internal state (grammar parser state) | Calls base.Reset() and _grammarChain?.Reset() (LLama/Sampling/DefaultSamplingPipeline.cs145-150) |
| Accept(token) | Informs samplers a token was accepted | Calls base.Accept(token) and _grammarChain?.Accept(token) (LLama/Sampling/DefaultSamplingPipeline.cs153-158) |
| Dispose() | Releases native resources | Disposes chains via base.Dispose() and _grammarChain (LLama/Sampling/DefaultSamplingPipeline.cs136-142) |
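
The forwarding pattern in the table, combined with the lazy chain construction described under Pipeline Architecture, can be sketched in Python as follows. This is a simplified model with hypothetical names (`CountingChain`, `PipelineSketch`). It creates the optional grammar chain eagerly to keep the example short, which may differ from the real class.

```python
class CountingChain:
    """Hypothetical stand-in for a native sampler chain; tracks accepted tokens."""

    def __init__(self):
        self.accepted = []

    def accept(self, token):
        self.accepted.append(token)

    def reset(self):
        self.accepted.clear()


class PipelineSketch:
    """Forwards lifecycle calls to a lazily built main chain and an optional grammar chain."""

    def __init__(self, use_grammar=False):
        self._chain = None  # built on first use, then reused
        self._grammar_chain = CountingChain() if use_grammar else None
        self.chains_built = 0

    def _get_chain(self):
        if self._chain is None:
            self._chain = CountingChain()
            self.chains_built += 1
        return self._chain

    def accept(self, token):
        # Both chains must see every accepted token to stay in sync.
        self._get_chain().accept(token)
        if self._grammar_chain is not None:
            self._grammar_chain.accept(token)

    def reset(self):
        self._get_chain().reset()
        if self._grammar_chain is not None:
            self._grammar_chain.reset()


pipeline = PipelineSketch(use_grammar=True)
pipeline.accept(1)
pipeline.accept(2)
```

Forwarding every call to both chains keeps the grammar parser state aligned with the tokens the main chain has seen, which is the reason Reset() and Accept(token) touch both.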

Sources: LLama/Sampling/DefaultSamplingPipeline.cs135-158 LLama/Sampling/BaseSamplingPipeline.cs26-71


Performance and Thread Safety

Memory Optimization

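The pipeline reduces GC pressure by using MemoryOwner<LLamaTokenData> and SpanOwner<LLamaLogitBias> buffers, which are reused across sampling calls instead of being reallocated each time.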

Thread Safety

DefaultSamplingPipeline instances are not thread-safe. Each instance maintains mutable state in _chain (LLama/Sampling/BaseSamplingPipeline.cs10) and _grammarChain (LLama/Sampling/DefaultSamplingPipeline.cs125). For concurrent sampling, use a separate pipeline instance per conversation or execution thread.

The random seed generator uses a lock to ensure thread-safe seed initialization.
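
As a rough illustration of that pattern, the Python sketch below guards a shared random source with a lock so that concurrent callers each obtain a well-formed 32-bit seed. It is not the LLamaSharp source: `next_default_seed`, `_seed_lock` and `_seed_source` are hypothetical names chosen for this example.

```python
import random
import threading

_seed_lock = threading.Lock()
_seed_source = random.Random()  # shared generator used only to hand out seeds

def next_default_seed():
    """Return a random unsigned 32-bit seed; safe to call from many threads."""
    with _seed_lock:
        return _seed_source.getrandbits(32)

# Usage: several threads request default seeds at the same time.
seeds = []

def worker():
    seeds.append(next_default_seed())

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock protects only seed generation. It does not make a pipeline instance itself safe to share between threads.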


Sources: LLama/Sampling/DefaultSamplingPipeline.cs128-133