
URL: https://deepwiki.com/SciSharp/LLamaSharp/9.3-sampling-api

Sampling API | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

Sampling API

This page is a reference for the sampling API in LLamaSharp. The sampling API controls how the next token is selected from the model's output logits during text generation. It turns raw logits into a chosen token by applying filters, penalties, and transformations such as temperature scaling.

Core Interfaces and Base Classes

ISamplingPipeline Interface

ISamplingPipeline is the primary abstraction for token sampling. It defines the contract that all sampling implementations must fulfill to be used by executors.

Interface Definition:

LLama/Sampling/ISamplingPipeline.cs9-37

Methods:

| Method | Parameters | Returns | Description |
|---|---|---|---|
| Sample | SafeLLamaContextHandle ctx, int index | LLamaToken | Sample a single token from the context at the given index. |
| Apply | SafeLLamaContextHandle ctx, LLamaTokenDataArray data | void | Apply the sampling pipeline to a token data array to modify probabilities. |
| Reset | None | void | Reset all internal state (e.g., grammar state, penalty history). |
| Accept | LLamaToken token | void | Update the pipeline with knowledge that a specific token was accepted. |
| Dispose | None | void | Free native resources held by the pipeline. |
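
The call pattern implied by these methods can be sketched in a library-neutral way. The Python below is only an illustration: `SamplingPipeline`, `CountingGreedy`, and `generate` are hypothetical names, not LLamaSharp types, and the pipeline here receives a plain list of logits rather than a context handle and index. Whether the executor or the pipeline itself invokes Accept is not stated on this page; the loop calls it explicitly purely to show where per-token state would be updated.

```python
from abc import ABC, abstractmethod
from typing import List


class SamplingPipeline(ABC):
    """Hypothetical Python stand-in for the ISamplingPipeline contract."""

    @abstractmethod
    def sample(self, logits: List[float]) -> int:
        """Pick one token id from the logits of the current position."""

    def accept(self, token: int) -> None:
        """Record that `token` was emitted (no-op by default)."""

    def reset(self) -> None:
        """Clear per-sequence state (no-op by default)."""

    def dispose(self) -> None:
        """Release resources (nothing to release in this sketch)."""


class CountingGreedy(SamplingPipeline):
    """Toy pipeline: picks the highest logit and remembers accepted tokens."""

    def __init__(self) -> None:
        self.history: List[int] = []

    def sample(self, logits: List[float]) -> int:
        return max(range(len(logits)), key=logits.__getitem__)

    def accept(self, token: int) -> None:
        self.history.append(token)

    def reset(self) -> None:
        self.history.clear()


def generate(pipeline: SamplingPipeline, logits_per_step: List[List[float]]) -> List[int]:
    """Drive the pipeline once per decoding step: sample, then accept."""
    output = []
    for logits in logits_per_step:
        token = pipeline.sample(logits)
        pipeline.accept(token)
        output.append(token)
    return output
```

Calling `reset()` between unrelated sequences keeps state such as penalty history from leaking from one generation into the next.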

Sources: LLama/Sampling/ISamplingPipeline.cs9-37


BaseSamplingPipeline Abstract Class

BaseSamplingPipeline provides a base implementation of ISamplingPipeline that manages the lifecycle of a SafeLLamaSamplerChainHandle. It handles the lazy initialization and disposal of the native sampler chain.
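
Lazy initialization with explicit disposal is a general pattern, and a minimal Python sketch of it follows. `LazyChainPipeline` and `FakeNativeChain` are hypothetical names used only for illustration; raising an error on access after disposal is an assumption of this sketch, not a statement about how BaseSamplingPipeline behaves.

```python
class FakeNativeChain:
    """Hypothetical stand-in for a native sampler chain handle."""

    created = 0  # counts how many chains have been built

    def __init__(self) -> None:
        FakeNativeChain.created += 1
        self.closed = False

    def close(self) -> None:
        self.closed = True


class LazyChainPipeline:
    """Builds its chain on first use and releases it on dispose()."""

    def __init__(self, chain_factory) -> None:
        self._factory = chain_factory
        self._chain = None
        self._disposed = False

    @property
    def chain(self):
        if self._disposed:
            # Assumption of this sketch: using a disposed pipeline is an error.
            raise RuntimeError("pipeline has been disposed")
        if self._chain is None:
            # First access: create the chain now rather than in __init__.
            self._chain = self._factory()
        return self._chain

    def dispose(self) -> None:
        if self._chain is not None:
            self._chain.close()
            self._chain = None
        self._disposed = True
```

Deferring construction means configuration properties can still be set after the pipeline object is created, since the chain is only built when it is first needed.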

Class Structure:

[Figure: Sampling Pipeline Hierarchy]


Key Implementation Details:

Sources: LLama/Sampling/BaseSamplingPipeline.cs7-72


SafeLLamaSamplerChainHandle Class

SafeLLamaSamplerChainHandle is a SafeHandle wrapping a native llama_sampler (specifically a chain initialized via llama_sampler_chain_init).

Core Logic Flow:

[Figure: Native Sampler Interaction Flow]


Core Methods:

Sources: LLama/Native/SafeLLamaSamplerHandle.cs13-197


Built-in Sampling Pipeline Implementations

DefaultSamplingPipeline

DefaultSamplingPipeline mimics the standard sampling logic found in llama.cpp's main example. It is configured through the properties listed below and supports grammar-constrained generation.

Configuration Properties:

| Property | Default | Source |
|---|---|---|
| Temperature | 0.75f | LLama/Sampling/DefaultSamplingPipeline.cs80 |
| TopK | 40 | LLama/Sampling/DefaultSamplingPipeline.cs85 |
| TopP | 0.9f | LLama/Sampling/DefaultSamplingPipeline.cs95 |
| MinP | 0.1f | LLama/Sampling/DefaultSamplingPipeline.cs100 |
| RepeatPenalty | 1.0f | LLama/Sampling/DefaultSamplingPipeline.cs22 |
| FrequencyPenalty | 0.0f | LLama/Sampling/DefaultSamplingPipeline.cs29-41 |
| PresencePenalty | 0.0f | LLama/Sampling/DefaultSamplingPipeline.cs48-60 |
| LogitBias | Empty Dictionary | LLama/Sampling/DefaultSamplingPipeline.cs18 |
| PenaltyCount | 64 | LLama/Sampling/DefaultSamplingPipeline.cs65 |
| PenalizeNewline | false | LLama/Sampling/DefaultSamplingPipeline.cs70 |
| PreventEOS | false | LLama/Sampling/DefaultSamplingPipeline.cs75 |
| TypicalP | 1 | LLama/Sampling/DefaultSamplingPipeline.cs89 |
| Grammar | null | LLama/Sampling/DefaultSamplingPipeline.cs105 |
| MinKeep | 1 | LLama/Sampling/DefaultSamplingPipeline.cs110 |
| Seed | Random | LLama/Sampling/DefaultSamplingPipeline.cs115 |
| GrammarOptimization | Extended | LLama/Sampling/DefaultSamplingPipeline.cs120 |

Sampler Chain Construction Order: The pipeline adds samplers in a specific sequence to SafeLLamaSamplerChainHandle LLama/Sampling/DefaultSamplingPipeline.cs171-206:

  1. Logit Bias: AddLogitBias LLama/Sampling/DefaultSamplingPipeline.cs191
  2. Penalties: AddPenalties (Repeat, Frequency, Presence) LLama/Sampling/DefaultSamplingPipeline.cs195
  3. Filtering: AddTopK, AddTypical, AddTopP, AddMinP LLama/Sampling/DefaultSamplingPipeline.cs197-200
  4. Transformation: AddTemperature LLama/Sampling/DefaultSamplingPipeline.cs201
  5. Selection: AddDistributionSampler LLama/Sampling/DefaultSamplingPipeline.cs203
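
To make the effect of this ordering concrete, the five stages can be sketched in plain Python. This is an illustrative approximation and not the LLamaSharp or llama.cpp implementation: `sample_chain` and `softmax` are hypothetical helpers, the penalty formula follows the commonly used llama.cpp-style rule as an assumption, typical sampling is omitted (the default TypicalP of 1 is assumed to leave candidates unchanged), and `temperature` is assumed to be greater than zero.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())
    exps = {t: math.exp(l - m) for t, l in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def sample_chain(logits, history, *, logit_bias=None, repeat_penalty=1.0,
                 frequency_penalty=0.0, presence_penalty=0.0, penalty_count=64,
                 top_k=40, top_p=0.9, min_p=0.1, temperature=0.75, rng=random):
    cand = dict(enumerate(logits))

    # 1. Logit bias: add a fixed offset to selected tokens.
    for tok, bias in (logit_bias or {}).items():
        cand[tok] += bias

    # 2. Penalties over the most recent `penalty_count` tokens (assumed formula).
    recent = history[-penalty_count:] if penalty_count > 0 else []
    for tok, n in Counter(recent).items():
        l = cand[tok]
        l = l * repeat_penalty if l <= 0 else l / repeat_penalty
        cand[tok] = l - n * frequency_penalty - presence_penalty

    # 3a. Top-K: keep the k highest logits.
    ranked = sorted(cand, key=cand.get, reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    cand = {t: cand[t] for t in ranked}

    # 3b. Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    probs = softmax(cand)
    kept, cumulative = [], 0.0
    for t in ranked:
        kept.append(t)
        cumulative += probs[t]
        if cumulative >= top_p:
            break
    cand = {t: cand[t] for t in kept}

    # 3c. Min-P: drop tokens far less likely than the best remaining token.
    probs = softmax(cand)
    floor = min_p * max(probs.values())
    cand = {t: l for t, l in cand.items() if probs[t] >= floor}

    # 4. Temperature: rescale the surviving logits.
    cand = {t: l / temperature for t, l in cand.items()}

    # 5. Selection: draw one token from the final distribution.
    probs = softmax(cand)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

Because the filters run before temperature in this order, temperature only reshapes the distribution over tokens that survived filtering; it cannot bring a filtered-out token back.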

Grammar Optimization: The pipeline supports GrammarOptimizationMode LLama/Sampling/DefaultSamplingPipeline.cs120. When enabled, it first samples a token using the fast base chain. If the sampled token violates the grammar, it falls back to the slower grammar-constrained sampling, which always yields a grammar-valid token LLama/Sampling/DefaultSamplingPipeline.cs209-296.
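
The fast-path-then-fallback idea can be sketched as follows. This is a simplified illustration under stated assumptions: `is_allowed` is a hypothetical predicate standing in for the grammar check, `pick` stands in for whatever base chain selects a token, and at least one token is assumed to be grammar-valid. It does not reproduce the actual LLamaSharp code paths.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def sample_with_grammar(logits, is_allowed, pick=argmax):
    # Fast path: sample without the grammar and validate only the chosen token.
    token = pick(logits)
    if is_allowed(token):
        return token

    # Slow path: mask every grammar-invalid token, then sample again.
    masked = [l if is_allowed(t) else float("-inf") for t, l in enumerate(logits)]
    return pick(masked)
```

The fast path checks a single token against the grammar, while the slow path has to check the whole vocabulary, which is why trying the unconstrained sample first can save work when the model already tends to follow the grammar.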

Sources: LLama/Sampling/DefaultSamplingPipeline.cs11-318


GreedySamplingPipeline

A minimal pipeline that always selects the token with the highest logit.

Chain Construction:

  1. Optional Grammar Sampler LLama/Sampling/GreedySamplingPipeline.cs21-22
  2. AddGreedySampler LLama/Sampling/GreedySamplingPipeline.cs24
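
Greedy selection reduces to an argmax over the candidates that remain after the optional grammar step. A minimal Python sketch, where `greedy_sample` and its `allowed` parameter are hypothetical names used only for illustration:

```python
def greedy_sample(logits, allowed=None):
    """Return the id of the highest-logit token, optionally restricted to `allowed`."""
    candidates = range(len(logits)) if allowed is None else allowed
    return max(candidates, key=lambda t: logits[t])
```

Given the same logits, this always returns the same token, so no seed is involved.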

Sources: LLama/Sampling/GreedySamplingPipeline.cs1-28


MirostatSamplingPipeline

This pipeline implements the Mirostat 1.0 algorithm for controlling perplexity.

Configuration Properties:

| Property | Default | Source |
|---|---|---|
| Tau | 5.0f | LLama/Sampling/MirostatSamplingPipeline.cs16 |
| Eta | 0.1f | LLama/Sampling/MirostatSamplingPipeline.cs21 |
| M | 100 | LLama/Sampling/MirostatSamplingPipeline.cs26 |
| Temperature | 0.75f | LLama/Sampling/MirostatSamplingPipeline.cs31 |
| LogitBias | Empty Dictionary | LLama/Sampling/MirostatSamplingPipeline.cs36 |
| RepeatPenalty | 1.0f | LLama/Sampling/MirostatSamplingPipeline.cs39 |
| FrequencyPenalty | 0.0f | LLama/Sampling/MirostatSamplingPipeline.cs42-54 |
| PresencePenalty | 0.0f | LLama/Sampling/MirostatSamplingPipeline.cs57-69 |
| PenaltyCount | 64 | LLama/Sampling/MirostatSamplingPipeline.cs72 |
| PenalizeNewline | false | LLama/Sampling/MirostatSamplingPipeline.cs75 |
| PreventEOS | false | LLama/Sampling/MirostatSamplingPipeline.cs78 |
| Grammar | null | LLama/Sampling/MirostatSamplingPipeline.cs81 |
| GrammarOptimization | Extended | LLama/Sampling/MirostatSamplingPipeline.cs86 |

Chain Construction:

  1. Logit Bias LLama/Sampling/MirostatSamplingPipeline.cs109-124
  2. Penalties LLama/Sampling/MirostatSamplingPipeline.cs126
  3. AddMirostat1Sampler LLama/Sampling/MirostatSamplingPipeline.cs128
  4. AddTemperature LLama/Sampling/MirostatSamplingPipeline.cs129

Sources: LLama/Sampling/MirostatSamplingPipeline.cs1-190


Mirostat2SamplingPipeline

This pipeline implements the Mirostat 2.0 algorithm for controlling perplexity.

Configuration Properties:

| Property | Default | Source |
|---|---|---|
| Tau | 5.0f | LLama/Sampling/Mirostat2SamplingPipeline.cs16 |
| Eta | 0.1f | LLama/Sampling/Mirostat2SamplingPipeline.cs21 |
| LogitBias | Empty Dictionary | LLama/Sampling/Mirostat2SamplingPipeline.cs26 |
| RepeatPenalty | 1.0f | LLama/Sampling/Mirostat2SamplingPipeline.cs29 |
| FrequencyPenalty | 0.0f | LLama/Sampling/Mirostat2SamplingPipeline.cs32-44 |
| PresencePenalty | 0.0f | LLama/Sampling/Mirostat2SamplingPipeline.cs47-59 |
| PenaltyCount | 64 | LLama/Sampling/Mirostat2SamplingPipeline.cs62 |
| PenalizeNewline | false | LLama/Sampling/Mirostat2SamplingPipeline.cs65 |
| PreventEOS | false | LLama/Sampling/Mirostat2SamplingPipeline.cs68 |
| Grammar | null | LLama/Sampling/Mirostat2SamplingPipeline.cs71 |
| GrammarOptimization | Extended | LLama/Sampling/Mirostat2SamplingPipeline.cs76 |

Chain Construction:

  1. Logit Bias LLama/Sampling/Mirostat2SamplingPipeline.cs99-114
  2. Penalties LLama/Sampling/Mirostat2SamplingPipeline.cs116
  3. AddMirostat2Sampler LLama/Sampling/Mirostat2SamplingPipeline.cs118
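
For readers unfamiliar with the algorithm, the feedback loop behind Mirostat 2.0 can be sketched in a few lines of Python. This follows the algorithm as it is generally described and is an assumption-laden illustration, not the native sampler: `Mirostat2` is a hypothetical class, `mu` is assumed to start at `2 * tau`, and surprise is measured in bits as `-log2(p)`.

```python
import math
import random


class Mirostat2:
    """Illustrative Mirostat 2.0 sampler: keeps observed surprise near `tau`."""

    def __init__(self, tau=5.0, eta=0.1, rng=None):
        self.tau = tau            # target surprise (bits)
        self.eta = eta            # learning rate of the feedback update
        self.mu = 2.0 * tau       # running surprise threshold (assumed start value)
        self.rng = rng or random.Random()

    def sample(self, logits):
        # Softmax over the raw logits.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs = [e / z for e in exps]

        # Keep tokens whose surprise does not exceed mu; never end up empty.
        kept = [t for t, p in enumerate(probs) if p > 0 and -math.log2(p) <= self.mu]
        if not kept:
            kept = [max(range(len(probs)), key=probs.__getitem__)]

        # Renormalize over the kept tokens and draw one.
        total = sum(probs[t] for t in kept)
        weights = [probs[t] / total for t in kept]
        token = self.rng.choices(kept, weights=weights, k=1)[0]

        # Feedback: move mu against the error between observed and target surprise.
        observed = -math.log2(probs[token] / total)
        self.mu -= self.eta * (observed - self.tau)
        return token
```

When sampled tokens are less surprising than `tau`, `mu` rises and admits more candidates on the next step; when they are more surprising, `mu` falls and the candidate set tightens.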

Sources: LLama/Sampling/Mirostat2SamplingPipeline.cs1-179


Token Data Structures

LLamaTokenDataArray

A managed structure containing an array of LLamaTokenData. It is used to pass candidates between managed code and native samplers.

Key Features:

Sources: LLama/Native/LLamaTokenDataArray.cs12-129


LLamaTokenDataArrayNative

The C# equivalent of the native llama_token_data_array struct. It is designed for zero-copy interop with the native library.
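
The idea of a header struct that points at candidate data without copying it can be illustrated with Python's `ctypes`. The field layout below (`data`, `size`, `selected`, `sorted`, and per-token `id`, `logit`, `p`) is an assumption based on the llama.cpp header as commonly published and has not been checked against the LLamaSharp definition; `TokenData`, `TokenDataArray`, and `wrap` are hypothetical names.

```python
import ctypes


class TokenData(ctypes.Structure):
    """One candidate: token id, raw logit, and probability (assumed layout)."""
    _fields_ = [
        ("id", ctypes.c_int32),
        ("logit", ctypes.c_float),
        ("p", ctypes.c_float),
    ]


class TokenDataArray(ctypes.Structure):
    """Header describing a contiguous buffer of TokenData (assumed layout)."""
    _fields_ = [
        ("data", ctypes.POINTER(TokenData)),
        ("size", ctypes.c_size_t),
        ("selected", ctypes.c_int64),
        ("sorted", ctypes.c_bool),
    ]


def wrap(buffer):
    """Build a header that points at `buffer` without copying its elements."""
    return TokenDataArray(
        data=ctypes.cast(buffer, ctypes.POINTER(TokenData)),
        size=len(buffer),
        selected=-1,      # no token chosen yet
        sorted=False,     # candidates are in vocabulary order
    )


logits = [0.5, 2.0, -1.0]
buffer = (TokenData * len(logits))(*[TokenData(i, l, 0.0) for i, l in enumerate(logits)])
view = wrap(buffer)

# Writing through the pointer changes the original buffer: nothing was copied.
view.data[1].logit = 9.0
```

The caller must keep `buffer` alive for as long as `view` is in use, since the header only stores a pointer to it.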

Memory Management:

Sources: LLama/Native/LLamaTokenDataArray.cs135-224


Sampler Implementation Reference

SafeLLamaSamplerChainHandle exposes numerous native samplers. Below is a categorized reference:

Penalty and Bias Samplers

Filtering Samplers

Advanced Samplers

Sources: LLama/Native/SafeLLamaSamplerHandle.cs218-647


Implementation Bridge: Native to Managed

The following diagram illustrates how native llama.cpp sampling structures are represented in LLamaSharp code.

[Figure: Logit Processing Entity Mapping]


Sources: LLama/Native/LLamaTokenDataArray.cs135-155 LLama/Native/SafeLLamaSamplerHandle.cs13-15 LLama/Sampling/DefaultSamplingPipeline.cs11-12