URL: https://deepwiki.com/SciSharp/LLamaSharp/9.2-executor-api

Last indexed: 18 May 2026 (ecd184)

Executor API

This page provides detailed API reference documentation for the executor classes and interfaces in LLamaSharp. Executors are the primary abstraction for text generation, managing the interaction between user prompts, model inference, and token generation.

For high-level usage patterns and examples, see Chat Sessions. For configuration of sampling behavior, see Sampling API (9.3). For the underlying context management, see Core API Classes (9.1).

Overview

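The executor API is built around the ILLamaExecutor interface, which defines the contract for all text generation operations. LLamaSharp provides several primary executors, each covered in its own section below: InteractiveExecutor, InstructExecutor, StatelessExecutor, and BatchedExecutor.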

Sources: LLama/LLamaInteractExecutor.cs23-24 LLama/LLamaInstructExecutor.cs22-23 LLama/LLamaStatelessExecutor.cs19-21 LLama/Batched/BatchedExecutor.cs14-16

ILLamaExecutor Interface

The ILLamaExecutor interface defines the core contract for all executor implementations.

Interface Members

| Member | Type | Description |
| --- | --- | --- |
| Context | LLamaContext | The loaded context for inference operations. LLama/LLamaExecutorBase.cs65 |
| IsMultiModal | bool | Indicates whether the executor supports multimodal inputs. LLama/LLamaExecutorBase.cs75-81 |
| ClipModel | MtmdWeights? | Multimodal projection weights (MTMD) for image/audio processing. LLama/LLamaExecutorBase.cs84 |
| Embeds | List<SafeMtmdEmbed> | Collection of media embeddings for multimodal models. LLama/LLamaExecutorBase.cs87 |
| InferAsync | Method | Asynchronously generates text from a prompt. LLama/LLamaStatelessExecutor.cs73 |

Sources: LLama/LLamaExecutorBase.cs65-87 LLama/LLamaStatelessExecutor.cs73
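
As a minimal sketch of how the interface is typically consumed: InferAsync produces an asynchronous stream of text fragments, so a helper written against ILLamaExecutor can work with any implementation. The model path, the prompt, and the helper name CollectAsync are placeholders introduced here. ModelParams, LLamaWeights.LoadFromFile, CreateContext, and InferenceParams.MaxTokens are assumed to match the LLamaSharp release in use, so check them against your installed version.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using LLama;
using LLama.Abstractions;
using LLama.Common;

// Placeholder path: point this at a local GGUF model file.
var parameters = new ModelParams("path/to/model.gguf") { ContextSize = 2048 };

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// Any ILLamaExecutor implementation can be passed to the helper below.
ILLamaExecutor executor = new InteractiveExecutor(context);

var reply = await CollectAsync(executor, "Q: What is a KV cache?\nA:", maxTokens: 64);
Console.WriteLine(reply);

// Hypothetical helper (not part of LLamaSharp): drains the stream into one string.
static async Task<string> CollectAsync(ILLamaExecutor executor, string prompt, int maxTokens)
{
    var inferenceParams = new InferenceParams { MaxTokens = maxTokens };
    var builder = new StringBuilder();

    // InferAsync yields decoded text fragments as they are generated.
    await foreach (var fragment in executor.InferAsync(prompt, inferenceParams))
        builder.Append(fragment);

    return builder.ToString();
}
```

Writing the helper against the interface rather than a concrete class keeps calling code unchanged when one executor is swapped for another.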

Executor Architecture

The following diagram maps the relationship between the high-level executor abstractions and the concrete implementations used for inference.

Natural Language Space to Code Entity Space


Sources: LLama/LLamaExecutorBase.cs20-21 LLama/LLamaStatelessExecutor.cs19-21 LLama/LLamaInteractExecutor.cs23-24 LLama/LLamaInstructExecutor.cs22-23 LLama/Batched/BatchedExecutor.cs14-16

StatefulExecutorBase

Abstract base class providing common functionality for stateful executors that maintain conversation context across multiple turns. LLama/LLamaExecutorBase.cs20-21

Constructor


Sources: LLama/LLamaExecutorBase.cs103-126

Protected State Fields

| Field | Type | Purpose |
| --- | --- | --- |
| _pastTokensCount | int | Number of tokens already processed (n_past). LLama/LLamaExecutorBase.cs29 |
| _consumedTokensCount | int | Tokens consumed during current inference (n_consume). LLama/LLamaExecutorBase.cs33 |
| _embed_inps | List<LLamaToken> | Input tokens to be processed. LLama/LLamaExecutorBase.cs53 |
| _embeds | List<LLamaToken> | Tokens pending evaluation. LLama/LLamaExecutorBase.cs49 |
| _last_n_tokens | FixedSizeQueue<LLamaToken> | Recent token history for sampling. LLama/LLamaExecutorBase.cs61 |
| _session_tokens | List<LLamaToken> | Tokens from loaded session file. LLama/LLamaExecutorBase.cs57 |
| _pathSession | string? | Path to session cache file. LLama/LLamaExecutorBase.cs45 |

Sources: LLama/LLamaExecutorBase.cs25-61

Session File Management

The WithSessionFile method attaches a session cache file to the executor. If the file exists, the method loads the KV cache state with NativeApi.llama_state_load_file and then attempts to reuse matching token prefixes, so that subsequent inference does not have to re-evaluate tokens already present in the cache. LLama/LLamaExecutorBase.cs147-152

Sources: LLama/LLamaExecutorBase.cs135-170
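
The prefix reuse described above can be illustrated with a small self-contained sketch. This is not the library's implementation: the class and method names are hypothetical, and tokens are modelled as plain int values instead of LLamaToken. It only shows the idea that tokens shared between the cached session and the new prompt need no re-evaluation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of prefix matching; not LLamaSharp code.
public static class SessionReuse
{
    // Counts how many leading tokens of the cached session match the new prompt.
    public static int CommonPrefixLength(
        IReadOnlyList<int> sessionTokens,
        IReadOnlyList<int> promptTokens)
    {
        var limit = Math.Min(sessionTokens.Count, promptTokens.Count);
        var matched = 0;

        // Stop at the first position where the two sequences diverge.
        while (matched < limit && sessionTokens[matched] == promptTokens[matched])
            matched++;

        return matched;
    }
}
```

For a cached session of [1, 2, 3, 4] and a new prompt of [1, 2, 3, 9, 10], CommonPrefixLength returns 3, so under this model only the last two prompt tokens would still need evaluating.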

InteractiveExecutor

Stateful executor designed for conversational interactions. Maintains context across multiple turns and supports antiprompts for turn-taking. LLama/LLamaInteractExecutor.cs23-24
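
To make the turn-taking role of antiprompts concrete, here is a hedged sketch of the check involved: generation stops once the output ends with one of the configured strings, handing control back to the user. The class and method names are hypothetical, and the executor's own matching logic may differ (for example, it may inspect recent tokens instead of the full text).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of antiprompt matching; not LLamaSharp code.
public static class AntipromptCheck
{
    // Returns true when the generated text currently ends with any antiprompt.
    public static bool EndsWithAntiprompt(string generated, IEnumerable<string> antiprompts)
    {
        foreach (var antiprompt in antiprompts)
        {
            // Skip empty entries, which would otherwise match every string.
            if (antiprompt.Length > 0 &&
                generated.EndsWith(antiprompt, StringComparison.Ordinal))
                return true;
        }

        return false;
    }
}
```

With an antiprompt list containing "User:", the text "Sure, here it is.\nUser:" matches, while "The User: field is required." does not, because the marker is not at the end.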

Execution Flow


Sources: LLama/LLamaInteractExecutor.cs131-174 LLama/LLamaExecutorBase.cs508-556

Multimodal Support (MTMD)

The InteractiveExecutor supports multimodal inputs via MtmdWeights. Media inputs (images/audio) are identified in text prompts and processed into SafeMtmdInputChunks before being evaluated by the model. LLama/LLamaInteractExecutor.cs49-52 LLama/LLamaInteractExecutor.cs28-30

Sources: LLama/LLamaInteractExecutor.cs49-52 LLama/LLamaInteractExecutor.cs28-30

InstructExecutor

Stateful executor for instruction-following patterns. Automatically wraps user input with configurable prefix and suffix templates. LLama/LLamaInstructExecutor.cs22-23

Template Wrapping

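The executor pre-tokenizes the instruction prefix and suffix during construction and keeps the resulting token sequences as _inp_pfx and _inp_sfx. LLama/LLamaInstructExecutor.cs44-45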


During preprocessing, user input is wrapped: [_inp_pfx] + [user_text_tokens] + [_inp_sfx]. LLama/LLamaInstructExecutor.cs170-176

Sources: LLama/LLamaInstructExecutor.cs44-45 LLama/LLamaInstructExecutor.cs170-176
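
The wrapping order can be sketched as follows. This is an illustration only: the class is hypothetical, and a toy one-token-per-character tokenizer stands in for the model's tokenizer. The point it shows is that the prefix and suffix are tokenized once and then concatenated around each user input.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of instruction wrapping; not LLamaSharp code.
public static class InstructWrapping
{
    // Toy tokenizer: one token per character. Stands in for the model's tokenizer.
    public static List<int> Tokenize(string text) => text.Select(c => (int)c).ToList();

    // Builds prefix + user tokens + suffix, in the order described above.
    public static List<int> Wrap(
        IReadOnlyList<int> prefix,
        string userText,
        IReadOnlyList<int> suffix)
    {
        var wrapped = new List<int>(prefix);
        wrapped.AddRange(Tokenize(userText));
        wrapped.AddRange(suffix);
        return wrapped;
    }
}
```

In use, the prefix and suffix would be tokenized a single time (for example, Tokenize("### Instruction:\n") and Tokenize("\n### Response:\n")) and passed to Wrap for every turn, which avoids re-tokenizing the fixed template text on each call.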

StatelessExecutor

Independent inference executor that creates a fresh LLamaContext for each InferAsync call. No state is preserved between calls. LLama/LLamaStatelessExecutor.cs19-21

It supports the ApplyTemplate property, which uses LLamaTemplate to apply model-specific chat templates (like ChatML or Llama-3) to the prompt before inference. LLama/LLamaStatelessExecutor.cs44-49 LLama/LLamaStatelessExecutor.cs97-104

Sources: LLama/LLamaStatelessExecutor.cs72-81 LLama/LLamaStatelessExecutor.cs97-104
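
A minimal usage sketch of stateless inference follows, assuming the constructor takes the loaded weights and the context parameters and that ApplyTemplate can be set in an object initializer. The model path and prompts are placeholders, and the signatures should be verified against the installed LLamaSharp version.

```csharp
using System;
using LLama;
using LLama.Common;

// Placeholder path: point this at a local GGUF model file.
var parameters = new ModelParams("path/to/model.gguf") { ContextSize = 1024 };
using var model = LLamaWeights.LoadFromFile(parameters);

// ApplyTemplate asks the executor to format each prompt with the model's chat template.
var executor = new StatelessExecutor(model, parameters) { ApplyTemplate = true };
var inferenceParams = new InferenceParams { MaxTokens = 48 };

// The two calls are independent: the second has no memory of the first.
foreach (var prompt in new[] { "Name one prime number.", "What did I just ask you?" })
{
    await foreach (var fragment in executor.InferAsync(prompt, inferenceParams))
        Console.Write(fragment);

    Console.WriteLine();
}
```

Because nothing carries over between calls, the second prompt cannot refer back to the first; any required history has to be included in the prompt text itself.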

BatchedExecutor

The BatchedExecutor manages multiple concurrent Conversation threads within a single context. It uses an Epoch system to synchronize inference across multiple sequences. LLama/Batched/BatchedExecutor.cs14-16

Key Components

Batched Inference Lifecycle


Sources: LLama/Batched/BatchedExecutor.cs147-153 LLama/Batched/BatchedExecutor.cs194-210 LLama/Batched/Conversation.cs160-187
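
The epoch idea can be illustrated with a toy model. Everything below is hypothetical (ToyBatchedExecutor, ToyConversation, and their members are invented for this sketch) and it performs no real inference. It only assumes that each conversation records the epoch at which its queued work will have been evaluated, and that one inference pass advances the shared epoch for all of them.

```csharp
using System;

// Toy stand-in for an executor that advances a shared epoch on every pass.
public sealed class ToyBatchedExecutor
{
    public ulong Epoch { get; private set; }

    public ToyConversation Create() => new ToyConversation(this);

    // Stands in for one batched decode pass: every conversation waiting on
    // the next epoch becomes ready at the same time.
    public void Infer() => Epoch++;
}

public sealed class ToyConversation
{
    private readonly ToyBatchedExecutor _executor;
    private ulong _requiredEpoch;

    public ToyConversation(ToyBatchedExecutor executor) => _executor = executor;

    // Queued work has not been evaluated yet.
    public bool RequiresInference => _requiredEpoch > _executor.Epoch;

    // Queued work was evaluated by the most recent pass, so it can be sampled.
    public bool RequiresSampling => _requiredEpoch > 0 && _requiredEpoch == _executor.Epoch;

    // Queues a prompt; the token contents are ignored in this toy model.
    public void Prompt(int tokenCount)
    {
        if (tokenCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(tokenCount));
        if (RequiresInference)
            throw new InvalidOperationException("Already waiting for inference.");

        _requiredEpoch = _executor.Epoch + 1;
    }
}
```

In this model, two conversations that each call Prompt report RequiresInference until a single Infer call runs, after which both report RequiresSampling. That is the synchronization property the epoch counter provides: one pass serves every sequence that queued work beforehand.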

Chat Sessions

The ChatSession class provides a high-level API for managing conversations, including history management and text transformations. LLama/ChatSession.cs21-22

ChatSession Components

| Component | Type | Role |
| --- | --- | --- |
| Executor | ILLamaExecutor | Handles the underlying model inference. LLama/ChatSession.cs51 |
| History | ChatHistory | Stores the list of messages in the conversation. LLama/ChatSession.cs56 |
| HistoryTransform | IHistoryTransform | Converts ChatHistory to/from prompt strings. LLama/ChatSession.cs61 |
| InputTransform | List<ITextTransform> | Processes user text before it is sent to the executor. LLama/ChatSession.cs66 |
| OutputTransform | ITextStreamTransform | Processes generated token streams. LLama/ChatSession.cs71 |

Sources: LLama/ChatSession.cs49-71

Session Persistence

ChatSession can serialize its entire state, including the KV cache and history, into a directory. LLama/ChatSession.cs163-166

Sources: LLama/ChatSession.cs23-46 LLama/ChatSession.cs163-183
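
The pieces above can be combined in a short, hedged usage sketch. The model path, prompts, and output directory name are placeholders. The ChatSession constructor, ChatAsync, ChatHistory.AddMessage, and SaveSession calls are assumed to match the installed LLamaSharp version and should be checked against it.

```csharp
using System;
using LLama;
using LLama.Common;

// Placeholder path: point this at a local GGUF model file.
var parameters = new ModelParams("path/to/model.gguf") { ContextSize = 2048 };
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// Seed the history with a system message before the first user turn.
var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a concise assistant.");

var session = new ChatSession(new InteractiveExecutor(context), history);
var inferenceParams = new InferenceParams
{
    MaxTokens = 128,
    AntiPrompts = new[] { "User:" }
};

// ChatAsync takes the next user message and streams the reply as text fragments.
var message = new ChatHistory.Message(AuthorRole.User, "Summarize what a KV cache does.");
await foreach (var fragment in session.ChatAsync(message, inferenceParams))
    Console.Write(fragment);

// Placeholder directory: persists the session state so it can be restored later.
session.SaveSession("chat-session-state");
```

Saving to a directory rather than a single file fits the description above, since the KV cache and the message history are distinct pieces of state.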

