Last indexed: 18 May 2026 (ecd184)

Executor Overview

This page provides an overview of the executor abstraction layer in LLamaSharp, which defines different patterns for text generation inference. Executors provide high-level APIs that manage context lifecycle, state persistence, and token streaming while interfacing with the lower-level LLamaContext and LLamaWeights classes.

Purpose and Scope

The executor abstraction serves two primary purposes:

Decouple inference patterns from model management: Executors separate the logic of how tokens are generated (stateful chat, instruction-following, stateless queries) from the underlying model loading and context allocation.
Standardize streaming interfaces: All executors implement ILLamaExecutor.InferAsync(), which provides a consistent async streaming API regardless of the underlying generation strategy LLama/Abstractions/ILLamaExecutor.cs10-41.

This page focuses on the architecture and common infrastructure shared across all executor types.

Executor Hierarchy

Class Architecture Diagram

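The class diagram maps functional roles to specific C# types: ILLamaExecutor is the common interface; the abstract StatefulExecutorBase implements it and is the base class of InteractiveExecutor and InstructExecutor; StatelessExecutor implements the interface without the shared stateful base.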

Sources: LLama/Abstractions/ILLamaExecutor.cs10-41 LLama/LLamaExecutorBase.cs20-126 LLama/LLamaInteractExecutor.cs23-52 LLama/LLamaInstructExecutor.cs22-61 LLama/LLamaStatelessExecutor.cs19-68

Core Interface: ILLamaExecutor

The ILLamaExecutor interface defines the contract all executors must implement LLama/Abstractions/ILLamaExecutor.cs10-41:

Member	Type	Purpose
`Context`	`LLamaContext`	The loaded context for inference operations LLama/Abstractions/ILLamaExecutor.cs15
`IsMultiModal`	`bool`	Indicates whether multimodal processing is enabled LLama/Abstractions/ILLamaExecutor.cs22
`ClipModel`	`MtmdWeights?`	Multimodal projection weights (MTMD) for vision/audio inputs LLama/Abstractions/ILLamaExecutor.cs26
`Embeds`	`List<SafeMtmdEmbed>`	Collection of processed media embeddings LLama/Abstractions/ILLamaExecutor.cs31
`InferAsync()`	`IAsyncEnumerable<string>`	Asynchronous streaming inference method LLama/Abstractions/ILLamaExecutor.cs40

The InferAsync() method provides the primary API surface, returning an async enumerable that yields decoded text chunks as they are generated.

Sources: LLama/Abstractions/ILLamaExecutor.cs10-41
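The streaming shape described above can be illustrated with a minimal sketch. It does not call LLamaSharp: FakeInferAsync is a hypothetical stand-in that only mimics the IAsyncEnumerable<string> return type, so the example runs without a model file. With a real executor, the loop would iterate over the executor's InferAsync() instead; its exact parameters are not shown on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Consume the stream chunk by chunk, as one would with an executor's InferAsync().
var output = new StringBuilder();
await foreach (var chunk in FakeInferAsync("Hello"))
{
    Console.Write(chunk);   // show each chunk as soon as it arrives
    output.Append(chunk);   // also keep the full response for later use
}
Console.WriteLine();
var fullText = output.ToString();

// Hypothetical stand-in for an executor (NOT LLamaSharp code):
// yields text chunks asynchronously, one at a time.
static async IAsyncEnumerable<string> FakeInferAsync(string prompt)
{
    foreach (var chunk in new[] { prompt, ", ", "world", "!" })
    {
        await Task.Yield(); // simulate asynchronous token generation
        yield return chunk;
    }
}
```

Because the result is an async stream, the caller can render partial output immediately and still assemble the complete text once the stream ends.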

Executor Architecture Patterns

Stateful vs Stateless Design

This diagram illustrates how data flows differently between persistent sessions and one-shot requests.

Sources: LLama/LLamaExecutorBase.cs103-126 LLama/LLamaStatelessExecutor.cs58-81

Key Architectural Differences

Aspect	Stateful (`StatefulExecutorBase`)	Stateless (`StatelessExecutor`)
Context Lifecycle	Single long-lived `LLamaContext` LLama/LLamaExecutorBase.cs107	Fresh `LLamaContext` per `InferAsync()` call LLama/LLamaStatelessExecutor.cs79
State Preservation	Maintains `_pastTokensCount`, `_embed_inps`, `_last_n_tokens` LLama/LLamaExecutorBase.cs29-61	No state between requests LLama/LLamaStatelessExecutor.cs16-17
KV Cache	Accumulates across turns to provide "memory"	Discarded after each request to save memory
Session Files	Supports `WithSessionFile()` for cache reuse LLama/LLamaExecutorBase.cs135	Not applicable
Constructor	Takes `LLamaContext` LLama/LLamaExecutorBase.cs103	Takes `LLamaWeights` and `IContextParams` LLama/LLamaStatelessExecutor.cs58

Sources: LLama/LLamaExecutorBase.cs20-135 LLama/LLamaStatelessExecutor.cs16-81
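The difference in context lifecycle can be sketched with a toy model. Everything here is an assumption made for illustration: tokens are plain integers, the "KV cache" is a list, and StatefulCall and StatelessCall are hypothetical names rather than LLamaSharp members. The sketch only shows why a long-lived context accumulates history while a per-call context does not.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a list stands in for the KV cache of a long-lived context.
var longLivedCache = new List<int>();

// Stateful pattern: every call appends to the same cache, so history accumulates.
int StatefulCall(int[] promptTokens)
{
    longLivedCache.AddRange(promptTokens);
    return longLivedCache.Count;   // tokens visible to the model on this turn
}

// Stateless pattern: every call builds a fresh cache and discards it on return.
static int StatelessCall(int[] promptTokens)
{
    var freshCache = new List<int>(promptTokens);
    return freshCache.Count;
}

var statefulTurn1 = StatefulCall(new[] { 1, 2, 3 });
var statefulTurn2 = StatefulCall(new[] { 4, 5 });
var statelessCall1 = StatelessCall(new[] { 1, 2, 3 });
var statelessCall2 = StatelessCall(new[] { 4, 5 });

Console.WriteLine($"stateful:  {statefulTurn1}, then {statefulTurn2}");
Console.WriteLine($"stateless: {statelessCall1}, then {statelessCall2}");
```

In the stateful case the second turn sees all five tokens, which is what gives a chat session its "memory"; in the stateless case the second call sees only its own two tokens.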

StatefulExecutorBase Infrastructure

Shared State Management

The abstract StatefulExecutorBase class maintains state shared by InteractiveExecutor and InstructExecutor, including _pastTokensCount, _embed_inps, and _last_n_tokens.

Sources: LLama/LLamaExecutorBase.cs29-61

Abstract Template Methods

StatefulExecutorBase defines abstract methods that subclasses must implement to customize behavior during the inference lifecycle (Template Method Pattern):

Method	Return Type	Purpose
`GetLoopCondition(args, token)`	`Task<bool>`	Determines whether to continue the generation loop LLama/LLamaInteractExecutor.cs121
`PreprocessInputs(text, args, token)`	`Task`	Tokenizes input and updates `_embed_inps` according to specific templates LLama/LLamaInteractExecutor.cs131

Sources: LLama/LLamaInteractExecutor.cs121-131 LLama/LLamaInstructExecutor.cs132-138

State Persistence

Serialization Architecture

The following diagram shows how runtime executor state is bridged to persistent storage.

Sources: LLama/LLamaInteractExecutor.cs55-114 LLama/LLamaInstructExecutor.cs64-129 LLama/LLamaExecutorBase.cs135-154

Session File Optimization

WithSessionFile Flow

When WithSessionFile(filename) is called, the executor attempts to load a saved llama.cpp session file using NativeApi.llama_state_load_file LLama/LLamaExecutorBase.cs135-147. It then compares the tokens in the session file with the current prompt (_embed_inps) to determine _n_matching_session_tokens, which lets the executor reuse the matching prefix tokens without re-evaluating them LLama/LLamaExecutorBase.cs161-170.

Sources: LLama/LLamaExecutorBase.cs135-174
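The prefix comparison described above can be sketched as a small standalone function. This is an illustration of the idea, not the code in LLamaExecutorBase.cs: CountMatchingPrefix is a hypothetical helper, tokens are modeled as plain integers, and the real implementation may handle additional cases.

```csharp
using System;
using System.Collections.Generic;

// Count how many leading prompt tokens are already present in the session.
static int CountMatchingPrefix(IReadOnlyList<int> sessionTokens, IReadOnlyList<int> promptTokens)
{
    var matching = 0;
    var limit = Math.Min(sessionTokens.Count, promptTokens.Count);
    // Stop at the first position where the two sequences differ.
    while (matching < limit && sessionTokens[matching] == promptTokens[matching])
        matching++;
    return matching;
}

var session = new[] { 10, 11, 12, 13 };      // tokens stored in the session file
var prompt = new[] { 10, 11, 12, 99, 100 };  // tokens of the current prompt
var reused = CountMatchingPrefix(session, prompt);
var toEvaluate = prompt.Length - reused;

Console.WriteLine($"reuse {reused} cached tokens, evaluate {toEvaluate} new tokens");
```

Only the tokens after the shared prefix need to be evaluated, which is where the saving comes from when successive prompts start with the same text.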

Multimodal Support

Both InteractiveExecutor and InstructExecutor support multimodal inputs when initialized with an MtmdWeights instance LLama/LLamaInteractExecutor.cs49 LLama/LLamaInstructExecutor.cs55.

The IsMultiModal property returns true if ClipModel is not null LLama/LLamaExecutorBase.cs75-81. Multimodal inputs are processed via PreprocessMtmd, which handles image embeddings and special markers LLama/LLamaInteractExecutor.cs147.

Sources: LLama/LLamaExecutorBase.cs75-84 LLama/LLamaInteractExecutor.cs49-147 LLama/LLamaInstructExecutor.cs55-61

Summary

The executor abstraction provides three distinct inference patterns:

StatelessExecutor: Fresh context per request, suitable for one-time jobs and serverless-style deployments LLama/LLamaStatelessExecutor.cs16-19
InteractiveExecutor: Stateful chat mode where the model acts as an assistant, maintaining conversation history in the KV cache LLama/LLamaInteractExecutor.cs20-23
InstructExecutor: Stateful instruction-following mode using specific instruction/response prefixes LLama/LLamaInstructExecutor.cs20-22

All executors implement the ILLamaExecutor interface and provide asynchronous streaming via InferAsync() LLama/Abstractions/ILLamaExecutor.cs40.

Sources: LLama/Abstractions/ILLamaExecutor.cs10-41 LLama/LLamaInteractExecutor.cs20-23 LLama/LLamaInstructExecutor.cs20-22 LLama/LLamaStatelessExecutor.cs16-19

URL: https://deepwiki.com/SciSharp/LLamaSharp/3.1-executor-overview