URL: https://deepwiki.com/SciSharp/LLamaSharp/3.1-executor-overview

Executor Overview | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

Executor Overview

This page provides an overview of the executor abstraction layer in LLamaSharp, which defines different patterns for text generation inference. Executors provide high-level APIs that manage context lifecycle, state persistence, and token streaming while interfacing with the lower-level LLamaContext and LLamaWeights classes.


Purpose and Scope

The executor abstraction serves two primary purposes:

  1. Decouple inference patterns from model management: Executors separate the logic of how tokens are generated (stateful chat, instruction-following, stateless queries) from the underlying model loading and context allocation.
  2. Standardize streaming interfaces: All executors implement ILLamaExecutor.InferAsync(), providing a consistent async streaming API regardless of the underlying generation strategy (LLama/Abstractions/ILLamaExecutor.cs:10-41).

This page focuses on the architecture and common infrastructure shared across all executor types.


Executor Hierarchy

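Class Architecture Diagram

The class architecture maps each functional role to a specific C# type. ILLamaExecutor is the interface every executor implements. StatefulExecutorBase is the abstract base class for the two stateful executors, InteractiveExecutor and InstructExecutor. StatelessExecutor is the third executor and keeps no state between requests.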


Sources: LLama/Abstractions/ILLamaExecutor.cs:10-41, LLama/LLamaExecutorBase.cs:20-126, LLama/LLamaInteractExecutor.cs:23-52, LLama/LLamaInstructExecutor.cs:22-61, LLama/LLamaStatelessExecutor.cs:19-68


Core Interface: ILLamaExecutor

The ILLamaExecutor interface defines the contract all executors must implement (LLama/Abstractions/ILLamaExecutor.cs:10-41):

| Member | Type | Purpose |
| --- | --- | --- |
| Context | LLamaContext | The loaded context for inference operations (LLama/Abstractions/ILLamaExecutor.cs:15) |
| IsMultiModal | bool | Indicates whether multimodal processing is enabled (LLama/Abstractions/ILLamaExecutor.cs:22) |
| ClipModel | MtmdWeights? | Multimodal projection weights (MTMD) for vision/audio inputs (LLama/Abstractions/ILLamaExecutor.cs:26) |
| Embeds | List<SafeMtmdEmbed> | Collection of processed media embeddings (LLama/Abstractions/ILLamaExecutor.cs:31) |
| InferAsync() | IAsyncEnumerable<string> | Asynchronous streaming inference method (LLama/Abstractions/ILLamaExecutor.cs:40) |

The InferAsync() method provides the primary API surface, returning an async enumerable that yields decoded text chunks as they are generated.

Sources: LLama/Abstractions/ILLamaExecutor.cs:10-41
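
To illustrate what consuming an async stream of text chunks looks like, here is a minimal sketch. It does not use LLamaSharp itself: IStreamingExecutor, EchoExecutor, and StreamingDemo are hypothetical stand-ins invented for this example, and the interface is reduced to a single simplified InferAsync method, so the signature shown is an assumption rather than the library's actual one.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for ILLamaExecutor, reduced to the streaming method.
public interface IStreamingExecutor
{
    IAsyncEnumerable<string> InferAsync(string text, CancellationToken token = default);
}

// Toy executor: "generates" by echoing the prompt back one word at a time.
public sealed class EchoExecutor : IStreamingExecutor
{
    public async IAsyncEnumerable<string> InferAsync(
        string text,
        [EnumeratorCancellation] CancellationToken token = default)
    {
        foreach (var word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            token.ThrowIfCancellationRequested();
            await Task.Yield(); // stands in for the time spent decoding a token
            yield return word + " ";
        }
    }
}

public static class StreamingDemo
{
    // Consumes the stream chunk by chunk, as a caller would when updating a UI.
    public static async Task<string> CollectAsync(IStreamingExecutor executor, string prompt)
    {
        var sb = new StringBuilder();
        await foreach (var chunk in executor.InferAsync(prompt))
        {
            sb.Append(chunk);
        }
        return sb.ToString();
    }
}
```

Because the caller only depends on the interface, the same await foreach loop would work unchanged against any executor that exposes a stream of this shape.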


Executor Architecture Patterns

Stateful vs Stateless Design

Data flows differently in the two designs: a stateful executor keeps one context alive across a persistent session, while the stateless executor handles each request as a one-shot call.


Sources: LLama/LLamaExecutorBase.cs:103-126, LLama/LLamaStatelessExecutor.cs:58-81

Key Architectural Differences

| Aspect | Stateful (StatefulExecutorBase) | Stateless (StatelessExecutor) |
| --- | --- | --- |
| Context Lifecycle | Single long-lived LLamaContext (LLama/LLamaExecutorBase.cs:107) | Fresh LLamaContext per InferAsync() call (LLama/LLamaStatelessExecutor.cs:79) |
| State Preservation | Maintains _pastTokensCount, _embed_inps, _last_n_tokens (LLama/LLamaExecutorBase.cs:29-61) | No state between requests (LLama/LLamaStatelessExecutor.cs:16-17) |
| KV Cache | Accumulates across turns to provide "memory" | Discarded after each request to save memory |
| Session Files | Supports WithSessionFile() for cache reuse (LLama/LLamaExecutorBase.cs:135) | Not applicable |
| Constructor | Takes LLamaContext (LLama/LLamaExecutorBase.cs:103) | Takes LLamaWeights and IContextParams (LLama/LLamaStatelessExecutor.cs:58) |

Sources: LLama/LLamaExecutorBase.cs:20-135, LLama/LLamaStatelessExecutor.cs:16-81
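
The lifecycle difference can be sketched with toy types. Everything below is hypothetical and exists only for illustration: ToyContext stands in for LLamaContext and merely records the tokens it has seen, playing the role of the KV cache. The stateful executor receives a context and keeps it; the stateless one receives a factory and builds a fresh context on every call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for LLamaContext: it only records evaluated tokens.
public sealed class ToyContext : IDisposable
{
    public List<string> Cache { get; } = new List<string>();
    public bool Disposed { get; private set; }

    public void Evaluate(string prompt) =>
        Cache.AddRange(prompt.Split(' ', StringSplitOptions.RemoveEmptyEntries));

    public void Dispose() => Disposed = true;
}

// Stateful pattern: one long-lived context supplied by the caller.
public sealed class ToyStatefulExecutor
{
    private readonly ToyContext _context;

    public ToyStatefulExecutor(ToyContext context) => _context = context;

    // Returns how many tokens the shared context holds after this turn.
    public int Infer(string prompt)
    {
        _context.Evaluate(prompt);
        return _context.Cache.Count;
    }
}

// Stateless pattern: a fresh context per call, disposed when the call ends.
public sealed class ToyStatelessExecutor
{
    private readonly Func<ToyContext> _createContext;

    public ToyStatelessExecutor(Func<ToyContext> createContext) => _createContext = createContext;

    public int Infer(string prompt)
    {
        using var context = _createContext();
        context.Evaluate(prompt);
        return context.Cache.Count;
    }
}
```

Calling Infer twice on the stateful toy makes the count grow across turns, while the stateless toy reports only the tokens of the current prompt each time.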


StatefulExecutorBase Infrastructure

Shared State Management

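The abstract StatefulExecutorBase class maintains the state shared by InteractiveExecutor and InstructExecutor. The fields named on this page are _pastTokensCount, _embed_inps, and _last_n_tokens, along with _n_matching_session_tokens, which is used for session-file reuse.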


Sources: LLama/LLamaExecutorBase.cs:29-61

Abstract Template Methods

StatefulExecutorBase defines abstract methods that subclasses must implement to customize behavior during the inference lifecycle (Template Method Pattern):

| Method | Return Type | Purpose |
| --- | --- | --- |
| GetLoopCondition(args, token) | Task<bool> | Determines whether to continue the generation loop (LLama/LLamaInteractExecutor.cs:121) |
| PreprocessInputs(text, args, token) | Task | Tokenizes input and updates _embed_inps according to specific templates (LLama/LLamaInteractExecutor.cs:131) |

Sources: LLama/LLamaInteractExecutor.cs:121-131, LLama/LLamaInstructExecutor.cs:132-138
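
As a rough sketch of the Template Method pattern described above, the base class below owns the loop and calls two abstract hooks named after the ones in the table. All types here (ToyInferArgs, ToyStatefulBase, ToyInstructExecutor) are hypothetical, the "### Instruction:" and "### Response:" strings are illustrative placeholders, and token generation is faked; this is not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the per-call arguments passed to the hooks.
public sealed class ToyInferArgs
{
    public int RemainingTokens { get; set; }
}

public abstract class ToyStatefulBase
{
    // Plays the role of _embed_inps: inputs queued for evaluation.
    protected readonly List<string> EmbedInputs = new List<string>();

    public IReadOnlyList<string> QueuedInputs => EmbedInputs;

    // Hooks that each subclass must supply.
    protected abstract Task PreprocessInputs(string text, ToyInferArgs args, CancellationToken token);
    protected abstract Task<bool> GetLoopCondition(ToyInferArgs args, CancellationToken token);

    // The fixed skeleton: preprocess once, then loop while the subclass says so.
    public async Task<List<string>> RunAsync(string text, ToyInferArgs args, CancellationToken token = default)
    {
        EmbedInputs.Clear();
        await PreprocessInputs(text, args, token);

        var generated = new List<string>();
        while (await GetLoopCondition(args, token))
        {
            token.ThrowIfCancellationRequested();
            generated.Add("tok" + generated.Count); // placeholder for a sampled token
            args.RemainingTokens--;
        }
        return generated;
    }
}

// Toy subclass: wraps the input in illustrative instruction/response markers.
public sealed class ToyInstructExecutor : ToyStatefulBase
{
    protected override Task PreprocessInputs(string text, ToyInferArgs args, CancellationToken token)
    {
        EmbedInputs.Add("### Instruction:");
        EmbedInputs.AddRange(text.Split(' ', StringSplitOptions.RemoveEmptyEntries));
        EmbedInputs.Add("### Response:");
        return Task.CompletedTask;
    }

    protected override Task<bool> GetLoopCondition(ToyInferArgs args, CancellationToken token)
        => Task.FromResult(args.RemainingTokens > 0);
}
```

A different subclass could change only the template applied in PreprocessInputs or the stopping rule in GetLoopCondition, leaving the loop in the base class untouched.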


State Persistence

Serialization Architecture

InteractiveExecutor and InstructExecutor each bridge their runtime executor state to persistent storage; the relevant code is listed in the sources below.


Sources: LLama/LLamaInteractExecutor.cs:55-114, LLama/LLamaInstructExecutor.cs:64-129, LLama/LLamaExecutorBase.cs:135-154
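
The general idea of bridging runtime state to storage can be sketched as a round trip through a file. This example assumes JSON as the storage format and uses System.Text.Json; ToyExecutorState and ToyStateStore are hypothetical, and the property names only loosely mirror the fields mentioned on this page. It is a sketch of the technique, not a description of what the executors actually write to disk.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical snapshot of executor state; not the library's state class.
public sealed class ToyExecutorState
{
    public int PastTokensCount { get; set; }
    public List<int> EmbedInps { get; set; } = new List<int>();
    public List<int> LastTokens { get; set; } = new List<int>();
}

public static class ToyStateStore
{
    // Serializes the snapshot to a JSON file at the given path.
    public static void Save(ToyExecutorState state, string path)
        => File.WriteAllText(path, JsonSerializer.Serialize(state));

    // Reads the snapshot back, failing loudly if the file holds no object.
    public static ToyExecutorState Load(string path)
        => JsonSerializer.Deserialize<ToyExecutorState>(File.ReadAllText(path))
           ?? throw new InvalidDataException("State file did not contain a state object.");
}
```

Keeping the snapshot as a plain data class separates what is persisted from the executor's live fields, so the two can evolve independently.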


Session File Optimization

WithSessionFile Flow

When WithSessionFile(filename) is called, the executor attempts to load a saved llama.cpp session file using NativeApi.llama_state_load_file (LLama/LLamaExecutorBase.cs:135-147). It then compares the tokens in the session file with the current prompt (_embed_inps) to determine _n_matching_session_tokens, which lets the executor reuse the matching prefix tokens without re-evaluating them (LLama/LLamaExecutorBase.cs:161-170).

Sources: LLama/LLamaExecutorBase.cs:135-174
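
The prefix comparison described above can be sketched as a count of leading tokens that agree between the two sequences. SessionPrefix and CountMatchingPrefix are hypothetical names, and tokens are modeled as plain integers; the executor's actual comparison may differ in detail.

```csharp
using System;
using System.Collections.Generic;

public static class SessionPrefix
{
    // Counts how many leading tokens of the saved session match the prompt.
    // Only this many tokens could be reused without re-evaluation.
    public static int CountMatchingPrefix(
        IReadOnlyList<int> sessionTokens,
        IReadOnlyList<int> promptTokens)
    {
        int limit = Math.Min(sessionTokens.Count, promptTokens.Count);
        int matching = 0;
        while (matching < limit && sessionTokens[matching] == promptTokens[matching])
        {
            matching++;
        }
        return matching;
    }
}
```

For example, a session holding tokens 1, 2, 3, 4 and a prompt of 1, 2, 9 share a prefix of length 2, so only the prompt tokens from the third position onward would need evaluating.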


Multimodal Support

Both InteractiveExecutor and InstructExecutor support multimodal inputs when initialized with a MtmdWeights instance (LLama/LLamaInteractExecutor.cs:49, LLama/LLamaInstructExecutor.cs:55).

The IsMultiModal property returns true if ClipModel is not null (LLama/LLamaExecutorBase.cs:75-81). Multimodal inputs are processed via PreprocessMtmd, which handles image embeddings and special markers (LLama/LLamaInteractExecutor.cs:147).

Sources: LLama/LLamaExecutorBase.cs:75-84, LLama/LLamaInteractExecutor.cs:49-147, LLama/LLamaInstructExecutor.cs:55-61
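
The relationship between the optional weights and the IsMultiModal flag can be shown with a small sketch. ToyProjectionWeights and ToyExecutor are hypothetical stand-ins (for MtmdWeights and an executor respectively); only the derived-property idea is taken from the text above.

```csharp
#nullable enable

// Hypothetical stand-in for MtmdWeights.
public sealed class ToyProjectionWeights { }

public class ToyExecutor
{
    // Optional projection weights; null means text-only operation.
    public ToyProjectionWeights? ClipModel { get; }

    // Derived flag: multimodal exactly when weights were supplied.
    public bool IsMultiModal => ClipModel != null;

    public ToyExecutor(ToyProjectionWeights? clipModel = null) => ClipModel = clipModel;
}
```

Deriving the flag from the weights, rather than storing it separately, means the two can never disagree.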


Summary

The executor abstraction provides three distinct inference patterns:

  1. StatelessExecutor: Fresh context per request, suitable for one-time jobs and serverless-style deployments (LLama/LLamaStatelessExecutor.cs:16-19).
  2. InteractiveExecutor: Stateful chat mode where the model acts as an assistant, maintaining conversation history in the KV cache (LLama/LLamaInteractExecutor.cs:20-23).
  3. InstructExecutor: Stateful instruction-following mode using specific instruction/response prefixes (LLama/LLamaInstructExecutor.cs:20-22).

All executors implement the ILLamaExecutor interface and provide asynchronous streaming via InferAsync() (LLama/Abstractions/ILLamaExecutor.cs:40).

Sources: LLama/Abstractions/ILLamaExecutor.cs:10-41, LLama/LLamaInteractExecutor.cs:20-23, LLama/LLamaInstructExecutor.cs:20-22, LLama/LLamaStatelessExecutor.cs:16-19