This page provides an overview of the executor abstraction layer in LLamaSharp, which defines different patterns for text generation inference. Executors provide high-level APIs that manage context lifecycle, state persistence, and token streaming while interfacing with the lower-level LLamaContext and LLamaWeights classes.
The executor abstraction serves two primary purposes:
- A common entry point, ILLamaExecutor.InferAsync(), providing a consistent async streaming API regardless of the underlying generation strategy LLama/Abstractions/ILLamaExecutor.cs10-41

This page focuses on the architecture and common infrastructure shared across all executor types.
The following diagram bridges the functional roles to the specific C# entities.
Sources: LLama/Abstractions/ILLamaExecutor.cs10-41 LLama/LLamaExecutorBase.cs20-126 LLama/LLamaInteractExecutor.cs23-52 LLama/LLamaInstructExecutor.cs22-61 LLama/LLamaStatelessExecutor.cs19-68
The ILLamaExecutor interface defines the contract all executors must implement LLama/Abstractions/ILLamaExecutor.cs10-41:
| Member | Type | Purpose |
|---|---|---|
| Context | LLamaContext | The loaded context for inference operations LLama/Abstractions/ILLamaExecutor.cs15 |
| IsMultiModal | bool | Indicates whether multimodal processing is enabled LLama/Abstractions/ILLamaExecutor.cs22 |
| ClipModel | MtmdWeights? | Multimodal projection weights (MTMD) for vision/audio inputs LLama/Abstractions/ILLamaExecutor.cs26 |
| Embeds | List<SafeMtmdEmbed> | Collection of processed media embeddings LLama/Abstractions/ILLamaExecutor.cs31 |
| InferAsync() | IAsyncEnumerable<string> | Asynchronous streaming inference method LLama/Abstractions/ILLamaExecutor.cs40 |
The InferAsync() method provides the primary API surface, returning an async enumerable that yields decoded text chunks as they are generated.
Sources: LLama/Abstractions/ILLamaExecutor.cs10-41
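The streaming shape of InferAsync() can be illustrated with a minimal, self-contained sketch. It does not call LLamaSharp: FakeInferAsync is a hypothetical stand-in that only mirrors the IAsyncEnumerable<string> return type, and StreamingSketch and CollectAsync are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingSketch
{
    // Hypothetical stand-in for an executor's InferAsync():
    // yields decoded text chunks one at a time. Not LLamaSharp code.
    public static async IAsyncEnumerable<string> FakeInferAsync(
        string prompt,
        [EnumeratorCancellation] CancellationToken token = default)
    {
        foreach (var chunk in new[] { "Hello", ", ", "world" })
        {
            token.ThrowIfCancellationRequested();
            await Task.Yield(); // simulate asynchronous work between chunks
            yield return chunk;
        }
    }

    // Consume the stream chunk by chunk and collect the full response.
    public static async Task<string> CollectAsync(
        IAsyncEnumerable<string> stream,
        Action<string>? onChunk = null)
    {
        var sb = new StringBuilder();
        await foreach (var chunk in stream)
        {
            onChunk?.Invoke(chunk); // e.g. Console.Write for live output
            sb.Append(chunk);
        }
        return sb.ToString();
    }
}
```

With a real executor, the same await foreach loop would presumably iterate over the executor's InferAsync() result instead of the stand-in. The exact parameters of that call are not described on this page, so they are left out here.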
This diagram illustrates how data flows differently between persistent sessions and one-shot requests.
Sources: LLama/LLamaExecutorBase.cs103-126 LLama/LLamaStatelessExecutor.cs58-81
| Aspect | Stateful (StatefulExecutorBase) | Stateless (StatelessExecutor) |
|---|---|---|
| Context Lifecycle | Single long-lived LLamaContext LLama/LLamaExecutorBase.cs107 | Fresh LLamaContext per InferAsync() call LLama/LLamaStatelessExecutor.cs79 |
| State Preservation | Maintains _pastTokensCount, _embed_inps, _last_n_tokens LLama/LLamaExecutorBase.cs29-61 | No state between requests LLama/LLamaStatelessExecutor.cs16-17 |
| KV Cache | Accumulates across turns to provide "memory" | Discarded after each request to save memory |
| Session Files | Supports WithSessionFile() for cache reuse LLama/LLamaExecutorBase.cs135 | Not applicable |
| Constructor | Takes LLamaContext LLama/LLamaExecutorBase.cs103 | Takes LLamaWeights and IContextParams LLama/LLamaStatelessExecutor.cs58 |
Sources: LLama/LLamaExecutorBase.cs20-135 LLama/LLamaStatelessExecutor.cs16-81
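The lifecycle difference in the table can be sketched with toy types. This is an illustration under simplifying assumptions, not LLamaSharp code: ToyContext, ToyStatefulExecutor, and ToyStatelessExecutor are hypothetical, and the "cache" is just a list of token ids standing in for the KV cache.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a context: it only records which tokens it has seen.
public sealed class ToyContext : IDisposable
{
    public List<int> Cache { get; } = new List<int>();
    public void Dispose() => Cache.Clear();
}

// Stateful pattern: one long-lived context, so the cache grows across calls.
public sealed class ToyStatefulExecutor
{
    private readonly ToyContext _context;

    public ToyStatefulExecutor(ToyContext context) => _context = context;

    // Returns how many tokens are visible to the model during this turn.
    public int Infer(IEnumerable<int> tokens)
    {
        _context.Cache.AddRange(tokens);
        return _context.Cache.Count;
    }
}

// Stateless pattern: a fresh context per call, discarded afterwards.
public sealed class ToyStatelessExecutor
{
    public int Infer(IEnumerable<int> tokens)
    {
        using var context = new ToyContext();
        context.Cache.AddRange(tokens);
        return context.Cache.Count;
    }
}
```

In this sketch a second call to the stateful executor sees the tokens from the first call, while each stateless call sees only its own input. That mirrors the "memory" versus "no state between requests" rows above.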
The abstract StatefulExecutorBase class maintains several critical state variables shared across InteractiveExecutor and InstructExecutor, including _pastTokensCount, _embed_inps, and _last_n_tokens.
Sources: LLama/LLamaExecutorBase.cs29-61
StatefulExecutorBase defines abstract methods that subclasses must implement to customize behavior during the inference lifecycle (Template Method Pattern):
| Method | Return Type | Purpose |
|---|---|---|
| GetLoopCondition(args, token) | Task<bool> | Determines whether to continue the generation loop LLama/LLamaInteractExecutor.cs121 |
| PreprocessInputs(text, args, token) | Task | Tokenizes input and updates _embed_inps according to specific templates LLama/LLamaInteractExecutor.cs131 |
Sources: LLama/LLamaInteractExecutor.cs121-131 LLama/LLamaInstructExecutor.cs132-138
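The Template Method arrangement can be sketched as follows. This is a simplified illustration, not the library's implementation: ToyExecutorBase and EchoExecutor are hypothetical, the args parameter from the table is omitted, and "generation" is reduced to echoing whitespace-separated words.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical base class: owns the inference loop, delegates the
// variable steps to subclasses (Template Method pattern).
public abstract class ToyExecutorBase
{
    protected readonly Queue<string> Pending = new Queue<string>();

    // Hooks that mirror the names in the table above (signatures simplified).
    protected abstract Task<bool> GetLoopCondition(CancellationToken token);
    protected abstract Task PreprocessInputs(string text, CancellationToken token);

    // The fixed skeleton: preprocess once, then loop while the hook allows it.
    public async IAsyncEnumerable<string> InferAsync(
        string text,
        [EnumeratorCancellation] CancellationToken token = default)
    {
        await PreprocessInputs(text, token);
        while (await GetLoopCondition(token))
        {
            token.ThrowIfCancellationRequested();
            yield return Pending.Dequeue();
        }
    }
}

// Hypothetical subclass: "tokenizes" on spaces and stops when input is used up.
public sealed class EchoExecutor : ToyExecutorBase
{
    protected override Task PreprocessInputs(string text, CancellationToken token)
    {
        foreach (var word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            Pending.Enqueue(word);
        }
        return Task.CompletedTask;
    }

    protected override Task<bool> GetLoopCondition(CancellationToken token)
        => Task.FromResult(Pending.Count > 0);
}
```

The design choice illustrated here is that the base class fixes the order of steps, so subclasses only decide how input is prepared and when the loop ends.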
The following diagram shows how runtime executor state is bridged to persistent storage.
Sources: LLama/LLamaInteractExecutor.cs55-114 LLama/LLamaInstructExecutor.cs64-129 LLama/LLamaExecutorBase.cs135-154
When WithSessionFile(filename) is called, the executor attempts to load a saved llama.cpp session file using NativeApi.llama_state_load_file LLama/LLamaExecutorBase.cs135-147. It then compares the tokens in the session file with the current prompt (_embed_inps) to determine _n_matching_session_tokens, which allows the executor to reuse prefix tokens without re-evaluating them LLama/LLamaExecutorBase.cs161-170.
Sources: LLama/LLamaExecutorBase.cs135-174
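The prefix comparison described above can be sketched as a common-prefix count over two token lists. This is a minimal illustration assuming tokens are plain integers. SessionPrefix and CountMatchingPrefix are hypothetical names, and the library's own logic may handle additional cases.

```csharp
using System;
using System.Collections.Generic;

public static class SessionPrefix
{
    // Counts how many leading tokens of the prompt match the saved session.
    // Tokens in that shared prefix would not need to be evaluated again.
    public static int CountMatchingPrefix(
        IReadOnlyList<int> sessionTokens,
        IReadOnlyList<int> promptTokens)
    {
        int limit = Math.Min(sessionTokens.Count, promptTokens.Count);
        int matching = 0;
        while (matching < limit && sessionTokens[matching] == promptTokens[matching])
        {
            matching++;
        }
        return matching;
    }
}
```

For example, a session holding tokens 1, 2, 3, 4 and a prompt starting with 1, 2, 9 share a prefix of length 2, so only the tokens from the third position onward would need evaluation in this sketch.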
Both InteractiveExecutor and InstructExecutor support multimodal inputs when initialized with an MtmdWeights instance LLama/LLamaInteractExecutor.cs49 LLama/LLamaInstructExecutor.cs55.
The IsMultiModal property returns true if ClipModel is not null LLama/LLamaExecutorBase.cs75-81. Multimodal inputs are processed via PreprocessMtmd, which handles image embeddings and special markers LLama/LLamaInteractExecutor.cs147.
Sources: LLama/LLamaExecutorBase.cs75-84 LLama/LLamaInteractExecutor.cs49-147 LLama/LLamaInstructExecutor.cs55-61
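The executor abstraction provides distinct inference patterns: InteractiveExecutor and InstructExecutor build on StatefulExecutorBase and keep state across calls, while StatelessExecutor creates a fresh LLamaContext for each request.
All executors implement the ILLamaExecutor interface and provide asynchronous streaming via InferAsync() LLama/Abstractions/ILLamaExecutor.cs40.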
Sources: LLama/Abstractions/ILLamaExecutor.cs10-41 LLama/LLamaInteractExecutor.cs20-23 LLama/LLamaInstructExecutor.cs20-22 LLama/LLamaStatelessExecutor.cs16-19