Last indexed: 18 May 2026 (ecd184)

State Persistence

State persistence saves and restores the internal memory state of a LLamaContext, including the KV cache (key-value cache), which holds the attention keys and values computed for tokens that have already been processed. This makes it possible to resume inference from an earlier point without reprocessing tokens, to create checkpoints during long conversations, and to implement branching conversation paths.

LLamaSharp provides persistence at multiple levels:

Full context state: Saves all sequences and memory.
Sequence-specific state: Saves a single sequence's memory.
Executor state: Saves token counters and tracking buffers for a specific inference mode (e.g., InteractiveExecutor).
Session files: Native llama.cpp format for fast KV cache warm-up.

Core State Persistence APIs

The LLamaContext class exposes state persistence through paired save/load methods operating at two levels: managed file mapping and native handle manipulation.

Diagram: Context State Persistence Architecture

Sources: LLama/LLamaContext.cs129-234 LLama/Native/NativeApi.cs107-133

Full Context State

Saving to File

LLamaContext can persist the entire context state (all sequences and the KV cache) to a file. The SaveState(string filename) method uses a memory-mapped file to write the binary state directly from native memory to disk, avoiding an expensive byte-array copy in managed memory (LLama/LLamaContext.cs133-163). It determines the exact size required by calling NativeHandle.GetStateSize() (LLama/LLamaContext.cs140).

Loading from File

The LoadState(string filename) method restores the full context state. It maps the file from disk and calls NativeHandle.SetState(ptr, size) to populate the native context memory (LLama/LLamaContext.cs201-234).
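The save and load calls above can be wrapped as a simple checkpoint and rollback pair. This is a minimal sketch rather than code from the repository: the class and method names are hypothetical, and it assumes the context passed to Rollback was created from the same model and with compatible parameters as the one that wrote the file.

```csharp
using System.IO;
using LLama;

// Hypothetical helper, not part of LLamaSharp.
public static class ContextCheckpointSketch
{
    // Writes the full context state (all sequences and the KV cache) to statePath.
    public static void Checkpoint(LLamaContext context, string statePath)
    {
        context.SaveState(statePath);
    }

    // Restores a previously written state. Returns false if no checkpoint file exists.
    public static bool Rollback(LLamaContext context, string statePath)
    {
        if (!File.Exists(statePath))
            return false;

        context.LoadState(statePath);
        return true;
    }
}
```

Because the file holds the whole KV cache, its size grows with the context length, so frequent checkpoints of a large context can take noticeable disk space.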

Sequence-Specific State

LLamaSharp also supports saving the state of a single sequence, identified by a LLamaSeqId. This is useful in batched scenarios where only one branch of a conversation needs to be persisted (LLama/LLamaContext.cs170-200).

Sources: LLama/LLamaContext.cs133-234 LLama/Native/SafeLLamaContextHandle.cs109-122

Executor State Persistence

Executors such as InteractiveExecutor and InstructExecutor maintain managed state (for example, token counts and input buffers). This state must be persisted alongside the native KV cache so that the executor's internal counters still match the context state after a restore.

Implementation in StatefulExecutorBase

StatefulExecutorBase defines the structure for stateful executors. Concrete implementations must provide the logic for serializing their own internal state (LLama/LLamaExecutorBase.cs20-21).

GetStateData(): Captures the current executor state (consumed tokens, past token counts, session tokens) in a serializable object such as InteractiveExecutorState (LLama/LLamaInteractExecutor.cs55-72).
SaveState(string filename): Serializes the state data to a JSON file (LLama/LLamaInteractExecutor.cs98-105).
LoadState(string filename): Deserializes the JSON and restores the executor's internal fields (LLama/LLamaInteractExecutor.cs108-114).
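The capture, serialize, and restore pattern in the list above can be illustrated with System.Text.Json alone. The sketch below is an assumption-laden stand-in, not the library's code: DemoExecutorState and DemoExecutorStateStore are hypothetical names, and the three fields only mirror the kinds of values mentioned above, not the real layout of InteractiveExecutorState.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical stand-in for an executor state object; field names are illustrative only.
public sealed record DemoExecutorState(
    int PastTokensCount,
    int ConsumedTokensCount,
    List<int> SessionTokens);

public static class DemoExecutorStateStore
{
    // Serializes the captured state to a JSON file, overwriting any existing file.
    public static async Task SaveAsync(DemoExecutorState state, string path)
    {
        await using var stream = File.Create(path);
        await JsonSerializer.SerializeAsync(stream, state);
    }

    // Reads the JSON file back into a state object.
    public static async Task<DemoExecutorState> LoadAsync(string path)
    {
        await using var stream = File.OpenRead(path);
        var state = await JsonSerializer.DeserializeAsync<DemoExecutorState>(stream);
        return state ?? throw new InvalidDataException("State file contained JSON null.");
    }
}
```

Keeping the managed state in a small JSON file next to the much larger binary context state makes it easy to inspect, but the two files are only meaningful as a pair taken at the same point in the conversation.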

Diagram: Executor State Entity Association

Sources: LLama/LLamaInteractExecutor.cs55-114 LLama/LLamaInstructExecutor.cs64-129 LLama/LLamaExecutorBase.cs20-114

KV Cache Warm-up and Session Files

LLamaSharp supports native session files (.session) which are handled directly by the underlying llama.cpp implementation. These are optimized for fast context warm-up by reusing prefix tokens already present in the file.

Native Session APIs

The NativeApi provides direct wrappers for session file management:

llama_state_load_file: Loads a session file into a context and returns the number of tokens loaded (LLama/Native/NativeApi.cs107-109).
llama_state_save_file: Saves the current context tokens and KV state to a session file (LLama/Native/NativeApi.cs119-121).

Sequence Session APIs

For finer control in multi-sequence environments, llama_state_seq_save_file and llama_state_seq_load_file persist and restore the state of a specific sequence ID (LLama/Native/NativeApi.cs127-133).

Integration in StatefulExecutor

StatefulExecutorBase includes a WithSessionFile(string filename) method. If the file exists, the method attempts to load it with NativeApi.llama_state_load_file to "warm up" the KV cache with tokens from the previous session (LLama/LLamaExecutorBase.cs135-154).
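A minimal sketch of attaching a session file when an executor is created might look like the following. The helper class and method are hypothetical; only InteractiveExecutor and WithSessionFile come from the text above, and the sketch assumes the InteractiveExecutor constructor accepts a LLamaContext.

```csharp
using LLama;

// Hypothetical helper, not part of LLamaSharp.
public static class SessionWarmupSketch
{
    // Creates an executor and points it at a session file so that an existing
    // file can be used to warm up the KV cache.
    public static InteractiveExecutor CreateWithSession(LLamaContext context, string sessionPath)
    {
        var executor = new InteractiveExecutor(context);
        executor.WithSessionFile(sessionPath);
        return executor;
    }
}
```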

Sources: LLama/Native/NativeApi.cs107-133 LLama/LLamaExecutorBase.cs135-178

Usage Example: Saving and Loading

The following workflow demonstrates the relationship between context persistence and executor persistence.

Diagram: Session Persistence Data Flow

In code, this is typically implemented by saving both the binary context state and the JSON executor state:

ex.Context.SaveState(modelStatePath);    // docs/Examples/LoadAndSaveState.md48
await ex.SaveState(executorStatePath);   // docs/Examples/LoadAndSaveState.md52
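The restore side mirrors the two save calls. The sketch below is an assumption about how the pieces fit together rather than a copy of the example file: RestoreSketch and RestoreAsync are hypothetical names, and the context is assumed to come from the same model that produced the saved state.

```csharp
using System.Threading.Tasks;
using LLama;

// Hypothetical helper, not part of LLamaSharp.
public static class RestoreSketch
{
    // Restores the binary context state first, then the executor's JSON state,
    // so that the executor's counters refer to a KV cache that is already in place.
    public static async Task<InteractiveExecutor> RestoreAsync(
        LLamaContext context,
        string modelStatePath,
        string executorStatePath)
    {
        context.LoadState(modelStatePath);

        var executor = new InteractiveExecutor(context);
        await executor.LoadState(executorStatePath);
        return executor;
    }
}
```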

Sources: docs/Examples/LoadAndSaveState.md44-70 LLama/LLamaContext.cs133-234


URL: https://deepwiki.com/SciSharp/LLamaSharp/3.5-state-persistence