State persistence allows saving and restoring the internal memory state of an LLamaContext, including the KV cache (key-value cache), which holds the attention keys and values computed for tokens that have already been processed. This enables resuming inference from a previous point without reprocessing tokens, creating checkpoints during long conversations, and implementing branching conversation paths.
LLamaSharp provides persistence at multiple levels:
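- Context state: the native context memory and KV cache, saved as a binary file.
- Executor state: the managed state kept by stateful executors (e.g. InteractiveExecutor).
- Session files: the native llama.cpp format for fast KV cache warm-up.

The LLamaContext class exposes state persistence through paired save/load methods operating at two levels: managed file mapping and native handle manipulation.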
Context State Persistence Architecture
Sources: LLama/LLamaContext.cs129-234 LLama/Native/NativeApi.cs107-133
The LLamaContext provides methods to persist the entire context state (all sequences and the KV cache) to a file. The SaveState(string filename) method uses memory-mapped files to write the binary state directly from native memory to disk, avoiding expensive byte-array copies in managed memory (LLama/LLamaContext.cs133-163). It calculates the exact size required by calling NativeHandle.GetStateSize() (LLama/LLamaContext.cs140).
The LoadState(string filename) method restores the full context state. It maps the file from disk and calls NativeHandle.SetState(ptr, size) to populate the native context memory (LLama/LLamaContext.cs201-234).
LLamaSharp also supports saving the state of a single sequence (identified by LLamaSeqId). This is useful in batched scenarios where only one branch of a conversation needs to be persisted (LLama/LLamaContext.cs170-200).
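The file-mapping mechanics described above can be sketched with the standard System.IO.MemoryMappedFiles API alone. This is a minimal illustration, not the LLamaSharp implementation: MappedStateSketch, Save and Load are hypothetical names, and a managed byte array stands in for the native state so the sketch can run without a model. The real SaveState writes from native memory through a pointer instead of going through an array.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper class, for illustration only.
public static class MappedStateSketch
{
    // Writes `state` to `path` through a memory-mapped view.
    // `state` stands in for the native context memory.
    public static void Save(string path, byte[] state)
    {
        if (state.Length == 0)
            throw new ArgumentException("State must not be empty.", nameof(state));

        // Size the backing file to exactly the state size up front,
        // mirroring the GetStateSize() step described in the text.
        using var file = MemoryMappedFile.CreateFromFile(
            path, FileMode.Create, null, state.Length);
        using var view = file.CreateViewAccessor(0, state.Length);

        // Copy the bytes into the mapped region; the OS flushes them to disk.
        view.WriteArray(0, state, 0, state.Length);
    }

    // Maps the file at `path` and reads its full contents back.
    public static byte[] Load(string path)
    {
        long size = new FileInfo(path).Length;

        using var file = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null);
        using var view = file.CreateViewAccessor(0, size);

        var state = new byte[size];
        view.ReadArray(0, state, 0, state.Length);
        return state;
    }
}
```

Sizing the file before mapping it is what makes a single write possible: the view covers the whole state, so no intermediate stream or chunked copy is needed.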
Sources: LLama/LLamaContext.cs133-234 LLama/Native/SafeLLamaContextHandle.cs109-122
Executors such as InteractiveExecutor and InstructExecutor maintain managed state (for example, token counts and input buffers) that must be persisted alongside the native KV cache so that the executor's internal pointers match the context state.
StatefulExecutorBase defines the structure for stateful executors. Concrete implementations must provide logic for serializing their internal state data (LLama/LLamaExecutorBase.cs20-21).
- InteractiveExecutorState (LLama/LLamaInteractExecutor.cs55-72)

Executor State Entity Association
Sources: LLama/LLamaInteractExecutor.cs55-114 LLama/LLamaInstructExecutor.cs64-129 LLama/LLamaExecutorBase.cs20-114
LLamaSharp supports native session files (.session) which are handled directly by the underlying llama.cpp implementation. These are optimized for fast context warm-up by reusing prefix tokens already present in the file.
The NativeApi provides direct wrappers for session file management:
- llama_state_load_file: Loads a session file into a context and returns the count of tokens loaded (LLama/Native/NativeApi.cs107-109).
- llama_state_save_file: Saves the current context tokens and KV state to a session file (LLama/Native/NativeApi.cs119-121).

For finer control in multi-sequence environments, llama_state_seq_save_file and llama_state_seq_load_file allow persisting specific sequence IDs to disk (LLama/Native/NativeApi.cs127-133).
StatefulExecutorBase includes a WithSessionFile(string filename) method. If the file exists, it attempts to load it using NativeApi.llama_state_load_file to "warm up" the KV cache with tokens from the previous session (LLama/LLamaExecutorBase.cs135-154).
Sources: LLama/Native/NativeApi.cs107-133 LLama/LLamaExecutorBase.cs135-178
The following workflow demonstrates the relationship between context persistence and executor persistence.
Session Persistence Data Flow
In code, this is typically implemented by saving both the binary context state and the JSON executor state:
- ex.Context.SaveState(modelStatePath) (docs/Examples/LoadAndSaveState.md48)
- await ex.SaveState(executorStatePath) (docs/Examples/LoadAndSaveState.md52)

Sources: docs/Examples/LoadAndSaveState.md44-70 LLama/LLamaContext.cs133-234
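A rough sketch of how the two saves might be paired with the matching loads is given here. The SessionPersistence class and its method names are hypothetical. The saving calls are the ones cited above. The executor-side LoadState(string) call is assumed to be the counterpart of SaveState and is not shown in the cited example, so check it against the LLamaSharp version in use.

```csharp
using System.Threading.Tasks;
using LLama;

// Hypothetical wrapper, for illustration only.
public static class SessionPersistence
{
    // Saves the native context state (binary) and the executor state (JSON).
    public static async Task SaveAsync(
        InteractiveExecutor ex, string modelStatePath, string executorStatePath)
    {
        // Native KV cache and context memory.
        ex.Context.SaveState(modelStatePath);

        // Managed executor bookkeeping (token counts, input buffers).
        await ex.SaveState(executorStatePath);
    }

    // Restores both halves so the executor's counters match the KV cache.
    public static async Task LoadAsync(
        InteractiveExecutor ex, string modelStatePath, string executorStatePath)
    {
        // Restore the native context first.
        ex.Context.LoadState(modelStatePath);

        // Assumed counterpart of SaveState on the executor.
        await ex.LoadState(executorStatePath);
    }
}
```

Both files should come from the same point in the conversation. Restoring only one of them leaves the executor's token counts out of step with the KV cache.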