Last indexed: 18 May 2026 (ecd184)

Chat Sessions

This page documents the ChatSession high-level API, which composes an executor, conversation history, and text transform pipeline into a unified conversational interface. It covers how history is managed, how prompts are formatted for specific models using templates, how input and output are transformed, and how session state is serialized and restored.

For the underlying stateful executors that ChatSession wraps, see Stateful Executors. For inference parameters (temperature, anti-prompts, etc.) that are passed through to the executor, see Inference Parameters.

Overview

ChatSession sits at the top of the LLamaSharp executor stack. It does not perform inference itself; instead, it orchestrates:

A StatefulExecutorBase-derived executor (e.g., InteractiveExecutor) LLama/ChatSession.cs51
A ChatHistory holding the conversation transcript LLama/ChatSession.cs56
An IHistoryTransform that serializes history into a prompt string LLama/ChatSession.cs61
A list of ITextTransform objects applied to user input before it reaches the executor LLama/ChatSession.cs66
An ITextStreamTransform applied to the token stream the executor produces LLama/ChatSession.cs71

Component Interaction Diagram

Sources: LLama/ChatSession.cs19-72

ChatSession Construction

ChatSession requires a StatefulExecutorBase-derived executor. Passing any other ILLamaExecutor implementation (such as StatelessExecutor) throws an ArgumentException LLama/ChatSession.cs105-108

Constructor / Factory	Description
`ChatSession(ILLamaExecutor executor)`	Basic construction with empty history LLama/ChatSession.cs102-111
`ChatSession(ILLamaExecutor executor, ChatHistory history)`	Supplies a pre-built history LLama/ChatSession.cs118-122
`ChatSession.InitializeSessionFromHistoryAsync(...)`	Async factory that also prefills the KV cache via `PrefillPromptAsync` LLama/ChatSession.cs81-96

The static factory InitializeSessionFromHistoryAsync is useful when the model should have already "seen" a conversation before the first user turn: the history is processed into the KV cache up front, so it does not have to be processed when the first user message arrives LLama/ChatSession.cs94

Fluent configuration methods modify the session in place and return this, so calls can be chained:

Method	Effect
`WithHistoryTransform(IHistoryTransform)`	Replaces the history-to-prompt serializer LLama/ChatSession.cs129-133
`AddInputTransform(ITextTransform)`	Appends a transform to the input pipeline LLama/ChatSession.cs140-144
`WithOutputTransform(ITextStreamTransform)`	Replaces the output stream transform LLama/ChatSession.cs150-155
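
The construction options above can be sketched as follows. This is a minimal sketch, assuming the LLamaSharp package and a native backend are installed. The model path, context size, system prompt, and keyword list are placeholders chosen for illustration. `ModelParams`, `LLamaWeights`, and `InteractiveExecutor` are not described on this page, so check their usage against the version you have installed.

```csharp
using LLama;
using LLama.Common;

// Assumption: a local GGUF model exists at this placeholder path.
var parameters = new ModelParams("path/to/model.gguf") { ContextSize = 2048 };
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// ChatSession requires a StatefulExecutorBase-derived executor.
var executor = new InteractiveExecutor(context);

// Conversation the model should have "seen" before the first user turn.
var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a concise assistant.");

// The async factory prefills the KV cache with the serialized history.
var session = await ChatSession.InitializeSessionFromHistoryAsync(executor, history);

// Fluent methods modify the session and return it, so they can be chained.
session
    .AddInputTransform(new LLamaTransforms.NaiveTextInputTransform())
    .WithOutputTransform(new LLamaTransforms.KeywordTextOutputStreamTransform(
        new[] { "User:", "Assistant:" }, redundancyLength: 8));
```

If no prefill is needed, `new ChatSession(executor)` or `new ChatSession(executor, history)` can be used in place of the factory call.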

Sources: LLama/ChatSession.cs74-155

ChatHistory and AuthorRole

ChatHistory holds an ordered list of ChatHistory.Message objects, each with an AuthorRole and a Content string LLama/Common/ChatHistory.cs38-76

Data Structure Space to Code Entity Space

Sources: LLama/Common/ChatHistory.cs9-76

AuthorRole defines the identity of the message source, supporting System (0), User (1), and Assistant (2) LLama/Common/ChatHistory.cs11-32
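
As an illustration of the data structure, the following sketch builds a small history and prints each message with its role and the role's numeric value. The message texts are invented for the example; it assumes only the LLamaSharp package and needs no model file.

```csharp
using System;
using LLama.Common;

var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a helpful assistant.");
history.AddMessage(AuthorRole.User, "What is a KV cache?");
history.AddMessage(AuthorRole.Assistant, "It stores attention keys and values for processed tokens.");

// Messages are kept in insertion order; each has an AuthorRole and a Content string.
foreach (var message in history.Messages)
{
    Console.WriteLine($"{message.AuthorRole} ({(int)message.AuthorRole}): {message.Content}");
}
```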

Prompt Templating and History Transformation

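Instruction-tuned models (Llama 3, Mistral, etc.) expect role-specific formatting to distinguish speakers. ChatML-style templates, for example, wrap each turn as <|im_start|>user\n...<|im_end|>, while other model families use different markers. LLamaSharp provides history transformation to meet these requirements.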

History Transformation: The IHistoryTransform interface defines how a ChatHistory object is flattened into a string the model can understand LLama/Abstractions/IHistoryTransform.cs10-33
Default Formatting: DefaultHistoryTransform uses a simple [Author]: [Message] pattern LLama/LLamaTransforms.cs66-89
Role Name Handling: Role names can be customized in the DefaultHistoryTransform constructor to match model expectations (e.g., "User", "Assistant", "System") LLama/LLamaTransforms.cs49-57
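
The default formatting with customized role names can be sketched like this. The names "Human" and "Bot" are illustrative, and the positional argument order (user, assistant, system) is an assumption about the constructor that should be checked against your installed version.

```csharp
using System;
using LLama;
using LLama.Common;

var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "Answer briefly.");
history.AddMessage(AuthorRole.User, "Hello!");

// Assumed argument order: user name, assistant name, system name.
var transform = new LLamaTransforms.DefaultHistoryTransform("Human", "Bot", "System");

// Flatten the history into one prompt string using the [Author]: [Message] pattern.
string prompt = transform.HistoryToText(history);
Console.Write(prompt);
```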

Prompt Generation Pipeline

Sources: LLama/LLamaTransforms.cs66-89 LLama/ChatSession.cs94 LLama/Abstractions/IHistoryTransform.cs17

Performing Inference

ChatAsync

ChatAsync is the primary entry point for a single conversational turn. It takes a new ChatHistory.Message, appends it to the history, and returns an IAsyncEnumerable<string> streaming tokens LLama/ChatSession.cs256-271
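
A single turn might look like the sketch below. It assumes an already constructed session; the helper class `ChatTurnExample` is hypothetical, and the token cap and anti-prompt are illustrative values rather than recommendations.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

public static class ChatTurnExample
{
    // Sends one user message, echoes the streamed chunks, and returns the full reply.
    public static async Task<string> AskAsync(ChatSession session, string userText)
    {
        var inferenceParams = new InferenceParams
        {
            MaxTokens = 256,                            // illustrative cap
            AntiPrompts = new List<string> { "User:" }  // stop if the model starts a new user turn
        };

        var message = new ChatHistory.Message(AuthorRole.User, userText);
        var reply = new StringBuilder();

        // ChatAsync streams text as it is generated.
        await foreach (var chunk in session.ChatAsync(message, inferenceParams))
        {
            Console.Write(chunk);
            reply.Append(chunk);
        }
        return reply.ToString();
    }
}
```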

Regenerating the Last Response

RegenerateAssistantMessageAsync allows the model to attempt a different response to the same prompt. It removes the last assistant message from history and re-runs the inference LLama/ChatSession.cs280-292

Manual Context Seeding

AddAndProcessUserMessage and AddAndProcessAssistantMessage allow manually adding context to the session and processing it into the KV cache without triggering a full generation cycle LLama/ChatSession.cs338-356
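
Regeneration and manual seeding can be sketched together as below. The helper class and the seeded texts are hypothetical, and the exact parameter lists (a string for the seeding methods, an `InferenceParams` for regeneration) are assumptions to verify against your installed version.

```csharp
using System.Text;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

public static class HistoryControlExample
{
    // Adds context to the history and the KV cache without generating a reply.
    public static async Task SeedAsync(ChatSession session)
    {
        await session.AddAndProcessUserMessage("My name is Ada.");
        await session.AddAndProcessAssistantMessage("Nice to meet you, Ada.");
    }

    // Discards the last assistant reply and asks the model to answer again.
    public static async Task<string> RetryAsync(ChatSession session, InferenceParams inferenceParams)
    {
        var reply = new StringBuilder();
        await foreach (var chunk in session.RegenerateAssistantMessageAsync(inferenceParams))
        {
            reply.Append(chunk);
        }
        return reply.ToString();
    }
}
```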

Sources: LLama/ChatSession.cs256-356

Transform Interfaces

IHistoryTransform

Converts between ChatHistory and a raw prompt string.

DefaultHistoryTransform: Uses a simple [Author]: [Message] format LLama/LLamaTransforms.cs20-57. It can also trim these names from model output using TrimNamesFromText LLama/LLamaTransforms.cs105-120
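
For models that expect a template other than [Author]: [Message], a custom implementation can be supplied through WithHistoryTransform. The sketch below is a hypothetical ChatML-style transform, not part of LLamaSharp. It assumes the interface consists of `HistoryToText`, `TextToHistory`, and `Clone`, and the choice to end the prompt with an open assistant turn is a design decision of this example.

```csharp
using System;
using System.Text;
using LLama.Abstractions;
using LLama.Common;

var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "Answer briefly.");
history.AddMessage(AuthorRole.User, "Hello!");

IHistoryTransform transform = new ChatMlHistoryTransform();
string prompt = transform.HistoryToText(history);
Console.Write(prompt);

// Hypothetical transform for illustration only.
public sealed class ChatMlHistoryTransform : IHistoryTransform
{
    public string HistoryToText(ChatHistory history)
    {
        var sb = new StringBuilder();
        foreach (var message in history.Messages)
        {
            // Role tag is the lower-cased enum name, e.g. "user".
            string role = message.AuthorRole.ToString().ToLowerInvariant();
            sb.Append("<|im_start|>").Append(role).Append('\n')
              .Append(message.Content).Append("<|im_end|>\n");
        }
        // Leave an assistant turn open for the model to complete.
        sb.Append("<|im_start|>assistant\n");
        return sb.ToString();
    }

    public ChatHistory TextToHistory(AuthorRole role, string text)
    {
        // Strip the end marker before storing the text as a message.
        var result = new ChatHistory();
        result.AddMessage(role, text.Replace("<|im_end|>", "").Trim());
        return result;
    }

    public IHistoryTransform Clone() => new ChatMlHistoryTransform();
}
```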

ITextTransform

Synchronous transform of a single input string. Used in InputTransformPipeline LLama/ChatSession.cs66

NaiveTextInputTransform: Simple implementation that trims whitespace LLama/LLamaTransforms.cs126-140
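
A short sketch of input transforms applied in sequence, as the input pipeline would apply them. `CollapseWhitespaceTransform` is a hypothetical class written for this example, and the `Clone` member is included on the assumption that the interface requires it.

```csharp
using System;
using System.Text.RegularExpressions;
using LLama;
using LLama.Abstractions;

ITextTransform trim = new LLamaTransforms.NaiveTextInputTransform();
ITextTransform collapse = new CollapseWhitespaceTransform();

string input = "   What   is\ta  KV cache?  ";

// Apply the transforms in order, each one receiving the previous result.
string result = collapse.Transform(trim.Transform(input));
Console.WriteLine(result);

// Hypothetical transform: collapses runs of whitespace into a single space.
public sealed class CollapseWhitespaceTransform : ITextTransform
{
    public string Transform(string text) => Regex.Replace(text, @"\s+", " ");

    public ITextTransform Clone() => new CollapseWhitespaceTransform();
}
```

On a session, the same two transforms would be registered with AddInputTransform in the order they should run.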

ITextStreamTransform

Asynchronously transforms the stream of strings produced during inference.

KeywordTextOutputStreamTransform: Removes specified keywords (like "User:" or "Assistant:") from the output stream, so that role labels emitted when the model starts to hallucinate the next turn do not reach the caller LLama/LLamaTransforms.cs164-188
EmptyTextOutputStreamTransform: A no-op transform that returns the stream unchanged LLama/LLamaTransforms.cs145-159
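
To show the shape of the interface, the sketch below implements a hypothetical stream transform that drops empty chunks and runs it over a stand-in token stream. Neither `SkipEmptyChunksTransform` nor `FakeTokens` exists in LLamaSharp, and the `Clone` member is included on the assumption that the interface requires it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using LLama.Abstractions;

ITextStreamTransform transform = new SkipEmptyChunksTransform();

var output = new List<string>();
await foreach (var chunk in transform.TransformAsync(FakeTokens()))
{
    output.Add(chunk);
}
Console.WriteLine(string.Join("|", output));

// Stand-in for the chunk stream an executor would produce.
static async IAsyncEnumerable<string> FakeTokens()
{
    foreach (var token in new[] { "Hel", "", "lo", "", "!" })
    {
        await Task.Yield();
        yield return token;
    }
}

// Hypothetical transform: forwards every non-empty chunk unchanged.
public sealed class SkipEmptyChunksTransform : ITextStreamTransform
{
    public async IAsyncEnumerable<string> TransformAsync(IAsyncEnumerable<string> tokens)
    {
        await foreach (var token in tokens)
        {
            if (token.Length > 0)
            {
                yield return token;
            }
        }
    }

    public ITextStreamTransform Clone() => new SkipEmptyChunksTransform();
}
```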

Sources: LLama/LLamaTransforms.cs20-188 LLama/Abstractions/ITextTransform.cs1-31 LLama/Abstractions/ITextStreamTransform.cs1-26

Session State Serialization

ChatSession supports full persistence of the conversation, including the model's KV cache and executor state.

SessionState Files

Calling SaveSession(string path) creates the following files in the target directory LLama/ChatSession.cs26-46:

Constant	Filename	Description
`MODEL_STATE_FILENAME`	`ModelState.st`	Binary KV cache / Context state LLama/ChatSession.cs26
`EXECUTOR_STATE_FILENAME`	`ExecutorState.json`	Internal executor counters and state LLama/ChatSession.cs30
`HISTORY_STATE_FILENAME`	`ChatHistory.json`	The JSON-serialized `ChatHistory` LLama/ChatSession.cs34
`INPUT_TRANSFORM_FILENAME`	`InputTransform.json`	Serialized input pipeline LLama/ChatSession.cs38
`OUTPUT_TRANSFORM_FILENAME`	`OutputTransform.json`	Serialized output transform LLama/ChatSession.cs42
`HISTORY_TRANSFORM_FILENAME`	`HistoryTransform.json`	Serialized history formatter LLama/ChatSession.cs46

State Management Methods

SaveSession(string path): Persists the entire session to a folder LLama/ChatSession.cs163-166
LoadSession(string path): Restores context, history, and transforms from a folder LLama/ChatSession.cs220-224
GetSessionState(): Captures the current state into an in-memory SessionState object for quick resets LLama/ChatSession.cs172-183
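
The state management methods can be combined as in the sketch below, which assumes an already constructed session. The helper class is hypothetical, and the `LoadSession` overload that accepts a `SessionState` is an assumption not documented on this page.

```csharp
using System;
using System.IO;
using LLama;

public static class SessionPersistenceExample
{
    // Persists the session to a folder and lists the files that were written.
    public static void SaveAndInspect(ChatSession session, string directory)
    {
        session.SaveSession(directory);
        foreach (var file in Directory.GetFiles(directory))
        {
            Console.WriteLine(Path.GetFileName(file));
        }
    }

    // Restores context, history, and transforms from a previously saved folder.
    public static void Restore(ChatSession session, string directory)
    {
        session.LoadSession(directory);
    }

    // Captures an in-memory snapshot to return to later.
    public static SessionState Checkpoint(ChatSession session)
    {
        return session.GetSessionState();
    }

    // Assumed overload: resets the session to an in-memory snapshot.
    public static void Rollback(ChatSession session, SessionState snapshot)
    {
        session.LoadSession(snapshot);
    }
}
```

A typical pattern is to take a checkpoint right after the system prompt has been processed and roll back to it whenever the conversation should restart.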

Sources: LLama/ChatSession.cs163-224 LLama.Examples/Examples/ChatSessionWithHistory.cs65-77 LLama.Examples/Examples/ChatSessionWithRestart.cs27-30


URL: https://deepwiki.com/SciSharp/LLamaSharp/3.4-chat-sessions
