Last indexed: 18 May 2026 (ecd184)

Semantic Kernel Integration

This page documents the LLamaSharp.semantic-kernel NuGet package, which bridges LLamaSharp's local inference engine with Microsoft Semantic Kernel (SK). It covers the three SK service implementations, execution settings, history and output transforms, and the extension methods that convert between SK and LLamaSharp types.

For the analogous integration with Microsoft Kernel Memory (RAG / document indexing), see Kernel Memory Integration. For information about the underlying executors these services wrap, see Executors and Inference.

Package Overview

The integration lives in the LLama.SemanticKernel project and is published as the LLamaSharp.semantic-kernel NuGet package (LLama.SemanticKernel/README.md1-3).

Property	Value
NuGet ID	`LLamaSharp.semantic-kernel`
Target Frameworks	`netstandard2.0`, `net8.0`
SK dependency	`Microsoft.SemanticKernel.Abstractions`
LLamaSharp dependency	`LLamaSharp` (core)
Root namespace	`LLamaSharp.SemanticKernel`

Sources: LLama.SemanticKernel/README.md1-38

Architecture

The package maps three LLamaSharp capabilities onto the corresponding SK service interfaces.

[Diagram not reproduced: Service Mapping Overview]

Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs15-50 LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs9-21 LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs7-19

Service Implementations

`LLamaSharpChatCompletion`

File: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs

Implements SK's IChatCompletionService (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs15). It wraps any ILLamaExecutor and exposes both a batched and a streaming chat interface.

Constructor parameters:

Parameter	Type	Default
`model`	`ILLamaExecutor`	required
`defaultRequestSettings`	`LLamaSharpPromptExecutionSettings?`	`MaxTokens=256, Temperature=0, TopP=0` LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs27-36
`historyTransform`	`IHistoryTransform?`	`HistoryTransform` LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs46
`outputTransform`	`ITextStreamTransform?`	`KeywordTextOutputStreamTransform` stripping role prefixes LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs47-49

Key methods:

Method	SK Interface Method	Returns
`GetChatMessageContentsAsync`	`IChatCompletionService`	`Task<IReadOnlyList<ChatMessageContent>>` LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs65-83
`GetStreamingChatMessageContentsAsync`	`IChatCompletionService`	`IAsyncEnumerable<StreamingChatMessageContent>` LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs86-101
`CreateNewChat`	(convenience)	`ChatHistory` with optional system message LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs52-62

Stateful vs. stateless prompt handling

LLamaSharpChatCompletion detects at construction time whether the injected executor is a StatefulExecutorBase (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs44).

On the first turn with a stateful executor (state.IsPromptRun == true), the full ChatHistory is serialized and submitted (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs115-118). On subsequent turns the executor already holds the conversation state in its KV cache, so only the most recent user message is sent (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs120-124). This prevents the full history from being re-processed on every turn. A stateless executor keeps no such state, so it receives the full serialized history on every call.

Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs109-132
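
The branching described above can be sketched without any LLamaSharp types. This is a minimal illustration, not the package's code: `Message`, `PromptSelectionSketch`, and the "Role: content" serialization are hypothetical stand-ins, and the two boolean flags stand in for the executor-type check and `state.IsPromptRun`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a chat message; not the SK ChatMessageContent type.
public sealed record Message(string Role, string Content);

public static class PromptSelectionSketch
{
    // Assumed serialization: one "Role: content" line per message,
    // followed by a cue for the assistant's reply.
    public static string Serialize(IEnumerable<Message> messages) =>
        string.Concat(messages.Select(m => $"{m.Role}: {m.Content}\n")) + "Assistant: ";

    public static string SelectPrompt(
        IReadOnlyList<Message> history, bool isStatefulExecutor, bool isPromptRun)
    {
        if (history.Count == 0)
            throw new ArgumentException("History must contain at least one message.", nameof(history));

        // Stateless executor, or first turn of a stateful one: submit everything.
        if (!isStatefulExecutor || isPromptRun)
            return Serialize(history);

        // Later turns of a stateful executor: earlier turns are already in the
        // KV cache, so only the newest message is serialized.
        return Serialize(new[] { history[history.Count - 1] });
    }
}
```

Calling `SelectPrompt(history, isStatefulExecutor: true, isPromptRun: false)` on a four-message history would serialize only the last message, which is the saving the section describes.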

`LLamaSharpTextCompletion`

File: LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs

Implements SK's ITextGenerationService (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs9). It takes any ILLamaExecutor and forwards the raw prompt string directly to InferAsync.

Constructor:

LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs17-20

Key methods:

Method	Behavior
`GetTextContentsAsync`	Accumulates all tokens from `InferAsync` into a single `TextContent` LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs23-33
`GetStreamingTextContentsAsync`	Yields each token from `InferAsync` as a `StreamingTextContent` LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs36-44

Unlike LLamaSharpChatCompletion, this service performs no history transformation; the caller is responsible for formatting the prompt (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs26-39).

Sources: LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs9-45
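
The difference between the two methods comes down to how the token stream is consumed. The sketch below shows that pattern on a plain `IAsyncEnumerable<string>`; `FakeInferAsync` is a hypothetical stand-in for an executor's token stream, and none of these names come from the package.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class TextCompletionSketch
{
    // Hypothetical token source; it ignores the prompt and yields fixed tokens.
    public static async IAsyncEnumerable<string> FakeInferAsync(
        string prompt, [EnumeratorCancellation] CancellationToken ct = default)
    {
        foreach (var token in new[] { "Hello", ",", " world" })
        {
            ct.ThrowIfCancellationRequested();
            await Task.Yield();
            yield return token;
        }
    }

    // Batched style: drain the stream and return one concatenated string.
    public static async Task<string> CollectAsync(
        IAsyncEnumerable<string> tokens, CancellationToken ct = default)
    {
        var sb = new StringBuilder();
        await foreach (var token in tokens.WithCancellation(ct))
            sb.Append(token);
        return sb.ToString();
    }

    // Streaming style: hand each token to the caller as soon as it arrives.
    public static async IAsyncEnumerable<string> StreamAsync(
        IAsyncEnumerable<string> tokens, [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var token in tokens.WithCancellation(ct))
            yield return token;
    }
}
```

The batched form is simpler for callers that need the whole answer; the streaming form lets a UI render partial output while generation continues.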

`LLamaSharpEmbeddingGeneration`

File: LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs

Implements SK's ITextEmbeddingGenerationService (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs8-9). It wraps a LLamaEmbedder instance.

Constructor:

LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs16-19

Key method:

GenerateEmbeddingsAsync(IList<string> data) iterates through each input string and calls _embedder.GetEmbeddings, returning the results as IList<ReadOnlyMemory<float>> (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs22-30). Processing is sequential: each input is embedded only after the previous one has completed (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs26-27).

Sources: LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs7-31
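
A sequential embedding loop of this shape might look like the following. The `embedOne` delegate is a hypothetical stand-in for the call into the embedder, whose exact signature this page does not give; only the one-at-a-time control flow is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SequentialEmbeddingSketch
{
    // embedOne is an assumed placeholder for the embedder call.
    public static async Task<IList<ReadOnlyMemory<float>>> GenerateAsync(
        IList<string> data, Func<string, Task<float[]>> embedOne)
    {
        var results = new List<ReadOnlyMemory<float>>(data.Count);
        foreach (var text in data)
        {
            // Await each input before starting the next: no batching, no parallelism.
            float[] vector = await embedOne(text);
            results.Add(new ReadOnlyMemory<float>(vector));
        }
        return results;
    }
}
```

With this structure, total latency grows linearly with the number of inputs, which is worth keeping in mind when embedding large document sets.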

Execution Settings

`LLamaSharpPromptExecutionSettings`

File: LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs

Extends SK's PromptExecutionSettings (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs7). This is the primary settings type for the chat and text completion services; the embedding service's GenerateEmbeddingsAsync takes no execution settings.

Property	Type	Description
`Temperature`	`double`	Sampling temperature LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs14
`TopP`	`double`	Nucleus sampling probability LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs21
`PresencePenalty`	`double`	Presence penalty LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs29
`FrequencyPenalty`	`double`	Frequency penalty LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs37
`MaxTokens`	`int?`	Maximum tokens to generate LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs57
`StopSequences`	`IList<string>`	Additional stop sequences LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs43
`ResponseFormat`	`string`	Format for post-processing (e.g., `json_object`) LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs68-69

The static method LLamaSharpPromptExecutionSettings.FromRequestSettings(PromptExecutionSettings?) deserializes any SK PromptExecutionSettings into LLamaSharpPromptExecutionSettings via a JSON round-trip using a custom converter (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs77-108).

Deprecated: ChatRequestSettings was the previous settings class. It is still present in LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs and LLama.SemanticKernel/ChatCompletion/ChatRequestSettingsConverter.cs but is marked [Obsolete] (LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs7-8). Use LLamaSharpPromptExecutionSettings instead.

Sources: LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs7-109 LLama.SemanticKernel/LLamaSharpPromptExecutionSettingsConverter.cs1-11
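
The JSON round-trip idea can be illustrated with `System.Text.Json` alone. In this sketch, `TypedSettings` and `SettingsRoundTripSketch` are hypothetical stand-ins, a dictionary plays the role of the loosely typed settings, and case-insensitive property matching substitutes for the package's custom converter. Returning default settings for a null input is also an assumption.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical typed settings; a subset of the properties listed in the table above.
public sealed class TypedSettings
{
    public double Temperature { get; set; }
    public double TopP { get; set; }
    public int? MaxTokens { get; set; }
    public IList<string> StopSequences { get; set; } = new List<string>();
}

public static class SettingsRoundTripSketch
{
    // Assumption: case-insensitive names are enough for this illustration.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    public static TypedSettings FromLooseSettings(Dictionary<string, object>? loose)
    {
        if (loose is null)
            return new TypedSettings();

        // Serialize the loosely typed values, then read them back as the typed class.
        string json = JsonSerializer.Serialize(loose);
        return JsonSerializer.Deserialize<TypedSettings>(json, Options) ?? new TypedSettings();
    }
}
```

The appeal of a round-trip is that the target type does not need to know the concrete source type; any settings object that serializes to matching property names can be converted.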

Extension Methods

File: LLama.SemanticKernel/ExtensionMethods.cs

The ExtensionMethods static class provides two conversions:

`ToLLamaSharpChatHistory`

LLama.SemanticKernel/ExtensionMethods.cs9-27

Iterates over each message in the SK ChatHistory and maps chat.Role.Label to a LLama.Common.AuthorRole enum value (LLama.SemanticKernel/ExtensionMethods.cs18-20). Unknown roles default to AuthorRole.Unknown (LLama.SemanticKernel/ExtensionMethods.cs21).
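
A label-to-enum mapping with an "unknown" fallback can be written as below. `AuthorRoleSketch` is a hypothetical stand-in for LLama.Common.AuthorRole, and the case-insensitive name match is an assumption; this page does not spell out how the label is compared.

```csharp
using System;

// Hypothetical stand-in enum; member names mirror the roles mentioned on this page.
public enum AuthorRoleSketch
{
    Unknown = -1,
    System = 0,
    User = 1,
    Assistant = 2
}

public static class RoleMappingSketch
{
    public static AuthorRoleSketch FromLabel(string label)
    {
        // TryParse also accepts numeric strings such as "7", so the result is
        // checked with IsDefined before it is trusted.
        bool parsed = Enum.TryParse<AuthorRoleSketch>(label, ignoreCase: true, out var role);
        return parsed && Enum.IsDefined(typeof(AuthorRoleSketch), role)
            ? role
            : AuthorRoleSketch.Unknown;
    }
}
```

Under these assumptions a label such as "tool", which has no matching member, falls through to `Unknown` rather than throwing.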

`ToLLamaSharpInferenceParams` (internal)

LLama.SemanticKernel/ExtensionMethods.cs34-60

Constructs an InferenceParams from execution settings. The anti-prompt list always includes:

- All entries from StopSequences (LLama.SemanticKernel/ExtensionMethods.cs41)
- "User:", "Assistant:", "System:", to prevent the model from generating role prefixes (LLama.SemanticKernel/ExtensionMethods.cs43-45)

The SamplingPipeline is set to a new DefaultSamplingPipeline configured with Temperature, TopP, PresencePenalty, and FrequencyPenalty from the settings object (LLama.SemanticKernel/ExtensionMethods.cs52-59).

Sources: LLama.SemanticKernel/ExtensionMethods.cs1-61
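
The anti-prompt construction described for ToLLamaSharpInferenceParams amounts to concatenating two lists. A minimal sketch, with `AntiPromptSketch` as a hypothetical name and the ordering (caller-supplied sequences first) assumed rather than confirmed:

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public static class AntiPromptSketch
{
    // The three role prefixes named on this page.
    private static readonly string[] RolePrefixes = { "User:", "Assistant:", "System:" };

    public static List<string> Build(IEnumerable<string>? stopSequences)
    {
        // Start from the caller's stop sequences (if any), then always add the role prefixes.
        var antiPrompts = new List<string>(stopSequences ?? Enumerable.Empty<string>());
        antiPrompts.AddRange(RolePrefixes);
        return antiPrompts;
    }
}
```

Because the role prefixes are always appended, generation stops as soon as the model starts writing the next speaker's turn, even when the caller supplies no stop sequences.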

History and Output Transforms

`HistoryTransform`

File: LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs

Extends LLamaTransforms.DefaultHistoryTransform. The sole override appends "{AuthorRole.Assistant}: " to the end of the serialized history string (LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs12-16). This primes the model to generate its response in the assistant role position.
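
The override pattern can be shown with two small stand-in classes. Neither is the real transform: `BaseHistoryTransformSketch` uses an assumed "Role: content" line format, and `AssistantCueTransformSketch` only demonstrates appending the assistant cue to whatever the base class produces.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical base transform with an assumed "Role: content\n" serialization.
public class BaseHistoryTransformSketch
{
    public virtual string HistoryToText(IEnumerable<(string Role, string Content)> history)
    {
        var sb = new StringBuilder();
        foreach (var (role, content) in history)
            sb.Append(role).Append(": ").Append(content).Append('\n');
        return sb.ToString();
    }
}

// Derived transform: reuse the base output and add the assistant cue at the end.
public sealed class AssistantCueTransformSketch : BaseHistoryTransformSketch
{
    public override string HistoryToText(IEnumerable<(string Role, string Content)> history) =>
        base.HistoryToText(history) + "Assistant: ";
}
```

Ending the prompt with the assistant prefix means the model's next tokens are the reply itself, so it has no reason to emit the prefix again.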

Output transform (default)

LLamaSharpChatCompletion defaults to a KeywordTextOutputStreamTransform configured to strip the strings "User:", "Assistant:", and "System:" from the output stream (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs47-49). This prevents role-prefix tokens that the model may generate from leaking into the final response.

Both transforms can be replaced at construction time by passing custom IHistoryTransform and ITextStreamTransform implementations to LLamaSharpChatCompletion (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs38-41).
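
To make keyword stripping on a token stream concrete, here is a simplified filter. It is not the algorithm used by KeywordTextOutputStreamTransform, which this page does not describe; `OutputFilterSketch` is a hypothetical name, and the hold-back strategy is one straightforward way to catch keywords that are split across tokens.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class OutputFilterSketch
{
    // Removes every keyword from the stream. Keywords are assumed to be non-empty.
    public static async IAsyncEnumerable<string> StripAsync(
        IAsyncEnumerable<string> tokens, IReadOnlyList<string> keywords)
    {
        // Hold back enough trailing characters that a keyword split across
        // tokens can still be completed and removed before it is emitted.
        int holdBack = keywords.Count == 0 ? 0 : keywords.Max(k => k.Length) - 1;
        var pending = new StringBuilder();

        await foreach (var token in tokens)
        {
            pending.Append(token);
            foreach (var keyword in keywords)
                pending.Replace(keyword, string.Empty);

            int ready = pending.Length - holdBack;
            if (ready > 0)
            {
                yield return pending.ToString(0, ready);
                pending.Remove(0, ready);
            }
        }

        // End of stream: whatever is left can no longer grow into a keyword.
        if (pending.Length > 0)
            yield return pending.ToString();
    }
}
```

The trade-off is latency: output lags the model by up to one keyword length, in exchange for never emitting a complete role prefix.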

Data Flow: Chat Completion

[Diagram not reproduced: End-to-end data flow for LLamaSharpChatCompletion]

Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs65-101 LLama.SemanticKernel/ExtensionMethods.cs34-60

Usage Pattern

The typical setup registers services with the SK Kernel builder.

1. Create LLamaSharp objects: LLamaWeights, LLamaContext, and an executor, e.g., StatelessExecutor (LLama.Examples/Examples/SemanticKernelChat.cs20-23) or InteractiveExecutor (LLama.SemanticKernel/README.md25).
2. Construct the SK service: pass the executor to LLamaSharpChatCompletion (LLama.Examples/Examples/SemanticKernelChat.cs22) or LLamaSharpTextCompletion (LLama.Examples/Examples/SemanticKernelPrompt.cs26).
3. Register with the kernel: use builder.Services.AddKeyedSingleton<ITextGenerationService>(...) (LLama.Examples/Examples/SemanticKernelPrompt.cs25-26).
4. For embeddings: construct a LLamaEmbedder and wrap it in LLamaSharpEmbeddingGeneration (LLama.SemanticKernel/README.md31-35).

Sources: LLama.Examples/Examples/SemanticKernelChat.cs18-22 LLama.Examples/Examples/SemanticKernelPrompt.cs21-28 LLama.SemanticKernel/README.md11-38
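
The text-completion path through these steps might look like the sketch below. It follows the shape of the cited examples but is not copied from them: the model path, the "local-llama" service key, and the prompt are placeholders, the project is assumed to reference the LLamaSharp, LLamaSharp.semantic-kernel, and Microsoft.SemanticKernel packages plus a native backend, and exact signatures may differ between package versions.

```csharp
using System;
using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel.TextCompletion;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextGeneration;

// Placeholder: point this at a local GGUF model file.
var modelPath = "path/to/model.gguf";

// Step 1: create the LLamaSharp objects.
var parameters = new ModelParams(modelPath);
using var model = LLamaWeights.LoadFromFile(parameters);
var executor = new StatelessExecutor(model, parameters);

// Steps 2 and 3: wrap the executor in the SK service and register it.
// The service key is an arbitrary label chosen for this sketch.
var builder = Kernel.CreateBuilder();
builder.Services.AddKeyedSingleton<ITextGenerationService>(
    "local-llama", new LLamaSharpTextCompletion(executor));
var kernel = builder.Build();

// Invoke a prompt through the kernel; inference runs locally.
var result = await kernel.InvokePromptAsync("Write one sentence about llamas.");
Console.WriteLine(result);
```

For chat, the same structure would apply with LLamaSharpChatCompletion registered as the chat completion service instead.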

Summary of Types

Type	File	SK Interface	LLamaSharp Dependency
`LLamaSharpChatCompletion`	`ChatCompletion/LLamaSharpChatCompletion.cs`	`IChatCompletionService`	`ILLamaExecutor`
`LLamaSharpTextCompletion`	`TextCompletion/LLamaSharpTextCompletion.cs`	`ITextGenerationService`	`ILLamaExecutor`
`LLamaSharpEmbeddingGeneration`	`TextEmbedding/LLamaSharpEmbeddingGeneration.cs`	`ITextEmbeddingGenerationService`	`LLamaEmbedder`
`LLamaSharpPromptExecutionSettings`	`LLamaSharpPromptExecutionSettings.cs`	extends `PromptExecutionSettings`	—
`HistoryTransform`	`ChatCompletion/HistoryTransform.cs`	—	`DefaultHistoryTransform`
`ExtensionMethods`	`ExtensionMethods.cs`	—	`ChatHistory`, `InferenceParams`
`ChatRequestSettings`	`ChatCompletion/ChatRequestSettings.cs`	Deprecated	—

Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs15 LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs9 LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs7 LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs7 LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs9 LLama.SemanticKernel/ExtensionMethods.cs7 LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs8


URL: https://deepwiki.com/SciSharp/LLamaSharp/7.1-semantic-kernel-integration