This page documents the LLamaSharp.semantic-kernel NuGet package, which bridges LLamaSharp's local inference engine with Microsoft Semantic Kernel (SK). It covers the three SK service implementations, execution settings, history and output transforms, and the extension methods that convert between SK and LLamaSharp types.
For the analogous integration with Microsoft Kernel Memory (RAG / document indexing), see Kernel Memory Integration. For information about the underlying executors these services wrap, see Executors and Inference.
The integration lives in the LLama.SemanticKernel project and is published as the LLamaSharp.semantic-kernel NuGet package (LLama.SemanticKernel/README.md:1-3).
| Property | Value |
|---|---|
| NuGet ID | LLamaSharp.semantic-kernel |
| Target Frameworks | netstandard2.0, net8.0 |
| SK dependency | Microsoft.SemanticKernel.Abstractions |
| LLamaSharp dependency | LLamaSharp (core) |
| Root namespace | LLamaSharp.SemanticKernel |
Sources: LLama.SemanticKernel/README.md:1-38
The package maps three LLamaSharp capabilities (chat completion, text completion, and text embedding) onto the corresponding SK service interfaces.
Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:15-50, LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9-21, LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:7-19
LLamaSharpChatCompletion
File: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs
Implements SK's IChatCompletionService (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:15). It wraps any ILLamaExecutor and exposes both a non-streaming and a streaming chat interface.
Constructor parameters:
| Parameter | Type | Default |
|---|---|---|
| model | ILLamaExecutor | required |
| defaultRequestSettings | LLamaSharpPromptExecutionSettings? | MaxTokens=256, Temperature=0, TopP=0 (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:27-36) |
| historyTransform | IHistoryTransform? | HistoryTransform (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:46) |
| outputTransform | ITextStreamTransform? | KeywordTextOutputStreamTransform that strips role prefixes (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:47-49) |
Key methods:
| Method | SK Interface Method | Returns |
|---|---|---|
| GetChatMessageContentsAsync | IChatCompletionService | Task<IReadOnlyList<ChatMessageContent>> (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:65-83) |
| GetStreamingChatMessageContentsAsync | IChatCompletionService | IAsyncEnumerable<StreamingChatMessageContent> (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:86-101) |
| CreateNewChat | (none; convenience method) | ChatHistory, optionally seeded with a system message (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:52-62) |
Stateful vs. stateless prompt handling
LLamaSharpChatCompletion detects at construction time whether the injected executor is a StatefulExecutorBase (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:44).
On the first turn with a stateful executor (state.IsPromptRun == true), the full ChatHistory is serialized and submitted (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:115-118). On subsequent turns the executor already holds the conversation state in its KV cache, so only the most recent user message is sent (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:120-124). This prevents the full history from being re-processed on every turn. A stateless executor keeps no state between calls, so it receives the full serialized history each time.
Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:109-132
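The prompt-selection logic above can be modeled with a short Python sketch. It is a language-neutral illustration, not the C# implementation: `Message`, `history_to_text`, and `select_prompt` are hypothetical names, and the `"Role: content"` line format is an assumption standing in for whatever history transform is configured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str      # e.g. "System", "User", "Assistant"
    content: str


def history_to_text(history: List[Message]) -> str:
    # Hypothetical stand-in for the history transform:
    # one "Role: content" line per message.
    return "\n".join(f"{m.role}: {m.content}" for m in history)


def select_prompt(history: List[Message], is_stateful: bool, is_prompt_run: bool) -> str:
    """Choose which part of the chat history to send to the executor."""
    if is_stateful and not is_prompt_run:
        # Earlier turns already live in the executor's KV cache,
        # so only the most recent message is serialized, as a user message.
        return history_to_text([Message("User", history[-1].content)])
    # First turn of a stateful executor, or any turn of a stateless one:
    # serialize the whole conversation.
    return history_to_text(history)
```

The key design point is that the stateful branch trades flexibility for speed: edits to earlier messages in the SK `ChatHistory` are invisible to a stateful executor after the first turn, because only the last message is ever re-sent.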
LLamaSharpTextCompletion
File: LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs
Implements SK's ITextGenerationService (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9). It takes any ILLamaExecutor and forwards the raw prompt string directly to InferAsync.
Constructor: accepts the ILLamaExecutor to wrap (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:17-20).
Key methods:
| Method | Behavior |
|---|---|
| GetTextContentsAsync | Accumulates all tokens from InferAsync into a single TextContent (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:23-33) |
| GetStreamingTextContentsAsync | Yields each token from InferAsync as a StreamingTextContent (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:36-44) |
Unlike LLamaSharpChatCompletion, this service performs no history transformation; the caller is responsible for formatting the prompt (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:26-39).
Sources: LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9-45
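The difference between the two methods is only in how the executor's token stream is consumed. The Python sketch below models that difference with async generators; `fake_infer` is a hypothetical toy executor standing in for `ILLamaExecutor.InferAsync`, and the function names are illustrative, not the library's API.

```python
import asyncio
from typing import AsyncIterator, Callable, List

# Hypothetical stand-in for ILLamaExecutor.InferAsync: prompt -> async stream of tokens.
InferFn = Callable[[str], AsyncIterator[str]]


async def get_text_contents(infer: InferFn, prompt: str) -> List[str]:
    # Non-streaming: drain every token, then return one combined result.
    parts = []
    async for token in infer(prompt):
        parts.append(token)
    return ["".join(parts)]


async def get_streaming_text_contents(infer: InferFn, prompt: str) -> AsyncIterator[str]:
    # Streaming: forward each token as soon as the executor produces it.
    async for token in infer(prompt):
        yield token


async def fake_infer(prompt: str) -> AsyncIterator[str]:
    # Toy executor that "generates" a fixed reply, one token at a time.
    for token in ["The", " answer", " is", " 42."]:
        yield token


if __name__ == "__main__":
    print(asyncio.run(get_text_contents(fake_infer, "What is the answer?")))
```

The prompt is passed through untouched in both paths, which mirrors the point above: any chat-style formatting has to be done by the caller.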
LLamaSharpEmbeddingGeneration
File: LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs
Implements SK's ITextEmbeddingGenerationService (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:8-9). It wraps a LLamaEmbedder instance.
Constructor: accepts the LLamaEmbedder to wrap (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:16-19).
Key method:
GenerateEmbeddingsAsync(IList<string> data) iterates through each input string and calls _embedder.GetEmbeddings, returning the results as IList<ReadOnlyMemory<float>> (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:22-30). Processing is sequential: each string is embedded in turn, one GetEmbeddings call per input, rather than in a single batch (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:26-27).
Sources: LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:7-31
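A minimal Python sketch of the sequential loop, assuming the embedder can be modeled as a plain function from text to vector. `generate_embeddings` and `toy_embed` are hypothetical names, and the toy embedding is for illustration only; it is not how LLamaEmbedder computes vectors.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for LLamaEmbedder.GetEmbeddings: text -> vector.
EmbedFn = Callable[[str], Sequence[float]]


def generate_embeddings(embed: EmbedFn, data: List[str]) -> List[List[float]]:
    results = []
    for text in data:
        # One embedder call per input, in input order; no batching or parallelism.
        results.append(list(embed(text)))
    return results


def toy_embed(text: str) -> List[float]:
    # Toy "embedding": [length, vowel count]. Illustration only.
    return [float(len(text)), float(sum(c in "aeiou" for c in text.lower()))]
```

Because each input costs one full embedder call, total latency grows linearly with the number of inputs; callers embedding large corpora may want to account for that.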
LLamaSharpPromptExecutionSettings
File: LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs
Extends SK's PromptExecutionSettings (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:7). It is the primary settings type for the chat and text completion services.
| Property | Type | Description |
|---|---|---|
| Temperature | double | Sampling temperature (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:14) |
| TopP | double | Nucleus sampling probability (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:21) |
| PresencePenalty | double | Presence penalty (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:29) |
| FrequencyPenalty | double | Frequency penalty (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:37) |
| MaxTokens | int? | Maximum number of tokens to generate (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:57) |
| StopSequences | IList<string> | Additional stop sequences (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:43) |
| ResponseFormat | string | Format for post-processing, e.g. json_object (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:68-69) |
The static method LLamaSharpPromptExecutionSettings.FromRequestSettings(PromptExecutionSettings?) converts any SK PromptExecutionSettings into LLamaSharpPromptExecutionSettings via a JSON round-trip (serialize, then deserialize as the LLamaSharp type) using a custom converter, LLamaSharpPromptExecutionSettingsConverter (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:77-108).
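The round-trip idea can be sketched in Python: a loosely typed settings mapping is serialized to JSON text and read back into a typed object, keeping only the fields that object defines. Everything here is an assumption for illustration: `LlamaSettings` and `from_request_settings` are hypothetical names, the snake_case keys are assumed, and the `None`/already-typed shortcuts are not confirmed by the source. The real method relies on a System.Text.Json converter.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any, List, Optional


@dataclass
class LlamaSettings:
    # Hypothetical Python mirror of LLamaSharpPromptExecutionSettings.
    temperature: float = 0.0
    top_p: float = 0.0
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    max_tokens: Optional[int] = None
    stop_sequences: List[str] = field(default_factory=list)
    response_format: str = ""


def from_request_settings(settings: Any) -> LlamaSettings:
    # Assumed shortcuts: no settings -> defaults; already typed -> unchanged.
    if settings is None:
        return LlamaSettings()
    if isinstance(settings, LlamaSettings):
        return settings
    # Serialize the generic settings to JSON text, then read it back,
    # keeping only the keys the typed class defines (others are ignored).
    data = json.loads(json.dumps(settings))
    known = {f.name for f in fields(LlamaSettings)}
    return LlamaSettings(**{k: v for k, v in data.items() if k in known})
```

Going through JSON lets settings authored for any connector (or loaded from a prompt config file) be reinterpreted without the two types sharing a class hierarchy beyond the SK base.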
Deprecated:
ChatRequestSettings was the previous settings class. It is still present in LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs and LLama.SemanticKernel/ChatCompletion/ChatRequestSettingsConverter.cs but is marked [Obsolete] (LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs:7-8). Use LLamaSharpPromptExecutionSettings instead.
Sources: LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:7-109, LLama.SemanticKernel/LLamaSharpPromptExecutionSettingsConverter.cs:1-11
ExtensionMethods
File: LLama.SemanticKernel/ExtensionMethods.cs
The ExtensionMethods static class provides two conversions:
ToLLamaSharpChatHistory (LLama.SemanticKernel/ExtensionMethods.cs:9-27)
Iterates over each message in the SK ChatHistory and maps chat.Role.Label to a LLama.Common.AuthorRole enum value (LLama.SemanticKernel/ExtensionMethods.cs:18-20). Unknown roles default to AuthorRole.Unknown (LLama.SemanticKernel/ExtensionMethods.cs:21).
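The role mapping can be sketched in Python as a name lookup with a fallback. `AuthorRole` below is a hypothetical mirror of `LLama.Common.AuthorRole` (its numeric values are illustrative), and the case-insensitive comparison is an assumption, chosen because SK role labels such as `"user"` differ in case from PascalCase enum names.

```python
from enum import Enum
from typing import List, Tuple


class AuthorRole(Enum):
    # Hypothetical mirror of LLama.Common.AuthorRole; values are illustrative.
    Unknown = -1
    System = 0
    User = 1
    Assistant = 2


def to_llama_role(label: str) -> AuthorRole:
    # Case-insensitive lookup by enum name; anything unrecognized maps to Unknown.
    for role in AuthorRole:
        if role.name.lower() == label.lower():
            return role
    return AuthorRole.Unknown


def to_llama_history(messages: List[Tuple[str, str]]) -> List[Tuple[AuthorRole, str]]:
    # messages: (SK role label, content) pairs, e.g. ("user", "Hi").
    return [(to_llama_role(label), content) for label, content in messages]
```

For example, an SK `"tool"` message has no LLamaSharp counterpart here, so it falls through to `Unknown` instead of raising.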
ToLLamaSharpInferenceParams (internal) (LLama.SemanticKernel/ExtensionMethods.cs:34-60)
Constructs an InferenceParams from execution settings. The anti-prompt list combines:
- the StopSequences from the settings (LLama.SemanticKernel/ExtensionMethods.cs:41)
- "User:", "Assistant:", and "System:", which are always added to prevent the model from generating role prefixes (LLama.SemanticKernel/ExtensionMethods.cs:43-45)

The SamplingPipeline is set to a new DefaultSamplingPipeline configured with Temperature, TopP, PresencePenalty, and FrequencyPenalty from the settings object (LLama.SemanticKernel/ExtensionMethods.cs:52-59).
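A Python sketch of that conversion, showing how caller stop sequences and the fixed role prefixes end up in one anti-prompt list. `SamplingPipeline`, `InferenceParams`, and `to_inference_params` are hypothetical stand-ins for the C# types; fields such as the token limit are omitted because their mapping is not described here.

```python
from dataclasses import dataclass
from typing import List

# Role prefixes that are always appended to the anti-prompt list.
ROLE_PREFIXES = ["User:", "Assistant:", "System:"]


@dataclass
class SamplingPipeline:
    # Hypothetical stand-in for the DefaultSamplingPipeline knobs set from the settings.
    temperature: float
    top_p: float
    presence_penalty: float
    frequency_penalty: float


@dataclass
class InferenceParams:
    anti_prompts: List[str]
    sampling: SamplingPipeline


def to_inference_params(stop_sequences: List[str], temperature: float, top_p: float,
                        presence_penalty: float, frequency_penalty: float) -> InferenceParams:
    # Caller-supplied stop sequences first, then the role prefixes.
    anti_prompts = list(stop_sequences) + ROLE_PREFIXES
    sampling = SamplingPipeline(temperature, top_p, presence_penalty, frequency_penalty)
    return InferenceParams(anti_prompts, sampling)
```

Because the role prefixes are always present, generation stops as soon as the model starts writing the next speaker's turn, even when the caller supplies no stop sequences.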
Sources: LLama.SemanticKernel/ExtensionMethods.cs:1-61
HistoryTransform
File: LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs
Extends LLamaTransforms.DefaultHistoryTransform. The sole override appends $"{AuthorRole.Assistant}: " (that is, "Assistant: ") to the end of the serialized history string (LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs:12-16). This primes the model to generate its response in the assistant role position.
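A Python sketch of the override's effect. The base serialization format (one `"Role: content"` line per message) is an assumption standing in for `DefaultHistoryTransform`, and both function names are hypothetical; only the trailing `"Assistant: "` suffix reflects the behavior described above.

```python
from typing import List, Tuple


def default_history_to_text(history: List[Tuple[str, str]]) -> str:
    # Assumed base format: one "Role: content" line per message.
    return "".join(f"{role}: {content}\n" for role, content in history)


def sk_history_to_text(history: List[Tuple[str, str]]) -> str:
    # The override's only change: end the prompt with the assistant prefix,
    # so the model's next tokens are generated as the assistant's reply.
    return default_history_to_text(history) + "Assistant: "
```

For a single user turn `("User", "What is 2+2?")`, the prompt ends with `"User: What is 2+2?\nAssistant: "`, leaving the model to continue directly after the assistant prefix.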
LLamaSharpChatCompletion defaults to a KeywordTextOutputStreamTransform configured to strip the strings "User:", "Assistant:", and "System:" from the output stream (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:47-49). This prevents role-prefix tokens that the model may generate from leaking into the final response.
Both transforms can be replaced at construction time by passing custom IHistoryTransform and ITextStreamTransform implementations to LLamaSharpChatCompletion (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:38-41).
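Stripping keywords from a token stream is harder than a plain string replace, because a prefix such as `"User:"` may arrive split across several tokens. The Python sketch below is a simplified, character-level illustration of that problem and one way to handle it. It is not the algorithm used by `KeywordTextOutputStreamTransform`, and `strip_keywords` is a hypothetical name.

```python
from typing import Iterable, Iterator, List

ROLE_KEYWORDS = ["User:", "Assistant:", "System:"]


def strip_keywords(tokens: Iterable[str], keywords: List[str]) -> Iterator[str]:
    """Remove keywords from a token stream, even when one is split across tokens."""
    # Hold back enough trailing characters to contain any incomplete keyword.
    hold = max(len(k) for k in keywords) - 1
    buf = ""
    for token in tokens:
        buf += token
        for k in keywords:
            buf = buf.replace(k, "")
        if len(buf) > hold:
            # Text before the held-back tail can no longer be part of a keyword.
            cut = len(buf) - hold
            yield buf[:cut]
            buf = buf[cut:]
    # End of stream: clean and flush whatever is still buffered.
    for k in keywords:
        buf = buf.replace(k, "")
    if buf:
        yield buf
```

The trade-off is a small output delay: up to `hold` characters are buffered before being emitted, which is the price of catching keywords that span token boundaries.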
End-to-end data flow for LLamaSharpChatCompletion
Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:65-101, LLama.SemanticKernel/ExtensionMethods.cs:34-60
The typical setup registers services with the SK Kernel builder.
1. Load the model as LLamaWeights and create an executor, e.g. a StatelessExecutor (LLama.Examples/Examples/SemanticKernelChat.cs:20-23) or an InteractiveExecutor over a LLamaContext (LLama.SemanticKernel/README.md:25).
2. Wrap the executor in LLamaSharpChatCompletion (LLama.Examples/Examples/SemanticKernelChat.cs:22) or LLamaSharpTextCompletion (LLama.Examples/Examples/SemanticKernelPrompt.cs:26).
3. Register the service with the kernel builder, e.g. builder.Services.AddKeyedSingleton<ITextGenerationService>(...) (LLama.Examples/Examples/SemanticKernelPrompt.cs:25-26).
4. For embeddings, create a LLamaEmbedder and wrap it in LLamaSharpEmbeddingGeneration (LLama.SemanticKernel/README.md:31-35).

Sources: LLama.Examples/Examples/SemanticKernelChat.cs:18-22, LLama.Examples/Examples/SemanticKernelPrompt.cs:21-28, LLama.SemanticKernel/README.md:11-38
| Type | File | SK Interface | LLamaSharp Dependency |
|---|---|---|---|
| LLamaSharpChatCompletion | ChatCompletion/LLamaSharpChatCompletion.cs | IChatCompletionService | ILLamaExecutor |
| LLamaSharpTextCompletion | TextCompletion/LLamaSharpTextCompletion.cs | ITextGenerationService | ILLamaExecutor |
| LLamaSharpEmbeddingGeneration | TextEmbedding/LLamaSharpEmbeddingGeneration.cs | ITextEmbeddingGenerationService | LLamaEmbedder |
| LLamaSharpPromptExecutionSettings | LLamaSharpPromptExecutionSettings.cs | extends PromptExecutionSettings | (none) |
| HistoryTransform | ChatCompletion/HistoryTransform.cs | (none) | DefaultHistoryTransform |
| ExtensionMethods | ExtensionMethods.cs | (none) | ChatHistory, InferenceParams |
| ChatRequestSettings | ChatCompletion/ChatRequestSettings.cs | Deprecated | (none) |
Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:15, LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9, LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:7, LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:7, LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs:9, LLama.SemanticKernel/ExtensionMethods.cs:7, LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs:8