
URL: https://deepwiki.com/SciSharp/LLamaSharp/7.1-semantic-kernel-integration

Semantic Kernel Integration | SciSharp/LLamaSharp | DeepWiki


Last indexed: 18 May 2026 (ecd184)

Semantic Kernel Integration

This page documents the LLamaSharp.semantic-kernel NuGet package, which bridges LLamaSharp's local inference engine with Microsoft Semantic Kernel (SK). It covers the three SK service implementations, execution settings, history and output transforms, and the extension methods that convert between SK and LLamaSharp types.

For the analogous integration with Microsoft Kernel Memory (RAG / document indexing), see Kernel Memory Integration. For information about the underlying executors these services wrap, see Executors and Inference.


Package Overview

The integration lives in the LLama.SemanticKernel project and is published as the LLamaSharp.semantic-kernel NuGet package (LLama.SemanticKernel/README.md:1-3).

| Property | Value |
| --- | --- |
| NuGet ID | LLamaSharp.semantic-kernel |
| Target frameworks | netstandard2.0, net8.0 |
| SK dependency | Microsoft.SemanticKernel.Abstractions |
| LLamaSharp dependency | LLamaSharp (core) |
| Root namespace | LLamaSharp.SemanticKernel |

Sources: LLama.SemanticKernel/README.md:1-38


Architecture

The package maps three LLamaSharp capabilities onto the corresponding SK service interfaces.

Service Mapping Overview


Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:15-50, LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9-21, LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:7-19


Service Implementations

LLamaSharpChatCompletion

File: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs

Implements SK's IChatCompletionService (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:15). It wraps any ILLamaExecutor and exposes both a batched and a streaming chat interface.

Constructor parameters:

| Parameter | Type | Default |
| --- | --- | --- |
| model | ILLamaExecutor | required |
| defaultRequestSettings | LLamaSharpPromptExecutionSettings? | MaxTokens=256, Temperature=0, TopP=0 (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:27-36) |
| historyTransform | IHistoryTransform? | HistoryTransform (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:46) |
| outputTransform | ITextStreamTransform? | KeywordTextOutputStreamTransform stripping role prefixes (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:47-49) |

Key methods:

| Method | SK interface | Returns |
| --- | --- | --- |
| GetChatMessageContentsAsync | IChatCompletionService | Task<IReadOnlyList<ChatMessageContent>> (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:65-83) |
| GetStreamingChatMessageContentsAsync | IChatCompletionService | IAsyncEnumerable<StreamingChatMessageContent> (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:86-101) |
| CreateNewChat | (convenience) | ChatHistory with optional system message (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:52-62) |
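
As a rough illustration of the three methods above, the following sketch drives the service directly, without a Kernel. It is modeled on the pattern in LLama.Examples/Examples/SemanticKernelChat.cs rather than copied from it; the model path is a placeholder, and the exact overloads may differ between LLamaSharp and SK versions.

```csharp
using System;
using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.ChatCompletion;

// Placeholder path (an assumption): point this at a local GGUF model file.
var parameters = new ModelParams("models/example.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);

// Any ILLamaExecutor should work; a stateless executor is used here.
var executor = new StatelessExecutor(model, parameters);
var chat = new LLamaSharpChatCompletion(executor);

// CreateNewChat seeds the history with an optional system message.
var history = chat.CreateNewChat("You are a librarian, expert about books.");
history.AddUserMessage("Hi, I'm looking for book suggestions.");

// Batched call: the whole reply is returned at once.
var replies = await chat.GetChatMessageContentsAsync(history);
history.AddAssistantMessage(replies[0].Content ?? string.Empty);

// Streaming call: chunks are written as they arrive.
history.AddUserMessage("Something about medieval history, please.");
await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(history))
{
    Console.Write(chunk.Content);
}
```

Passing a LLamaSharpPromptExecutionSettings instance as the second argument of either call should override the constructor defaults for that request.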

Stateful vs. stateless prompt handling

LLamaSharpChatCompletion detects at construction time whether the injected executor is a StatefulExecutorBase (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:44).


On the first turn with a stateful executor (state.IsPromptRun == true), the full ChatHistory is serialized and submitted (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:115-118). On subsequent turns the executor already holds the conversation state in its KV cache, so only the most recent user message is sent (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:120-124). This avoids re-processing the full history on every turn.

Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:109-132
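
The branching described in this section can be sketched without either library. The names below (PromptSelectionSketch, Message, Serialize, SelectPrompt) are hypothetical stand-ins rather than types from LLamaSharp, and the "Role: text" serialization is a simplified assumption about what the history transform produces.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PromptSelectionSketch
{
    // Hypothetical stand-in for one chat message (role label plus text).
    public readonly record struct Message(string Role, string Content);

    // Simplified stand-in for the history transform: one "Role: text" line
    // per message, followed by a trailing "Assistant: " primer.
    public static string Serialize(IEnumerable<Message> messages) =>
        string.Concat(messages.Select(m => $"{m.Role}: {m.Content}\n")) + "Assistant: ";

    public static string SelectPrompt(
        IReadOnlyList<Message> history, bool isStateful, bool isPromptRun)
    {
        // Stateless executors, and the first turn of a stateful one,
        // receive the whole serialized history.
        if (!isStateful || isPromptRun || history.Count == 0)
            return Serialize(history);

        // Later stateful turns: earlier turns are assumed to be in the
        // KV cache, so only the last message (the new user turn) is sent.
        return Serialize(new[] { history[history.Count - 1] });
    }
}
```

With a four-message history ending in a user turn, a stateful executor past its first run would receive only that final turn plus the assistant primer, while a stateless executor would receive all four messages every time.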


LLamaSharpTextCompletion

File: LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs

Implements SK's ITextGenerationService (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9). It takes any ILLamaExecutor and forwards the raw prompt string directly to InferAsync.

Constructor (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:17-20): takes the ILLamaExecutor to wrap.

Key methods:

| Method | Behavior |
| --- | --- |
| GetTextContentsAsync | Accumulates all tokens from InferAsync into a single TextContent (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:23-33) |
| GetStreamingTextContentsAsync | Yields each token from InferAsync as a StreamingTextContent (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:36-44) |

Unlike LLamaSharpChatCompletion, this service performs no history transformation; the caller is responsible for formatting the prompt (LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:26-39).

Sources: LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9-45
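
A minimal sketch of calling the text service directly, assuming a local GGUF model at a placeholder path. Because no history transform is applied, the prompt string below is passed to the executor exactly as written.

```csharp
using System;
using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel;
using LLamaSharp.SemanticKernel.TextCompletion;

// Placeholder path (an assumption): point this at a local GGUF model file.
var parameters = new ModelParams("models/example.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
var textService = new LLamaSharpTextCompletion(new StatelessExecutor(model, parameters));

var settings = new LLamaSharpPromptExecutionSettings { MaxTokens = 64 };
var prompt = "Summarize in one sentence: llamas are domesticated South American camelids.";

// Batched: tokens are accumulated into a single TextContent.
var results = await textService.GetTextContentsAsync(prompt, settings);
Console.WriteLine(results[0].Text);

// Streaming: each piece is surfaced as a StreamingTextContent.
await foreach (var chunk in textService.GetStreamingTextContentsAsync(prompt, settings))
{
    Console.Write(chunk.Text);
}
```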


LLamaSharpEmbeddingGeneration

File: LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs

Implements SK's ITextEmbeddingGenerationService (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:8-9). It wraps a LLamaEmbedder instance.

Constructor (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:16-19): takes the LLamaEmbedder instance to wrap.

Key method:

GenerateEmbeddingsAsync(IList<string> data) iterates through each input string and calls _embedder.GetEmbeddings, returning the results as IList<ReadOnlyMemory<float>> (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:22-30). Inputs are processed sequentially, one string at a time (LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:26-27).

Sources: LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:7-31
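
The sketch below shows one plausible way to wrap a LLamaEmbedder and request embeddings for two strings. The model path is a placeholder, the pragma is an assumption about how SK flags its embedding abstractions, and some LLamaSharp versions may additionally require an embeddings-related option on ModelParams that is not shown here.

```csharp
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel.TextEmbedding;

// Assumption: SK may mark its embedding abstractions as experimental under
// this diagnostic ID; the pragma may be unnecessary in your version.
#pragma warning disable SKEXP0001

// Placeholder path (an assumption): use a model suited to embeddings.
var parameters = new ModelParams("models/embedding-model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var embedder = new LLamaEmbedder(model, parameters);

var service = new LLamaSharpEmbeddingGeneration(embedder);

var inputs = new List<string> { "first document", "second document" };
var vectors = await service.GenerateEmbeddingsAsync(inputs);

// One vector is returned per input, in input order.
for (var i = 0; i < vectors.Count; i++)
{
    Console.WriteLine($"{inputs[i]} -> {vectors[i].Length} dimensions");
}
```

Since the inputs are embedded one after another, latency should grow roughly linearly with the number of strings in the list.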


Execution Settings

LLamaSharpPromptExecutionSettings

File: LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs

Extends SK's PromptExecutionSettings (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:7). This is the primary settings type for all three services.

| Property | Type | Description |
| --- | --- | --- |
| Temperature | double | Sampling temperature (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:14) |
| TopP | double | Nucleus sampling probability (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:21) |
| PresencePenalty | double | Presence penalty (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:29) |
| FrequencyPenalty | double | Frequency penalty (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:37) |
| MaxTokens | int? | Maximum tokens to generate (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:57) |
| StopSequences | IList<string> | Additional stop sequences (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:43) |
| ResponseFormat | string | Format for post-processing, e.g. json_object (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:68-69) |

The static method LLamaSharpPromptExecutionSettings.FromRequestSettings(PromptExecutionSettings?) converts any SK PromptExecutionSettings into LLamaSharpPromptExecutionSettings by serializing it to JSON and deserializing it with a custom converter (LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:77-108).

Deprecated: ChatRequestSettings was the previous settings class. It is still present in LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs and LLama.SemanticKernel/ChatCompletion/ChatRequestSettingsConverter.cs but is marked [Obsolete] (LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs:7-8). Use LLamaSharpPromptExecutionSettings instead.

Sources: LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:7-109, LLama.SemanticKernel/LLamaSharpPromptExecutionSettingsConverter.cs:1-11
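
A short sketch of building a settings object and passing it through FromRequestSettings. The values are arbitrary illustrations; the assumption that a settings object already of the LLamaSharp type passes through with its values intact is not confirmed by this page.

```csharp
using System;
using System.Collections.Generic;
using LLamaSharp.SemanticKernel;

var settings = new LLamaSharpPromptExecutionSettings
{
    Temperature = 0.7,
    TopP = 0.9,
    PresencePenalty = 0.0,
    FrequencyPenalty = 0.0,
    MaxTokens = 128,
    // Extra stop sequences on top of whatever the service adds itself.
    StopSequences = new List<string> { "\n\n" },
};

// Services can normalize any incoming PromptExecutionSettings this way.
// Assumption: an instance that is already LLamaSharp-typed keeps its values.
var normalized = LLamaSharpPromptExecutionSettings.FromRequestSettings(settings);
Console.WriteLine($"{normalized.MaxTokens} tokens at temperature {normalized.Temperature}");
```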


Extension Methods

File: LLama.SemanticKernel/ExtensionMethods.cs

The ExtensionMethods static class provides two conversions:

ToLLamaSharpChatHistory


LLama.SemanticKernel/ExtensionMethods.cs:9-27

Iterates over each message in the SK ChatHistory and maps chat.Role.Label to a LLama.Common.AuthorRole enum value (LLama.SemanticKernel/ExtensionMethods.cs:18-20). Unrecognized roles fall back to AuthorRole.Unknown (LLama.SemanticKernel/ExtensionMethods.cs:21).
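
The role mapping can be illustrated without either library. In the sketch below, AuthorRoleSketch is a hypothetical local mirror of LLama.Common.AuthorRole, and parsing the label by enum name, case-insensitively, is an assumption about how the mapping is performed.

```csharp
using System;

// Hypothetical local mirror of LLama.Common.AuthorRole (member names only).
public enum AuthorRoleSketch { Unknown, System, User, Assistant }

public static class RoleMappingSketch
{
    // Maps an SK role label such as "user" or "assistant" to the enum.
    public static AuthorRoleSketch FromLabel(string label)
    {
        // TryParse also accepts numeric strings, so IsDefined guards
        // against labels like "7" producing an undefined enum value.
        if (Enum.TryParse<AuthorRoleSketch>(label, ignoreCase: true, out var role)
            && Enum.IsDefined(typeof(AuthorRoleSketch), role))
        {
            return role;
        }

        // Anything unrecognized (for example "tool") falls back to Unknown.
        return AuthorRoleSketch.Unknown;
    }
}
```

Case-insensitive matching matters here because SK role labels are lowercase while the enum member names are capitalized.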

ToLLamaSharpInferenceParams (internal)


LLama.SemanticKernel/ExtensionMethods.cs:34-60

Constructs an InferenceParams from execution settings. The anti-prompt list always includes the configured StopSequences plus the role prefixes "User:", "Assistant:", and "System:".

The SamplingPipeline is set to a new DefaultSamplingPipeline configured with Temperature, TopP, PresencePenalty, and FrequencyPenalty from the settings object (LLama.SemanticKernel/ExtensionMethods.cs:52-59).

Sources: LLama.SemanticKernel/ExtensionMethods.cs:1-61
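
Because ToLLamaSharpInferenceParams is internal, callers cannot invoke it directly. The following is a hedged sketch of an equivalent mapping for use with a raw executor. InferenceParamsSketch and FromSettings are hypothetical names, and the anti-prompt contents and the -1 fallback for MaxTokens are assumptions, not details confirmed by this page.

```csharp
using System.Collections.Generic;
using LLama.Common;
using LLama.Sampling;
using LLamaSharp.SemanticKernel;

public static class InferenceParamsSketch
{
    public static InferenceParams FromSettings(LLamaSharpPromptExecutionSettings settings)
    {
        // Assumption: caller-supplied stop sequences come first, followed
        // by role prefixes so generation halts before a new speaker turn.
        var antiPrompts = new List<string>(settings.StopSequences ?? new List<string>())
        {
            "User:",
            "Assistant:",
            "System:",
        };

        return new InferenceParams
        {
            AntiPrompts = antiPrompts,
            // Assumption: -1 stands for "no explicit limit" when unset.
            MaxTokens = settings.MaxTokens ?? -1,
            SamplingPipeline = new DefaultSamplingPipeline
            {
                // The settings use double; the pipeline is assumed to use float.
                Temperature = (float)settings.Temperature,
                TopP = (float)settings.TopP,
                PresencePenalty = (float)settings.PresencePenalty,
                FrequencyPenalty = (float)settings.FrequencyPenalty,
            },
        };
    }
}
```

Property names on DefaultSamplingPipeline have changed across LLamaSharp releases, so the initializer may need adjusting for the installed version.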


History and Output Transforms

HistoryTransform

File: LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs

Extends LLamaTransforms.DefaultHistoryTransform. The sole override appends "{AuthorRole.Assistant}: " to the end of the serialized history string (LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs:12-16). This primes the model to continue with the assistant's reply.

Output transform (default)

LLamaSharpChatCompletion defaults to a KeywordTextOutputStreamTransform configured to strip the strings "User:", "Assistant:", and "System:" from the output stream (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:47-49). This keeps role prefixes that the model may generate out of the final response.

Both transforms can be replaced at construction time by passing custom IHistoryTransform and ITextStreamTransform implementations to LLamaSharpChatCompletion (LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:38-41).
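
A sketch of replacing both transforms, assuming the constructor parameter names from the table earlier on this page. PreambleHistoryTransform is a hypothetical class written for this example, the extra "Human:" keyword is an arbitrary illustration, and the model path is a placeholder.

```csharp
using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel.ChatCompletion;

// Placeholder path (an assumption): point this at a local GGUF model file.
var parameters = new ModelParams("models/example.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
var executor = new StatelessExecutor(model, parameters);

var chat = new LLamaSharpChatCompletion(
    executor,
    historyTransform: new PreambleHistoryTransform(),
    // Strip one extra role-like prefix in addition to the usual three.
    outputTransform: new LLamaTransforms.KeywordTextOutputStreamTransform(
        new[] { "User:", "Assistant:", "System:", "Human:" }));

// Hypothetical custom transform: adds a fixed preamble before the default
// serialization and keeps the trailing assistant primer.
public sealed class PreambleHistoryTransform : LLamaTransforms.DefaultHistoryTransform
{
    public override string HistoryToText(LLama.Common.ChatHistory history)
        => "Transcript of a dialog.\n" + base.HistoryToText(history) + "Assistant: ";
}
```

Deriving from DefaultHistoryTransform, as the built-in HistoryTransform does, avoids having to implement the rest of IHistoryTransform by hand.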


Data Flow: Chat Completion

End-to-end data flow for LLamaSharpChatCompletion


Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:65-101, LLama.SemanticKernel/ExtensionMethods.cs:34-60


Usage Pattern

The typical setup registers services with the SK Kernel builder.

  1. Create LLamaSharp objects: LLamaWeights, LLamaContext, and an executor, e.g. StatelessExecutor (LLama.Examples/Examples/SemanticKernelChat.cs:20-23) or InteractiveExecutor (LLama.SemanticKernel/README.md:25).
  2. Construct the SK service: pass the executor to LLamaSharpChatCompletion (LLama.Examples/Examples/SemanticKernelChat.cs:22) or LLamaSharpTextCompletion (LLama.Examples/Examples/SemanticKernelPrompt.cs:26).
  3. Register with the kernel: use builder.Services.AddKeyedSingleton<ITextGenerationService>(...) (LLama.Examples/Examples/SemanticKernelPrompt.cs:25-26).
  4. For embeddings: construct a LLamaEmbedder and wrap it in LLamaSharpEmbeddingGeneration (LLama.SemanticKernel/README.md:31-35).

Sources: LLama.Examples/Examples/SemanticKernelChat.cs:18-22, LLama.Examples/Examples/SemanticKernelPrompt.cs:21-28, LLama.SemanticKernel/README.md:11-38
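
Steps 1 to 3 might look roughly like the following for the text-generation service. This is a sketch modeled on the pattern in LLama.Examples/Examples/SemanticKernelPrompt.cs, not a copy of it; the model path, the "local-llama" service key, and the prompt text are illustrative choices, and Kernel.CreateBuilder assumes the full Microsoft.SemanticKernel package is referenced alongside the abstractions.

```csharp
using System;
using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel;
using LLamaSharp.SemanticKernel.TextCompletion;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextGeneration;

// Step 1: LLamaSharp objects (placeholder model path, an assumption).
var parameters = new ModelParams("models/example.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
var executor = new StatelessExecutor(model, parameters);

// Steps 2 and 3: construct the SK service and register it with the kernel.
var builder = Kernel.CreateBuilder();
builder.Services.AddKeyedSingleton<ITextGenerationService>(
    "local-llama", new LLamaSharpTextCompletion(executor));
var kernel = builder.Build();

// Use the registered service through a prompt function.
var settings = new LLamaSharpPromptExecutionSettings { MaxTokens = 100 };
var summarize = kernel.CreateFunctionFromPrompt(
    "{{$input}}\n\nOne line TLDR with the fewest words.", settings);

var result = await kernel.InvokeAsync(
    summarize,
    new KernelArguments { ["input"] = "Llamas are domesticated South American camelids." });
Console.WriteLine(result.GetValue<string>());
```

Registering LLamaSharpChatCompletion under IChatCompletionService should follow the same pattern with a different service type.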


Summary of Types

| Type | File | SK Interface | LLamaSharp Dependency |
| --- | --- | --- | --- |
| LLamaSharpChatCompletion | ChatCompletion/LLamaSharpChatCompletion.cs | IChatCompletionService | ILLamaExecutor |
| LLamaSharpTextCompletion | TextCompletion/LLamaSharpTextCompletion.cs | ITextGenerationService | ILLamaExecutor |
| LLamaSharpEmbeddingGeneration | TextEmbedding/LLamaSharpEmbeddingGeneration.cs | ITextEmbeddingGenerationService | LLamaEmbedder |
| LLamaSharpPromptExecutionSettings | LLamaSharpPromptExecutionSettings.cs | extends PromptExecutionSettings | |
| HistoryTransform | ChatCompletion/HistoryTransform.cs | | DefaultHistoryTransform |
| ExtensionMethods | ExtensionMethods.cs | | ChatHistory, InferenceParams |
| ChatRequestSettings | ChatCompletion/ChatRequestSettings.cs | Deprecated | |

Sources: LLama.SemanticKernel/ChatCompletion/LLamaSharpChatCompletion.cs:15, LLama.SemanticKernel/TextCompletion/LLamaSharpTextCompletion.cs:9, LLama.SemanticKernel/TextEmbedding/LLamaSharpEmbeddingGeneration.cs:7, LLama.SemanticKernel/LLamaSharpPromptExecutionSettings.cs:7, LLama.SemanticKernel/ChatCompletion/HistoryTransform.cs:9, LLama.SemanticKernel/ExtensionMethods.cs:7, LLama.SemanticKernel/ChatCompletion/ChatRequestSettings.cs:8