Last indexed: 18 May 2026 (ecd184)

Third-Party Integrations

This document describes how LLamaSharp integrates with third-party AI frameworks and libraries beyond Microsoft's Semantic Kernel and Kernel Memory. For Microsoft-specific integrations, see 7.1 Semantic Kernel Integration and 7.2 Kernel Memory Integration.

LLamaSharp is designed as a foundational library that provides .NET bindings to llama.cpp, making it suitable for integration into higher-level AI frameworks and application platforms. The core library exposes well-defined abstractions that third-party frameworks can wrap or extend to provide LLamaSharp as a local LLM backend option. LLama/LLamaSharp.csproj19-23

Integration Architecture

LLamaSharp's architecture enables third-party integrations through several key design patterns:

Core Abstractions: The library provides interfaces like ILLamaExecutor that define contracts without imposing implementation details. LLama.Examples/Examples/SpeechChat.cs48
Executor Pattern: The executor abstraction lets frameworks choose between stateful (InteractiveExecutor, InstructExecutor), stateless (StatelessExecutor), and batched (BatchedExecutor) execution based on their needs. LLama.Examples/Examples/SpeechChat.cs61 LLama.Examples/ExampleRunner.cs17-20 LLama.Examples/ExampleRunner.cs32-37
Standard Interfaces: Integration with Microsoft.Extensions.AI.Abstractions provides common AI abstractions recognized across the .NET ecosystem. LLama/LLamaSharp.csproj54
Resource Management: SafeHandle-based native resource management ensures proper cleanup regardless of how the library is integrated. LLama.Examples/Examples/SpeechChat.cs104-108
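The executor pattern above can be sketched as a helper that depends only on ILLamaExecutor, so the calling framework can swap executor types without changing its own code. This is a minimal sketch, not code from the repository: the model path is a placeholder, the CompleteAsync helper and keepHistory flag are made up for illustration, and the LLamaSharp member names used here should be checked against the version you have installed.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using LLama;
using LLama.Abstractions;
using LLama.Common;

// Placeholder path (assumption); point this at a GGUF model file on disk.
var parameters = new ModelParams("models/example.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 0 // CPU only; raise to offload layers to the GPU
};

// Weights are loaded once; a context holds the state of one conversation.
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// Hypothetical switch: both executor types satisfy ILLamaExecutor.
bool keepHistory = false;
ILLamaExecutor executor = keepHistory
    ? (ILLamaExecutor)new InteractiveExecutor(context)
    : new StatelessExecutor(weights, parameters);

Console.WriteLine(await CompleteAsync(executor, "Q: What is llama.cpp? A:"));

// Framework-side helper (hypothetical) that only knows the interface.
static async Task<string> CompleteAsync(ILLamaExecutor executor, string prompt)
{
    var inference = new InferenceParams { MaxTokens = 64 };
    var output = new StringBuilder();

    // InferAsync streams the response piece by piece.
    await foreach (var token in executor.InferAsync(prompt, inference))
        output.Append(token);

    return output.ToString();
}
```

Because the helper receives the interface rather than a concrete executor, the choice between stateful and stateless execution stays a configuration decision in the host framework.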

Natural Language to Code Entity Mapping: Architecture

[Diagram: Framework Integration Architecture. It maps the conceptual integration space to specific code entities in the LLamaSharp project.]

Sources: LLama/LLamaSharp.csproj54-55 LLama.Examples/Examples/SpeechChat.cs40-65 LLama.Examples/ExampleRunner.cs17-20 LLama.Examples/ExampleRunner.cs32-37

Microsoft.Extensions.AI Integration

LLamaSharp integrates with Microsoft.Extensions.AI.Abstractions, providing a standardized interface layer that enables interoperability across the .NET AI ecosystem.

Dependencies

The core LLamaSharp project references the following abstraction packages:

Package	Purpose
`Microsoft.Extensions.AI.Abstractions`	Common AI abstractions (text generation, embeddings) LLama/LLamaSharp.csproj54
`Microsoft.Extensions.Logging.Abstractions`	Logging infrastructure LLama/LLamaSharp.csproj55
`System.Numerics.Tensors`	Tensor operations for AI workflows LLama/LLamaSharp.csproj58

Microsoft.Extensions.AI.Abstractions defines standard interfaces such as IChatClient and IEmbeddingGenerator<TInput, TEmbedding> that LLamaSharp components can implement or wrap to achieve framework-agnostic integration.

Sources: LLama/LLamaSharp.csproj50-59

BotSharp Integration

BotSharp is an open-source machine learning framework designed for building AI bot platforms. It integrates LLamaSharp as a local LLM backend, allowing developers to create conversational AI applications without relying on cloud-based LLM services.

Integration Approach

BotSharp wraps LLamaSharp's executor abstractions to provide:

Conversational AI: Uses InteractiveExecutor for maintaining conversation state across multiple turns. LLama.Examples/Examples/SpeechChat.cs61
Model Management: Leverages LLamaWeights for efficient model loading and sharing across multiple bot instances. LLama.Examples/Examples/SpeechChat.cs59

Sources: LLama.Examples/Examples/SpeechChat.cs48-61

LangChain Integration

LangChain is a framework for developing applications powered by language models, focusing on composition, retrieval-augmented generation, and agent-based architectures. The LangChain.NET implementation includes LLamaSharp integration.

Integration Approach

LangChain wraps LLamaSharp to provide:

LLM Abstraction: Implements LangChain's ILanguageModel interface using LLamaSharp executors.
Chain Composition: Combines LLamaSharp with document loaders, vector stores, and other LangChain components.
Streaming Support: Utilizes LLamaSharp's IAsyncEnumerable token generation for real-time output. LLama.Examples/Examples/SpeechChat.cs84-85

Sources: LLama.Examples/Examples/SpeechChat.cs84-85

MaIN.NET Integration

MaIN.NET provides a simplified approach to orchestrating agents and chats from different LLM providers. It supports multiple backends, including LLamaSharp for local inference.

Integration Approach

MaIN.NET uses LLamaSharp as one of several interchangeable LLM providers:

Provider Abstraction: Wraps LLamaWeights and executor types behind a unified provider interface.
Multi-Provider Scenarios: Enables applications to switch between local (LLamaSharp) and cloud-based LLMs.
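The provider idea above can be illustrated with a small, self-contained sketch. Every type here (ITextProvider, DelegateProvider, ProviderRegistry, Demo) is hypothetical and is not MaIN.NET's actual API; the sketch only shows how a local backend and a cloud backend could sit behind one contract and be selected by name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical provider contract; not MaIN.NET's real interface.
public interface ITextProvider
{
    string Name { get; }
    IAsyncEnumerable<string> GenerateAsync(string prompt);
}

// Adapts any streaming function to the contract. With LLamaSharp, the
// delegate could be prompt => executor.InferAsync(prompt).
public sealed class DelegateProvider : ITextProvider
{
    private readonly Func<string, IAsyncEnumerable<string>> _generate;

    public DelegateProvider(string name, Func<string, IAsyncEnumerable<string>> generate)
    {
        Name = name;
        _generate = generate;
    }

    public string Name { get; }

    public IAsyncEnumerable<string> GenerateAsync(string prompt) => _generate(prompt);
}

// Lets the application switch between providers by name at run time.
public sealed class ProviderRegistry
{
    private readonly Dictionary<string, ITextProvider> _providers = new Dictionary<string, ITextProvider>();

    public void Register(ITextProvider provider) => _providers[provider.Name] = provider;

    public ITextProvider Resolve(string name) =>
        _providers.TryGetValue(name, out var provider)
            ? provider
            : throw new KeyNotFoundException($"No provider named '{name}'.");
}

public static class Demo
{
    // Stand-in for a real backend: streams the prompt back word by word.
    public static async IAsyncEnumerable<string> FakeTokens(string prompt)
    {
        foreach (var word in prompt.Split(' '))
        {
            await Task.Yield();
            yield return word;
        }
    }
}
```

A host application would register one DelegateProvider per backend and call Resolve with a name taken from configuration, which keeps the local-versus-cloud decision out of the calling code.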

Speech Integration Examples

LLamaSharp can be integrated with speech processing libraries for applications like speech-to-text (STT) and text-to-speech (TTS). The LLama.Examples project includes the SpeechChat example demonstrating this. LLama.Examples/ExampleRunner.cs40

SpeechChat Example

The SpeechChat example transcribes audio in real time with Whisper.net and feeds the transcribed text to a LLamaSharp InteractiveExecutor for conversational AI. LLama.Examples/Examples/SpeechChat.cs10-37

Implementation Details

The SpeechChat example uses the following key components:

SpeechRecognitionServer: Captures microphone audio with NAudio.WaveInEvent and transcribes it with WhisperProcessor from Whisper.net. LLama.Examples/Examples/SpeechChat.cs125-126 LLama.Examples/Examples/SpeechChat.cs129
ISpeechListener: An interface defining methods for handling transcribed speech. LLama.Examples/Examples/SpeechChat.cs111-115
LlamaSessionSpeechListener: Implements ISpeechListener and integrates with LLamaSharp. LLama.Examples/Examples/SpeechChat.cs40-41
- It loads a language model using LLamaWeights.LoadFromFile and creates an InteractiveExecutor. LLama.Examples/Examples/SpeechChat.cs55-61
- When speech is detected and transcribed, HandleSpeech is called. If the LLM is not currently responding, the transcription is sent to the InteractiveExecutor via SendMessage. LLama.Examples/Examples/SpeechChat.cs73-74
- The InferAsync method of the InteractiveExecutor is used to generate responses. LLama.Examples/Examples/SpeechChat.cs84-85
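The listener behaviour described above (drop input while the model is still responding, otherwise forward the transcription to the executor) might look roughly like the following. This is a sketch under assumptions, not the code in SpeechChat.cs: the class name, the HandleSpeechAsync signature, and the use of an atomic busy flag are illustrative choices, and the real ISpeechListener members may differ.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LLama.Abstractions;
using LLama.Common;

// Hypothetical listener, loosely modelled on the description above.
public sealed class ExecutorSpeechListener
{
    private readonly ILLamaExecutor _executor;
    private readonly InferenceParams _inference = new InferenceParams { MaxTokens = 128 };
    private int _busy; // 0 = idle, 1 = generating a response

    public ExecutorSpeechListener(ILLamaExecutor executor) => _executor = executor;

    // Called with each transcription; returns false when the input was dropped.
    public async Task<bool> HandleSpeechAsync(string transcription)
    {
        // Ignore speech that arrives while a response is still being generated.
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0)
            return false;

        try
        {
            // Stream the response to the console as it is produced.
            await foreach (var token in _executor.InferAsync(transcription, _inference))
                Console.Write(token);
            Console.WriteLine();
            return true;
        }
        finally
        {
            Volatile.Write(ref _busy, 0);
        }
    }
}
```

Dropping overlapping input keeps a single InteractiveExecutor from being driven by two transcriptions at once; a queue would be the alternative if no utterance may be lost.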

Data Flow: Speech Recognition to Inference

[Diagram: SpeechChat Data Flow]

Sources: LLama.Examples/Examples/SpeechChat.cs30-65 LLama.Examples/Examples/SpeechChat.cs125-129 LLama.Examples/Examples/SpeechChat.cs111-115 LLama.Examples/Examples/SpeechChat.cs40-41 LLama.Examples/Examples/SpeechChat.cs55-61 LLama.Examples/Examples/SpeechChat.cs73-74 LLama.Examples/Examples/SpeechChat.cs84-85

Common Integration Patterns

Pattern 1: Parameter Mapping

Frameworks typically wrap LLamaSharp's parameter interfaces to provide their own configuration layers.

Extension Point	Interface/Class	Purpose
Model Configuration	`ModelParams`	GPU layers, model path, threading LLama.Examples/Examples/SpeechChat.cs55-58
Context Configuration	`LLamaContext`	Context lifecycle and inference execution LLama.Examples/Examples/SpeechChat.cs60
Batched Execution	`BatchedExecutor`	Concurrent conversation management LLama.Examples/ExampleRunner.cs32-37

Sources: LLama.Examples/Examples/SpeechChat.cs55-61 LLama.Examples/ExampleRunner.cs32-37
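As a rough illustration of parameter mapping, a framework can keep its own options type and translate it to ModelParams at the boundary. LocalModelOptions and ModelParamsMapper are hypothetical names introduced for this sketch. Only the model path, GPU layer count, and context size are mapped; threading is left out because the type of that property has varied between LLamaSharp versions.

```csharp
using LLama.Common;

// Hypothetical framework-side options; not part of LLamaSharp.
public sealed class LocalModelOptions
{
    public string ModelPath { get; set; } = "";
    public int GpuLayers { get; set; } = 0;
    public uint ContextTokens { get; set; } = 2048;
}

public static class ModelParamsMapper
{
    // Translates the framework's configuration into LLamaSharp's ModelParams.
    public static ModelParams ToModelParams(LocalModelOptions options) =>
        new ModelParams(options.ModelPath)
        {
            GpuLayerCount = options.GpuLayers,
            ContextSize = options.ContextTokens
        };
}
```

Keeping the mapping in one place means a change in LLamaSharp's parameter surface touches a single method rather than every call site in the host framework.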

Pattern 2: Persistence and State

Frameworks can integrate LLamaSharp's session persistence to support long-running conversations. LLama.Examples/ExampleRunner.cs33

Sources: LLama.Examples/ExampleRunner.cs33
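One way to persist a conversation, assuming the ChatSession API (SaveSession and LoadSession) is available in the installed LLamaSharp version, is sketched below. The example cited above may use a different mechanism, and the model path and session directory here are placeholders.

```csharp
using System;
using System.IO;
using LLama;
using LLama.Common;

// Placeholder paths (assumptions); adjust for your environment.
const string modelPath = "models/example.gguf";
const string sessionDir = "saved-session";

var parameters = new ModelParams(modelPath) { ContextSize = 2048 };
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

var session = new ChatSession(new InteractiveExecutor(context));

// Resume an earlier conversation if one was saved.
if (Directory.Exists(sessionDir))
    session.LoadSession(sessionDir);

var message = new ChatHistory.Message(AuthorRole.User, "Where were we?");
var inference = new InferenceParams { MaxTokens = 128 };

await foreach (var token in session.ChatAsync(message, inference))
    Console.Write(token);

// Persist the session so a later run can pick it up again.
session.SaveSession(sessionDir);
```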

Thread Safety and Concurrency Considerations

When integrating LLamaSharp into multi-threaded frameworks, developers must manage the lifecycle of native resources.

Component	Thread Safety	Notes
`LLamaWeights`	Thread-safe	Can be shared across multiple `LLamaContext` instances. LLama.Examples/Examples/SpeechChat.cs59-60
`LLamaContext`	Not thread-safe	Interacts directly with native library; stateful per instance. LLama.Examples/Examples/SpeechChat.cs60
`BatchedExecutor`	Managed	Handles multiple `Conversation` objects concurrently. LLama.Examples/ExampleRunner.cs32-37
`SafeHandles`	Thread-safe	Reference counting prevents premature disposal of native pointers. LLama.Examples/Examples/SpeechChat.cs104-109

Sources: LLama.Examples/Examples/SpeechChat.cs104-109 LLama.Examples/ExampleRunner.cs32-37
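Since a single LLamaContext should not be used from several threads at once, a host framework can serialize access to it. The helper below is a generic, self-contained sketch of that idea using SemaphoreSlim; SerializedResource is a hypothetical name and is not part of LLamaSharp.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Lets one caller at a time use a resource that is not thread-safe,
// for example an object that owns a single LLamaContext.
public sealed class SerializedResource<T> : IDisposable
{
    private readonly T _resource;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public SerializedResource(T resource) => _resource = resource;

    // Runs the callback while holding the gate; other callers wait their turn.
    public async Task<TResult> UseAsync<TResult>(
        Func<T, Task<TResult>> work,
        CancellationToken cancellationToken = default)
    {
        await _gate.WaitAsync(cancellationToken);
        try
        {
            return await work(_resource);
        }
        finally
        {
            _gate.Release();
        }
    }

    public void Dispose() => _gate.Dispose();
}
```

A framework would wrap each context-owning object in one SerializedResource and share the LLamaWeights instance freely, which matches the thread-safety split in the table above. When many conversations must actually run concurrently, BatchedExecutor is the more suitable tool than a lock.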

URL: https://deepwiki.com/SciSharp/LLamaSharp/7.3-third-party-integrations