Last indexed: 18 May 2026 (ecd184)

Glossary

This page defines codebase-specific terms, jargon, and domain concepts used throughout LLamaSharp. It is a technical reference that helps onboarding engineers connect high-level LLM concepts to their specific implementations in the LLamaSharp repository.

Core System Entities

The diagram "Entity Mapping: Managed to Native Space" (not reproduced here) illustrates the relationship between managed C# classes and their underlying native llama.cpp counterparts.

Sources: LLama/LLamaWeights.cs:11-20, LLama/LLamaContext.cs:18-42, LLama/Native/SafeLLamaContextHandle.cs:13-15, LLama/Native/SafeLlamaModelHandle.cs:15-17

1. Model & Context Terms

Term	Definition	Code Pointer
Weights	The read-only parameters of the neural network, loaded from a GGUF file. Represented by `LLamaWeights`.	LLama/LLamaWeights.cs:11-13
Context	A stateful inference environment created from weights; it holds the KV cache. Represented by `LLamaContext`.	LLama/LLamaContext.cs:18-20
KV Cache	Key-Value cache stored within the native context. It keeps the attention keys and values of previous tokens so they are not recomputed for each new token.	LLama/LLamaContext.cs:220-230
GGUF	The binary file format used for distributing LLM models compatible with `llama.cpp`.	LLama/Native/SafeLlamaModelHandle.cs:179-181
SafeHandle	A .NET mechanism for wrapping native pointers (`IntPtr`) to ensure deterministic resource cleanup and prevent leaks.	LLama/Native/SafeLLamaContextHandle.cs:13-15
RoPE	Rotary Positional Embedding; the positional encoding type used by the model.	LLama/Native/SafeLlamaModelHandle.cs:19-21
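
The KV Cache entry above can be made concrete with a toy example. The following is a minimal, language-neutral sketch in Python, not LLamaSharp or llama.cpp code: `ToyKVCache`, `project`, and both decode functions are hypothetical names invented for illustration, the "projection" is a constant scaling rather than a learned matrix, and the raw token vector stands in for the query. It only shows that caching keys and values gives the same attention output as recomputing them for the whole prefix at every step.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def project(token_vec, factor):
    """Stand-in for a learned projection: scales the vector by a constant."""
    return [factor * x for x in token_vec]


class ToyKVCache:
    """Hypothetical cache: one key and one value vector per processed token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def decode_with_cache(tokens):
    """Process tokens one at a time, projecting each token exactly once."""
    cache = ToyKVCache()
    outputs = []
    for tok in tokens:
        cache.append(project(tok, 0.5), project(tok, 2.0))
        outputs.append(attend(tok, cache.keys, cache.values))
    return outputs


def decode_without_cache(tokens):
    """Recompute every key and value for the whole prefix at each step."""
    outputs = []
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [project(t, 0.5) for t in prefix]
        values = [project(t, 2.0) for t in prefix]
        outputs.append(attend(prefix[-1], keys, values))
    return outputs
```

Both functions return the same outputs for the same input. The cached version performs one projection per token, while the uncached version repeats the projections for the entire prefix at every step, which is the cost a KV cache avoids.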

2. Execution & Inference Terms

Diagram (not reproduced here): Data Flow: Inference Request to Native Execution

Sources: LLama/LLamaContext.cs:107-110, LLama/Native/SafeLLamaContextHandle.cs:167-180, LLama.Examples/Examples/BatchedExecutorBoolQ.cs:100

Term	Definition	Code Pointer
Executor	High-level abstraction that manages the interaction loop between user input and model output.	LLama/Abstractions/ILLamaExecutor.cs:10-12
Sampling	The process of selecting the next token from the probability distribution (logits) produced by the model.	LLama/Sampling/ISamplingPipeline.cs:8-10
StatelessExecutor	An executor that performs one-time inference without maintaining conversation history between calls.	LLama/LLamaStatelessExecutor.cs:19-21
StatefulExecutor	Base class for executors (Interactive, Instruct) that maintain and persist session state and KV cache.	LLama/LLamaExecutorBase.cs:20-21
Antiprompt	A string sequence that, when detected in the model output, triggers a stop in generation.	LLama/LLamaExecutorBase.cs:70
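
The Antiprompt entry above describes a stop condition on generated text. The following is a minimal sketch in Python of that idea, not the LLamaSharp implementation: `generate_until_antiprompt` is a hypothetical helper, and trimming the antiprompt from the returned text is a choice made for this sketch only. It checks the accumulated text rather than individual pieces, because an antiprompt can be split across several tokens.

```python
def generate_until_antiprompt(token_stream, antiprompts):
    """Accumulate streamed text pieces and stop once any antiprompt appears.

    Returns a tuple of (text before the antiprompt, matched antiprompt).
    The second element is None if the stream ended without a match.
    """
    text = ""
    for piece in token_stream:
        text += piece
        # Search the accumulated text, since a stop string may span pieces.
        for stop in antiprompts:
            idx = text.find(stop)
            if idx != -1:
                return text[:idx], stop
    return text, None
```

For example, with the pieces `"Hello"`, `" there."`, `"\nUs"`, `"er:"` and the antiprompt `"User:"`, the function stops after the fourth piece and leaves the rest of the stream unconsumed.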

3. Native Interop Jargon

P/Invoke: Platform Invocation Services, used by NativeApi to call functions in libllama.so or llama.dll. LLama/Native/NativeApi.cs:33-37
Backends: Compiled native binaries tailored for specific hardware (CPU, CUDA, Vulkan, Metal). README.md:87-91
Logits: The raw vector of scores produced by the model for every possible token in the vocabulary before normalization. LLama/Native/SafeLLamaContextHandle.cs:230-235
Flash Attention: An optimization technique to speed up attention calculation and reduce memory usage. LLama/Common/ModelParams.cs:104-105
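
To connect the Logits entry above with sampling, here is a minimal sketch in Python of turning raw scores into probabilities and drawing a token. It is an illustration under assumptions, not the LLamaSharp sampling pipeline: `softmax` and `sample_top_k` are hypothetical helper names, and temperature scaling followed by top-k filtering is only one common arrangement of sampling steps.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Normalize raw scores into probabilities that sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, k, temperature=1.0, rng=None):
    """Keep the k highest-scoring token ids, then sample one of them."""
    if rng is None:
        rng = random.Random()
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = ranked[:k]
    probs = softmax([logits[i] for i in top], temperature)
    return rng.choices(top, weights=probs, k=1)[0]
```

With `k=1` the function always returns the index of the largest logit (greedy decoding). A temperature below 1 sharpens the distribution toward the highest-scoring tokens, and a temperature above 1 flattens it.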

Framework Integrations

LLamaSharp acts as a bridge for several .NET AI ecosystems:

Semantic Kernel: Integration allowing LLamaSharp to act as a chat or text generation service for Microsoft's orchestration SDK. LLama.SemanticKernel/LLamaSharp.SemanticKernel.csproj:21-23
Kernel Memory: Integration for Retrieval Augmented Generation (RAG) using LLamaSharp for embeddings and generation. LLama.KernelMemory/LLamaSharp.KernelMemory.csproj:1-28

Technical Abbreviations

BOS/EOS: Beginning of Sentence / End of Sentence tokens. LLama/LLamaContext.cs:107-110
KV: Key-Value (referring to the attention mechanism cache). LLama/Common/ModelParams.cs:126-127
SWA: Sliding Window Attention. LLama/Native/SafeLlamaModelHandle.cs:64-66
MTMD: Short for "multimodal", after the `mtmd` library in llama.cpp; used in the context of multimodal weights and chunks (e.g., LLaVA/CLIP). LLama/Batched/Conversation.cs:24-25
GBNF: GGML BNF, a grammar format used to constrain model output to specific structures like JSON. LLama/Sampling/DefaultSamplingPipeline.cs:103-105
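
The SWA abbreviation above names an attention pattern that is easy to show as a mask. The following is a minimal Python sketch, not code from LLamaSharp or llama.cpp: `sliding_window_mask` is a hypothetical helper, and it assumes the window counts the current token, a boundary convention that differs between implementations.

```python
def sliding_window_mask(seq_len, window):
    """Build a causal sliding-window attention mask.

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i) and must lie within the last
    `window` positions, counting position i itself.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]
```

With `seq_len=4` and `window=2`, each position attends only to itself and its immediate predecessor, so position 3 can no longer see positions 0 and 1.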


URL: https://deepwiki.com/SciSharp/LLamaSharp/10-glossary