<div align="center"> <img src="https://github.com/jchristn/sharpai/blob/main/assets/logo.png" width="256" height="256"> </div>
Transform your .NET applications into AI powerhouses - embed models directly or deploy as an Ollama-compatible and OpenAI-compatible API server. No cloud dependencies, no limits, just local embeddings and inference.
<p align="center"> <img src="https://img.shields.io/badge/.NET-5C2D91?style=for-the-badge&logo=.net&logoColor=white" /> <img src="https://img.shields.io/badge/C%23-239120?style=for-the-badge&logo=c-sharp&logoColor=white" /> <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" /> </p>
<p align="center"> <a href="https://www.nuget.org/packages/SharpAI/"> <img src="https://img.shields.io/nuget/v/SharpAI.svg?style=flat" alt="NuGet Version"> </a> <a href="https://www.nuget.org/packages/SharpAI"> <img src="https://img.shields.io/nuget/dt/SharpAI.svg" alt="NuGet Downloads"> </a> </p>
<p align="center"> <strong>A .NET library for local AI model inference with Ollama-compatible and OpenAI-compatible REST APIs</strong> </p>
<p align="center"> Embeddings • Completions • Chat • Built on LLamaSharp • GGUF Models Only </p>
SharpAI is organized as a monorepo containing the core library, server, dashboard, and client SDKs:
```
SharpAI/
├── src/                    # Core .NET library and server
│   ├── SharpAI/            # Core library (NuGet: SharpAI)
│   ├── SharpAI.Server/     # REST API server (Watson 7 + OpenAPI/Swagger)
│   └── Test.*/             # Test projects
├── dashboard/              # Vite + React + Ant Design web interface
├── sdk/
│   ├── csharp/             # C# SDK (NuGet: SharpAI.Sdk)
│   ├── python/             # Python SDK (coming soon)
│   └── js/                 # TypeScript/JavaScript SDK (npm: @sharpai/sdk)
├── docker/                 # Docker assets
└── README.md
```
| Project | Description | Documentation |
|---|---|---|
| SharpAI | Core .NET library for local AI inference | This README |
| SharpAI.Server | Ollama & OpenAI compatible REST API server on Watson 7 with built-in OpenAPI/Swagger | This README |
| Dashboard | Vite + React web interface for managing models, running inference, and editing settings | |
| C# SDK | SDK for .NET applications to connect to SharpAI server | |
| TypeScript SDK | SDK for Node.js/browser applications | |
| Python SDK | SDK for Python applications (coming soon) | |
- The server exposes /openapi.json and a live /swagger UI at startup
- GET /api/settings returns the live in-memory configuration; PUT /api/settings replaces it and rewrites sharpai.json on disk (preserving CreatedUtc and SoftwareVersion)
- Each model's general.architecture and general.pooling_type GGUF metadata determine whether it supports embeddings, completions, or both, and drive the correct chat template selection

Install SharpAI via NuGet:
```bash
dotnet add package SharpAI
```
Or via Package Manager Console:
```powershell
Install-Package SharpAI
```
The main entry point that provides access to all functionality:
```csharp
using SharpAI;
using SyslogLogging;

// Initialize the AI driver
var ai = new AIDriver(
    logging: new LoggingModule(),
    databaseFilename: "./sharpai.db",
    huggingFaceApiKey: "hf_xxxxxxxxxxxx",
    modelDirectory: "./models/"
);

// Download a model from HuggingFace (GGUF format only)
await ai.Models.Add(
    name: "QuantFactory/Qwen2.5-3B-GGUF",
    quantizationPriority: null,
    progressCallback: (url, bytesDownloaded, percentComplete) =>
    {
        Console.WriteLine($"Progress: {percentComplete:P0}");
    });

// Generate a completion
string response = await ai.Completion.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "Once upon a time",
    maxTokens: 512,
    temperature: 0.7f
);
```
The AIDriver provides access to APIs via:
- ai.Models - Model management operations
- ai.Embeddings - Embedding generation
- ai.Completion - Text completion generation
- ai.Chat - Chat completion generation

Manages model downloads and lifecycle:
```csharp
// List all downloaded models
List<ModelFile> models = ai.Models.All();

// Get a specific model
ModelFile model = ai.Models.GetByName("QuantFactory/Qwen2.5-3B-GGUF");

// Download a new model from HuggingFace (GGUF format only)
ModelFile downloaded = await ai.Models.Add(
    name: "leliuga/all-MiniLM-L6-v2-GGUF",
    quantizationPriority: null,
    progressCallback: null);

// Delete a model
ai.Models.Delete("QuantFactory/Qwen2.5-3B-GGUF");

// Get the filesystem path for a model
string modelPath = ai.Models.GetFilename("QuantFactory/Qwen2.5-3B-GGUF");
```
SharpAI automatically handles downloading GGUF files from HuggingFace. Only GGUF format models are supported.
Model metadata includes:
Generate vector embeddings for text:
```csharp
// Single text embedding
float[] embedding = await ai.Embeddings.Generate(
    model: "leliuga/all-MiniLM-L6-v2-GGUF",
    input: "This is a sample text"
);

// Multiple text embeddings
string[] texts = { "First text", "Second text", "Third text" };
float[][] embeddings = await ai.Embeddings.Generate(
    model: "leliuga/all-MiniLM-L6-v2-GGUF",
    inputs: texts
);
```
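Embedding vectors are usually compared with cosine similarity. The helper below is a minimal sketch that uses only the base class library. `EmbeddingMath` and `CosineSimilarity` are illustrative names, not part of SharpAI, and the sketch assumes both vectors come from the same model and therefore have the same length.

```csharp
using System;

public static class EmbeddingMath
{
    // Cosine similarity between two equal-length vectors, in the range [-1, 1].
    // Hypothetical helper for illustration; not part of the SharpAI API.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // A zero vector has no direction; report 0 rather than dividing by zero.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Given a `float[][]` of embeddings, `EmbeddingMath.CosineSimilarity(embeddings[0], embeddings[1])` returns a score where higher values indicate more similar texts.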
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
Generate text continuations:
```csharp
// Non-streaming completion
string completion = await ai.Completion.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "The meaning of life is",
    maxTokens: 512,
    temperature: 0.7f
);

// Streaming completion
await foreach (string token in ai.Completion.GenerateCompletionStreaming(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "Write a poem about",
    maxTokens: 512,
    temperature: 0.8f))
{
    Console.Write(token);
}
```
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
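When the full text is needed after streaming finishes, the tokens can be accumulated as they are printed. This is a minimal sketch, assuming the stream is an `IAsyncEnumerable<string>` as the `await foreach` usage implies. `StreamCollector` and `CollectAsync` are illustrative names, not SharpAI APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

public static class StreamCollector
{
    // Consumes a token stream, invoking onToken for each token as it arrives,
    // and returns the concatenated text once the stream completes.
    public static async Task<string> CollectAsync(
        IAsyncEnumerable<string> tokens,
        Action<string>? onToken = null)
    {
        var sb = new StringBuilder();
        await foreach (string token in tokens)
        {
            onToken?.Invoke(token);
            sb.Append(token);
        }
        return sb.ToString();
    }
}
```

Passing the result of `ai.Completion.GenerateCompletionStreaming(...)` as `tokens` and `Console.Write` as `onToken` prints tokens live while still returning the complete string.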
Generate conversational responses:
```csharp
// Non-streaming chat
string response = await ai.Chat.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: chatFormattedPrompt, // Prompt should be formatted for chat
    maxTokens: 512,
    temperature: 0.7f
);

// Streaming chat
await foreach (string token in ai.Chat.GenerateCompletionStreaming(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: chatFormattedPrompt,
    maxTokens: 512,
    temperature: 0.7f))
{
    Console.Write(token);
}
```
SharpAI includes prompt builders to format conversations for different model types:
```csharp
using SharpAI.Prompts;

var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "What is the capital of France?" },
    new ChatMessage { Role = "assistant", Content = "The capital of France is Paris." },
    new ChatMessage { Role = "user", Content = "What is its population?" }
};

// Format for different model types
string chatMLPrompt = PromptBuilder.Build(ChatFormat.ChatML, messages);
/* Output:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
<|im_start|>user
What is its population?<|im_end|>
<|im_start|>assistant
*/

string llama2Prompt = PromptBuilder.Build(ChatFormat.Llama2, messages);
/* Output:
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>
What is the capital of France? [/INST] The capital of France is Paris. </s><s>[INST] What is its population? [/INST]
*/

string simplePrompt = PromptBuilder.Build(ChatFormat.Simple, messages);
/* Output:
system: You are a helpful assistant.
user: What is the capital of France?
assistant: The capital of France is Paris.
user: What is its population?
assistant:
*/
```
Supported chat formats:
- Simple - Basic role: content format (generic models, base models)
- ChatML - OpenAI ChatML format (GPT models and models fine-tuned with ChatML, including Qwen 2, Qwen 3, and Qwen 3.5)
- Llama2 - Llama 2 instruction format (Llama-2-Chat models)
- Llama3 - Llama 3 format (Llama-3-Instruct models)
- Alpaca - Alpaca instruction format (Alpaca, Vicuna, WizardLM, and many Llama-based fine-tunes)
- Mistral - Mistral instruction format (Mistral-Instruct, Mixtral-Instruct models)
- HumanAssistant - Human/Assistant format (Anthropic Claude-style training, some chat models)
- Zephyr - Zephyr model format (Zephyr beta/alpha models)
- Phi - Microsoft Phi format (Phi-2, Phi-3 models)
- DeepSeek - DeepSeek format (DeepSeek-Coder, DeepSeek-LLM models)
- Gemma - Google Gemma turn-token format (Gemma 2, Gemma 3, and Gemma 4 models)

If you are unsure which format your model supports, choose Simple.
SharpAI maps common architecture aliases automatically, including qwen3.5, qwen-3.5, gemma4, and gemma-4.
These mappings align with the LLamaSharp 0.27.0 upgrade and help newer Qwen3.5 and Gemma4 GGUFs pick the correct prompt template automatically.
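To make the ChatML layout concrete, the following standalone sketch reproduces the ChatML output shown earlier from a list of role/content pairs. It illustrates the template only and is not SharpAI's `PromptBuilder` implementation; `ChatMlSketch` and `Format` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ChatMlSketch
{
    // Renders each message as "<|im_start|>{role}\n{content}<|im_end|>\n" and
    // ends with an open assistant turn for the model to complete.
    public static string Format(IEnumerable<(string Role, string Content)> messages)
    {
        var sb = new StringBuilder();
        foreach (var (role, content) in messages)
        {
            sb.Append("<|im_start|>").Append(role).Append('\n');
            sb.Append(content).Append("<|im_end|>").Append('\n');
        }
        sb.Append("<|im_start|>assistant").Append('\n');
        return sb.ToString();
    }
}
```

The trailing open `assistant` turn is what cues the model to answer as the assistant rather than continue the user's text.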
````csharp
using SharpAI.Prompts;

// Simple instruction
string instructionPrompt = TextPromptBuilder.Build(
    TextGenerationFormat.Instruction,
    "Write a haiku about programming"
);
/* Output:
### Instruction:
Write a haiku about programming
### Response:
*/

// Code generation with context
var context = new Dictionary<string, string>
{
    ["language"] = "python",
    ["requirements"] = "Include error handling"
};
string codePrompt = TextPromptBuilder.Build(
    TextGenerationFormat.CodeGeneration,
    "Write a function to parse JSON",
    context
);
/* Output:
Language: python
Task: Write a function to parse JSON
Requirements: Include error handling
```python
*/

// Question-answer format
string qaPrompt = TextPromptBuilder.Build(
    TextGenerationFormat.QuestionAnswer,
    "What causes rain?"
);
/* Output:
Question: What causes rain?
Answer:
*/

// Few-shot examples
var examples = new List<(string input, string output)>
{
    ("2+2", "4"),
    ("5*3", "15")
};
string fewShotPrompt = TextPromptBuilder.BuildWithExamples(
    TextGenerationFormat.QuestionAnswer,
    "7-3",
    examples
);
/* Output:
Examples:
Question: 2+2
Answer:
4
---
Question: 5*3
Answer:
15
---
Now complete the following:
Question: 7-3
Answer:
*/
````
Supported text generation formats:
- Raw - No formatting
- Completion - Continuation format
- Instruction - Instruction/response format
- QuestionAnswer - Q&A format
- CreativeWriting - Story/creative format
- CodeGeneration - Code generation format
- Academic - Academic writing format
- ListGeneration - List creation format
- TemplateFilling - Template completion
- Dialogue - Dialogue generation

SharpAI includes a fully-functional REST API server through the SharpAI.Server project, built on Watson 7. It provides Ollama-compatible endpoints, OpenAI-compatible endpoints, a settings-management API, and built-in OpenAPI 3.0 / Swagger UI.
Ollama API endpoints include:
- GET /api/tags - List available local models (returns a capabilities object indicating embedding and completion support per model)
- POST /api/pull - Download models from HuggingFace (streams NDJSON progress with downloaded, completed, total, and percent)
- DELETE /api/delete - Delete a local model
- GET /api/ps - List models currently loaded in memory (analogous to ollama ps)
- POST /api/embed - Generate embeddings
- POST /api/generate - Text completions (streaming and non-streaming; honors options.stop)
- POST /api/chat - Chat completions (automatically wraps messages in the correct chat template for the model's GGUF architecture)

OpenAI API endpoints include:
- POST /v1/embeddings - Generate embeddings
- POST /v1/completions - Text completions (streaming via SSE)
- POST /v1/chat/completions - Chat completions (streaming via SSE)

Settings API:
- GET /api/settings - Return the full live in-memory Settings object
- PUT /api/settings - Replace the in-memory settings and rewrite sharpai.json to disk. CreatedUtc and SoftwareVersion are preserved server-side so clients cannot overwrite them. Some settings (REST Hostname/Port/Ssl, Database) take effect only on the next restart.

Operational endpoints:
- GET /health - Lightweight liveness check for process monitoring
- GET /ready - Readiness check for native backend initialization, database initialization, and writable runtime directories

API documentation:
- GET /openapi.json - Complete OpenAPI 3.0 document describing every route, tag, request body, and response schema
- GET /swagger - Interactive Swagger UI served from the same server

CORS preflight OPTIONS requests are handled by the server so dashboard cross-origin calls work out of the box.
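The NDJSON progress stream from POST /api/pull can be consumed one line at a time. The sketch below parses a single line with System.Text.Json and reads the `percent` field named in the endpoint description. The field's numeric type is an assumption, and `PullProgress.TryReadPercent` is an illustrative helper rather than part of any SharpAI SDK.

```csharp
using System.Text.Json;

public static class PullProgress
{
    // Attempts to read a numeric "percent" value from one NDJSON line.
    // Returns false for blank lines, invalid JSON, or lines without the field.
    // Assumption: "percent" is emitted as a JSON number.
    public static bool TryReadPercent(string line, out double percent)
    {
        percent = 0;
        if (string.IsNullOrWhiteSpace(line)) return false;

        try
        {
            using JsonDocument doc = JsonDocument.Parse(line);
            JsonElement root = doc.RootElement;
            return root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("percent", out JsonElement value)
                && value.ValueKind == JsonValueKind.Number
                && value.TryGetDouble(out percent);
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

Returning `false` instead of throwing keeps a read loop simple: lines that carry no progress value can be skipped without special handling.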
Minimum:
For GPU Acceleration (Optional):
NVIDIA CUDA (Windows/Linux):
Apple Metal (macOS Apple Silicon):
Important GPU Notes:
SharpAI has been tested on:
For detailed installation instructions, troubleshooting, and production deployment, see .
When models are downloaded, the following information is tracked:
Models are stored in the specified modelDirectory with files named by their GUID. Model metadata is stored in the SQLite database specified by databaseFilename.
SharpAI automatically detects GPU availability and optimizes layer allocation at startup.
| Platform | CPU | GPU |
|---|---|---|
| Windows x64 | ✅ | ✅ (CUDA) |
| Linux x64 | ✅ | ✅ (CUDA) |
| macOS Apple Silicon (ARM64) | ✅ | ✅ (Metal) |
| macOS Intel (x64) | ✅ | ❌ |
| Docker on Apple Silicon | ✅ | ❌ (Metal requires bare-metal macOS) |
Supported:
Not Supported:
The NativeLibraryBootstrapper automatically detects your platform and GPU at startup, selecting the appropriate backend (CPU, CUDA, or Metal). See the Requirements section for detailed GPU requirements.
SharpAI.Server is available as a Docker image, providing an easy way to deploy the Ollama-compatible API server without local installation.
For Windows:
```
run.bat v4.0.1
```
For Linux/macOS:
```bash
./run.sh v4.0.1
```
With Docker Compose, for Windows:
```
compose-up.bat
```
With Docker Compose, for Linux/macOS:
```bash
./compose-up.sh
```
Before running the Docker container, decide what you want to persist:
- Configuration: the image includes a /app/sharpai.json default. Mount your own sharpai.json when you want persistent/custom settings.
- Runtime directories: /app/logs, /app/models, and /app/temp. Bind mount ./logs/ and ./models/ when you want logs and downloaded GGUF models to survive container replacement.

The official Docker image is available at: jchristn77/sharpai. Refer to the docker directory for assets useful for running in Docker and Docker Compose.
The Docker image contains CPU and CUDA-capable Linux native libraries and selects the backend at container startup/runtime. These environment variables are available in the image and are included with placeholder defaults in the compose files under docker/.
| Variable | Default | Description |
|---|---|---|
| DOTNET_GC_SERVER | 1 | Enables .NET server GC for sustained server workloads. The Docker entrypoint maps this to .NET's canonical DOTNET_gcServer setting. |
| SHARPAI_FORCE_BACKEND | auto | Backend selection: auto, cpu, cuda, or metal. In Docker, metal cannot be used because containers run Linux. |
| SHARPAI_CPU_VARIANT | auto | CPU native library variant: auto, avx512, avx2, avx, or noavx. |
| SHARPAI_REQUIRE_BACKEND | false | When true, startup fails if the selected backend cannot load instead of falling back to CPU. |
| SHARPAI_ENABLE_NATIVE_LOGGING | false | Enables llama.cpp native logging for backend troubleshooting. |
| SHARPAI_NUM_THREADS | 0 | Generation thread count. 0 means automatic sizing from the container CPU allocation. |
| SHARPAI_BATCH_THREADS | 0 | Batch evaluation thread count. 0 means use the generation thread count. |
| SHARPAI_GPU_LAYERS | auto | GPU offload layers for CUDA/Metal: auto or -1 means all layers, 0 disables offload, positive values offload that many layers. |
| SHARPAI_MAIN_GPU | 0 | Main GPU index used by llama.cpp when multiple GPUs are visible. |
| SHARPAI_CONTEXT_SIZE | 0 | Context size override. 0 keeps model/library defaults. |
| SHARPAI_BATCH_SIZE | 0 | Prompt batch size override. 0 keeps library defaults. |
| SHARPAI_UBATCH_SIZE | 0 | Physical micro-batch size override. 0 keeps library defaults. |
| SHARPAI_USE_MMAP | true | Enables memory-mapped model loading for faster loads and lower duplicate memory pressure. |
| SHARPAI_USE_MLOCK | false | Locks model pages in RAM. If set to true, configure container memlock ulimits. |
| SHARPAI_FLASH_ATTENTION | false | Enables flash attention when supported by the selected backend/model. Leave off unless validated with your models. |
For NVIDIA Docker deployments, the CUDA compose file also sets NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=compute,utility.
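The value rules for SHARPAI_GPU_LAYERS and SHARPAI_BATCH_THREADS can be summarised in code. This is a sketch of the documented semantics only, not the server's actual parsing logic. `BackendOptions`, `ParseGpuLayers`, and `ResolveBatchThreads` are hypothetical names, and treating empty, unparseable, or other negative values as `auto` is an assumption.

```csharp
using System;
using System.Globalization;

public static class BackendOptions
{
    // Sentinel meaning "offload every layer".
    public const int AllLayers = -1;

    // SHARPAI_GPU_LAYERS: "auto" or -1 means all layers, 0 disables offload,
    // positive values offload that many layers.
    public static int ParseGpuLayers(string? value)
    {
        // Assumption: an unset or empty variable behaves like the default, "auto".
        if (string.IsNullOrWhiteSpace(value)) return AllLayers;

        string trimmed = value.Trim();
        if (trimmed.Equals("auto", StringComparison.OrdinalIgnoreCase)) return AllLayers;

        if (int.TryParse(trimmed, NumberStyles.Integer, CultureInfo.InvariantCulture, out int n))
        {
            // Assumption: any negative value is treated like -1.
            return n < 0 ? AllLayers : n;
        }

        // Assumption: fall back to auto on unparseable input.
        return AllLayers;
    }

    // SHARPAI_BATCH_THREADS: 0 means use the generation thread count.
    public static int ResolveBatchThreads(int batchThreads, int generationThreads)
        => batchThreads == 0 ? generationThreads : batchThreads;
}
```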
The container uses several volume mappings for persistence:
| Host Path | Container Path | Description |
|---|---|---|
| ./sharpai.json | /app/sharpai.json | Configuration file |
| ./sharpai.db | /app/sharpai.db | SQLite database for model registry |
| ./logs/ | /app/logs/ | Application logs |
| ./models/ | /app/models/ | Downloaded GGUF model files |
Modify the sharpai.json file to supply your configuration.
The container exposes port 8000 by default.
You can access Ollama APIs at:
- http://localhost:8000/api/tags - List available models
- http://localhost:8000/api/pull - Pull a model
- http://localhost:8000/api/generate - Generate text
- http://localhost:8000/api/chat - Chat completions
- http://localhost:8000/api/embed - Generate embeddings

You can access OpenAI APIs at:
- http://localhost:8000/v1/embeddings - Generate embeddings
- http://localhost:8000/v1/completions - Generate text
- http://localhost:8000/v1/chat/completions - Chat completions

Operational endpoints:
- http://localhost:8000/health - Liveness check
- http://localhost:8000/ready - Readiness check

Create persistent directories when you want host-side logs and models:
```bash
mkdir logs models
```
Create or mount sharpai.json when you need custom settings. The image includes a default for quick smoke tests.

Run the container:
```bash
# Windows
run.bat v4.0.1

# Linux/macOS
./run.sh v4.0.1
```
Download a model using the API (GGUF format required):
```bash
curl http://localhost:8000/api/pull \
  -d '{"model":"QuantFactory/Qwen2.5-3B-GGUF"}'
```
Generate text:
```bash
curl http://localhost:8000/api/generate \
  -d '{
    "model": "QuantFactory/Qwen2.5-3B-GGUF",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```
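The same generate request can be sent from C# with `HttpClient`. This is a minimal sketch, assuming the server is reachable at http://localhost:8000 and that the non-streaming reply is a JSON object with a `response` string field, as in Ollama's API. `GenerateClient` is an illustrative name, not part of the SharpAI SDK.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class GenerateClient
{
    // Builds the same JSON body as the curl example, with streaming disabled.
    public static string BuildBody(string model, string prompt) =>
        JsonSerializer.Serialize(new { model, prompt, stream = false });

    // Posts the body to /api/generate and returns the generated text, or the
    // raw JSON if the assumed "response" field is absent.
    public static async Task<string> GenerateAsync(
        HttpClient http, string baseUrl, string model, string prompt)
    {
        using var content = new StringContent(
            BuildBody(model, prompt), Encoding.UTF8, "application/json");
        using HttpResponseMessage reply =
            await http.PostAsync($"{baseUrl.TrimEnd('/')}/api/generate", content);
        reply.EnsureSuccessStatusCode();

        string json = await reply.Content.ReadAsStringAsync();
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        return root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("response", out JsonElement text)
            && text.ValueKind == JsonValueKind.String
                ? text.GetString() ?? json
                : json;
    }
}
```

A call such as `await GenerateClient.GenerateAsync(new HttpClient(), "http://localhost:8000", "QuantFactory/Qwen2.5-3B-GGUF", "Why is the sky blue?")` mirrors the curl command; in a long-running application a single shared `HttpClient` is preferable to creating one per request.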
For production deployments, you can use Docker Compose. Create a compose.yaml file:
```yaml
services:
  sharpai:
    image: jchristn77/sharpai:v4.0.1
    ports:
      - "8000:8000"
    volumes:
      - ./sharpai.json:/app/sharpai.json
      - ./sharpai.db:/app/sharpai.db
      - ./logs:/app/logs
      - ./models:/app/models
    environment:
      DOTNET_GC_SERVER: "1"
      SHARPAI_FORCE_BACKEND: "auto"
      SHARPAI_CPU_VARIANT: "auto"
      SHARPAI_REQUIRE_BACKEND: "false"
      SHARPAI_ENABLE_NATIVE_LOGGING: "false"
      SHARPAI_NUM_THREADS: "0"
      SHARPAI_BATCH_THREADS: "0"
      SHARPAI_GPU_LAYERS: "auto"
      SHARPAI_MAIN_GPU: "0"
      SHARPAI_CONTEXT_SIZE: "0"
      SHARPAI_BATCH_SIZE: "0"
      SHARPAI_UBATCH_SIZE: "0"
      SHARPAI_USE_MMAP: "true"
      SHARPAI_USE_MLOCK: "false"
      SHARPAI_FLASH_ATTENTION: "false"
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:8000/ready || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: unless-stopped
```
Then run:
```bash
docker compose up -d
```
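Once the container is up, a client may want to wait until the /ready endpoint reports success before sending requests. The following is a minimal polling sketch. The probe is injected so the loop can be exercised without a server, and `Readiness`, `WaitAsync`, `HttpProbe`, and the attempt and delay values are illustrative assumptions.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class Readiness
{
    // Calls probe up to maxAttempts times, waiting delay between failed
    // attempts. Returns true as soon as a probe succeeds, false otherwise.
    public static async Task<bool> WaitAsync(
        Func<Task<bool>> probe, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (await probe()) return true;
            if (attempt < maxAttempts) await Task.Delay(delay);
        }
        return false;
    }

    // Probe that treats any 2xx reply from /ready as ready and a failed
    // HTTP request (for example, connection refused) as not ready yet.
    public static Func<Task<bool>> HttpProbe(HttpClient http, string baseUrl) =>
        async () =>
        {
            try
            {
                using HttpResponseMessage reply =
                    await http.GetAsync($"{baseUrl.TrimEnd('/')}/ready");
                return reply.IsSuccessStatusCode;
            }
            catch (HttpRequestException)
            {
                return false;
            }
        };
}
```

For example, `await Readiness.WaitAsync(Readiness.HttpProbe(http, "http://localhost:8000"), 30, TimeSpan.FromSeconds(2))` polls for roughly a minute before giving up.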
To enable GPU acceleration in Docker:
Install the NVIDIA Container Toolkit and modify your run command:
```bash
docker run --gpus all \
  -p 8000:8000 \
  -v ./sharpai.json:/app/sharpai.json \
  -v ./sharpai.db:/app/sharpai.db \
  -v ./logs:/app/logs \
  -v ./models:/app/models \
  jchristn77/sharpai:v4.0.1
```
For Docker Compose, add:
```yaml
services:
  sharpai:
    # ... other configuration ...
    environment:
      SHARPAI_FORCE_BACKEND: "cuda"
      SHARPAI_REQUIRE_BACKEND: "true"
      NVIDIA_VISIBLE_DEVICES: "all"
      NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
- sharpai.json exists and is valid JSON
- /models/ directory has proper write permissions

Please see the file for detailed version history and release notes.
Have a bug, feature request, or idea? Please file an issue on our GitHub repository. We welcome community input on our roadmap!
This project is licensed under the MIT License.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 and net10.0 are compatible. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
Showing the top 1 NuGet packages that depend on SharpAI:
| Package | Downloads |
|---|---|
| SharpAI.Sdk: C# SDK for SharpAI - Local AI inference with Ollama and OpenAI compatible APIs | |
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.17 | 139 | 4/26/2026 |
| 1.0.14 | 1,839 | 10/10/2025 |
| 1.0.12 | 305 | 8/29/2025 |
| 1.0.11 | 273 | 8/28/2025 |
| 1.0.10 | 280 | 8/27/2025 |
| 1.0.9 | 229 | 8/20/2025 |
| 1.0.8 | 299 | 8/8/2025 |
| 1.0.7 | 154 | 8/1/2025 |
| 1.0.6 | 197 | 7/31/2025 |
| 1.0.5 | 206 | 7/31/2025 |
| 1.0.4 | 232 | 7/30/2025 |
| 1.0.3 | 234 | 7/27/2025 |
| 1.0.2 | 442 | 7/25/2025 |
| 1.0.1 | 528 | 7/25/2025 |
| 1.0.0 | 151 | 7/12/2025 |
Upgraded LLamaSharp to 0.27.0 with Qwen3.5 and Gemma4 support improvements.