dotnet add package ElBruno.VibeVoiceTTS --version 0.5.1
NuGet\Install-Package ElBruno.VibeVoiceTTS -Version 0.5.1
<PackageReference Include="ElBruno.VibeVoiceTTS" Version="0.5.1" />
Directory.Packages.props:
<PackageVersion Include="ElBruno.VibeVoiceTTS" Version="0.5.1" />
Project file:
<PackageReference Include="ElBruno.VibeVoiceTTS" />
paket add ElBruno.VibeVoiceTTS --version 0.5.1
#r "nuget: ElBruno.VibeVoiceTTS, 0.5.1"
#:package ElBruno.VibeVoiceTTS@0.5.1
Install as a Cake Addin:
#addin nuget:?package=ElBruno.VibeVoiceTTS&version=0.5.1
Install as a Cake Tool:
#tool nuget:?package=ElBruno.VibeVoiceTTS&version=0.5.1
A .NET library for text-to-speech synthesis using Microsoft's VibeVoice-Realtime-0.5B: native C# inference via ONNX Runtime, no Python required at runtime.
ElBruno.VibeVoiceTTS: install and start generating speech in minutes
IServiceCollection integration
dotnet add package ElBruno.VibeVoiceTTS
using ElBruno.VibeVoiceTTS;
using var tts = new VibeVoiceSynthesizer();
await tts.EnsureModelAvailableAsync(); // auto-downloads ~1.5 GB on first run
float[] audio = await tts.GenerateAudioAsync("Hello! Welcome to VibeVoiceTTS.", "Carter");
tts.SaveWav("output.wav", audio);
// Use the enum (recommended)
float[] carter = await tts.GenerateAudioAsync("Hello from Carter!", VibeVoicePreset.Carter);
float[] emma = await tts.GenerateAudioAsync("Hello from Emma!", VibeVoicePreset.Emma);
// Or use a string name; both short and internal names work
float[] audio = await tts.GenerateAudioAsync("Hello!", "Carter");
float[] audio2 = await tts.GenerateAudioAsync("Hello!", "en-Carter_man"); // also works
// Voices currently downloaded on disk
string[] available = tts.GetAvailableVoices();
// → ["Carter", "Emma"] (default download includes Carter and Emma)
// All supported voices (including those not yet downloaded)
string[] supported = tts.GetSupportedVoices();
// → ["Carter", "Davis", "Emma", "Frank", "Grace", "Mike"]
// Detailed metadata for all supported voices
VoiceInfo[] details = tts.GetSupportedVoiceDetails();
foreach (var voice in details)
    Console.WriteLine($"{voice.Name} ({voice.Gender}, {voice.Language})");
💡 On-demand voice download: Only Carter and Emma are downloaded by default with EnsureModelAvailableAsync(). Other voices (Davis, Frank, Grace, Mike) are downloaded automatically on first use when you call GenerateAudioAsync(). You can also pre-download a specific voice:
await tts.EnsureVoiceAvailableAsync("Davis", progress);
var progress = new Progress<DownloadProgress>(p =>
{
    if (p.Stage == DownloadStage.Downloading)
        Console.Write($"\r⬇️ [{p.CurrentFile}] {p.PercentComplete:F0}%");
    else
        Console.WriteLine($"{p.Stage}: {p.Message}");
});
await tts.EnsureModelAvailableAsync(progress);
var options = new VibeVoiceOptions
{
    ModelPath = @"D:\models\vibevoice", // Custom model location (default: OS cache)
    DiffusionSteps = 20,                // Quality vs speed tradeoff
    CfgScale = 1.5f,                    // Classifier-free guidance scale
    SampleRate = 24000,                 // Output sample rate
    MaxTextLength = 500,                // Max characters per request (0 disables the limit)
};
using var tts = new VibeVoiceSynthesizer(options);
| Option | Default | Description |
|---|---|---|
| ModelPath | OS cache* | Directory where ONNX models are stored and downloaded |
| HuggingFaceRepo | elbruno/VibeVoice-Realtime-0.5B-ONNX | HuggingFace repo for model downloads |
| DiffusionSteps | 20 | Number of diffusion denoising steps |
| CfgScale | 1.5 | Classifier-free guidance scale |
| SampleRate | 24000 | Output audio sample rate (Hz) |
| MaxTextLength | 500 | Maximum characters accepted per request (0 disables validation) |
| Seed | 42 | Random seed for reproducible output |
| ExecutionProvider | Cpu | ONNX Runtime execution provider (Cpu, DirectML, Cuda) |
| GpuDeviceId | 0 | GPU device index (used with DirectML or CUDA) |
*Default model cache: Windows: %LOCALAPPDATA%\ElBruno\VibeVoice\models · Linux/macOS: ~/.local/share/elbruno/vibevoice/models
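To check whether models are already cached before triggering the ~1.5 GB download, the documented default locations can be computed by hand. The sketch below is an illustration only: `ModelCachePaths`, `GetDefaultModelCache`, and `HasOnnxFiles` are hypothetical names that are not part of ElBruno.VibeVoiceTTS, the paths simply mirror the footnote above, and the `.onnx` file extension is an assumption about how the model files are named.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the library: mirrors the documented cache paths.
public static class ModelCachePaths
{
    // Returns the documented default cache directory for the current OS.
    public static string GetDefaultModelCache()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            // %LOCALAPPDATA%\ElBruno\VibeVoice\models
            string localAppData = Environment.GetFolderPath(
                Environment.SpecialFolder.LocalApplicationData);
            return Path.Combine(localAppData, "ElBruno", "VibeVoice", "models");
        }

        // ~/.local/share/elbruno/vibevoice/models (Linux and macOS, per the footnote)
        string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
        return Path.Combine(home, ".local", "share", "elbruno", "vibevoice", "models");
    }

    // True if the directory exists and contains at least one *.onnx file
    // (the extension is an assumption).
    public static bool HasOnnxFiles(string directory) =>
        Directory.Exists(directory) &&
        Directory.EnumerateFiles(directory, "*.onnx", SearchOption.AllDirectories).Any();
}
```

A caller could use `ModelCachePaths.HasOnnxFiles(ModelCachePaths.GetDefaultModelCache())` to decide whether to show a first-run download prompt. If `ModelPath` is set in the options, that directory should be checked instead.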
Enable GPU acceleration by setting the execution provider and installing the corresponding NuGet package:
# For DirectML (any Windows GPU: NVIDIA, AMD, Intel):
dotnet add package Microsoft.ML.OnnxRuntime.DirectML
# For CUDA (NVIDIA only; Windows and Linux):
dotnet add package Microsoft.ML.OnnxRuntime.Gpu
// DirectML: recommended for Windows desktop apps
var options = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.DirectML,
    GpuDeviceId = 0 // optional, selects which GPU
};
using var tts = new VibeVoiceSynthesizer(options);
// CUDA: for NVIDIA GPUs with CUDA drivers
var options = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.Cuda,
    GpuDeviceId = 0
};
using var tts = new VibeVoiceSynthesizer(options);
💡 Note: If the selected GPU provider is unavailable (missing NuGet package or no compatible GPU), the library automatically falls back to CPU inference. When using DirectML, models with dynamic tensor shapes (LM models, acoustic decoder) run on CPU while fixed-shape models (prediction head, connector, EOS classifier) use the GPU. This split works around known DirectML limitations with dynamic Reshape and ConvTranspose operations.
builder.Services.AddVibeVoice(options =>
{
    options.DiffusionSteps = 20;
});
// Then inject IVibeVoiceSynthesizer in your services
💡 Tip: For best results, keep sentences short (~10 words). Longer text may produce artifacts due to model limitations. Consider splitting long text into sentences.
💡 Tip: The MaxTextLength option defaults to 500 characters. Raise it for long passages, or set it to 0 to disable the guard entirely.
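One way to follow the sentence-splitting advice is a small pre-processing helper. The sketch below is a minimal illustration, not something the library ships: `TextChunker`, `SplitIntoSentences`, and `Concat` are hypothetical names, and the split rule (break after `.`, `!`, or `?` followed by whitespace) is a naive assumption that will mis-split abbreviations such as "Dr. Smith".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
public static class TextChunker
{
    // Splits text at whitespace that follows sentence-ending punctuation (. ! ?).
    // The punctuation stays attached to the sentence it ends.
    public static List<string> SplitIntoSentences(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return new List<string>();

        return Regex.Split(text.Trim(), @"(?<=[.!?])\s+")
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToList();
    }

    // Joins per-sentence sample buffers into a single buffer, preserving order.
    public static float[] Concat(IEnumerable<float[]> chunks) =>
        chunks.SelectMany(c => c).ToArray();
}
```

Each returned sentence could then be passed to GenerateAudioAsync() in turn and the resulting float[] buffers joined with `Concat` before calling SaveWav(). Because every chunk stays short, this also keeps each request under the default MaxTextLength.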
| Voice | Gender | Preset Enum | Internal Name |
|---|---|---|---|
| Carter | Male | VibeVoicePreset.Carter | en-Carter_man |
| Davis | Male | VibeVoicePreset.Davis | en-Davis_man |
| Emma | Female | VibeVoicePreset.Emma | en-Emma_woman |
| Frank | Male | VibeVoicePreset.Frank | en-Frank_man |
| Grace | Female | VibeVoicePreset.Grace | en-Grace_woman |
| Mike | Male | VibeVoicePreset.Mike | en-Mike_man |
All 6 voice presets are available on HuggingFace. Carter and Emma come with the default download; the other four are downloaded on demand when first used.
⚡ Migration note: In versions prior to 0.2.0, GetAvailableVoices() returned all 6 voices regardless of download status. Starting with 0.2.0, it returns only voices actually downloaded on disk. Use GetSupportedVoices() to see all 6 known presets. Voices are auto-downloaded on first use with GenerateAudioAsync(), or you can pre-download them with EnsureVoiceAvailableAsync("Davis").
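Application code that stores voice choices (for example in user settings) may want to validate a name before calling the synthesizer. The sketch below hard-codes the short-to-internal mapping from the voice table above. It is an illustration under stated assumptions, not the library's own lookup: `VoiceNames` and `TryGetInternalName` are hypothetical names, and the case-insensitive matching is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
public static class VoiceNames
{
    // Short name -> internal name, copied from the voice table.
    private static readonly Dictionary<string, string> ShortToInternal =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["Carter"] = "en-Carter_man",
            ["Davis"] = "en-Davis_man",
            ["Emma"] = "en-Emma_woman",
            ["Frank"] = "en-Frank_man",
            ["Grace"] = "en-Grace_woman",
            ["Mike"] = "en-Mike_man",
        };

    // Accepts either a short name ("Carter") or an internal name ("en-Carter_man").
    // Returns false and an empty string when the name is not in the table.
    public static bool TryGetInternalName(string name, out string internalName)
    {
        internalName = string.Empty;
        if (string.IsNullOrWhiteSpace(name))
            return false;

        string trimmed = name.Trim();
        if (ShortToInternal.TryGetValue(trimmed, out var mapped))
        {
            internalName = mapped;
            return true;
        }

        // Fall back to matching against the internal names themselves.
        foreach (var value in ShortToInternal.Values)
        {
            if (string.Equals(value, trimmed, StringComparison.OrdinalIgnoreCase))
            {
                internalName = value;
                return true;
            }
        }
        return false;
    }
}
```

Rejecting unknown names up front gives a clearer error than a failed download attempt. The table is static here, so it would need updating if new presets are published.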
Language support: The model is primarily trained on English, with experimental multilingual capabilities (e.g., Spanish, French, German). Results may vary for non-English text.
For full details on the model, supported languages, and voice characteristics, see the official VibeVoice documentation on HuggingFace and the VibeVoice GitHub repository.
For the complete API reference and advanced usage, see the documentation in the GitHub repository.
This repository includes example projects demonstrating different ways to use VibeVoice:
| # | Stack | Level | Description |
|---|---|---|---|
| 1 | Python | Beginner | Minimal TTS demo, useful for model export and testing |
| 2 | Python + Blazor + Aspire | Intermediate | Web app with FastAPI backend and Blazor frontend |
| 3 | C# (.NET 8) | Beginner | Recommended starting point: pure C# with ElBruno.VibeVoiceTTS |
| 4 | C# + Blazor + Aspire | Intermediate | Full-stack C# app with WebAPI + Blazor frontend |
| 5 | Python | Intermediate | CLI to convert folders of .txt to .wav |
| 6 | Python | Intermediate | Chunked audio playback for low-latency |
| 7 | C# (.NET 10 MAUI) | Advanced | Cross-platform app with in-process ONNX TTS via ElBruno.VibeVoiceTTS NuGet package |
| 8 | Python → C# | Advanced | ONNX model export tools and pipeline docs |
Note: Python scenarios (1, 2, 5, 6) are primarily for ONNX model export, testing, and reference. The C# scenarios (3, 4) run entirely in .NET with no Python dependency. See the scenario documentation in the repository for details.
Pre-exported ONNX models are available on HuggingFace; the C# library downloads them automatically:
🤗 elbruno/VibeVoice-Realtime-0.5B-ONNX
The model includes 9 ONNX files (autoregressive pipeline with KV-cache) and 6 voice presets. See scenario 8 for export details.
| Documentation topics |
|---|
| Prerequisites, setup, and first steps |
| Detailed descriptions of all 8 scenarios |
| System design, ONNX pipeline, and data flow |
| Repository layout and file organization |
| REST API documentation (for web scenarios) |
| End-user guide for web interfaces |
| NuGet publishing with GitHub Actions |
| Layer | Technology | Purpose |
|---|---|---|
| C# TTS Library | ElBruno.VibeVoiceTTS | Reusable .NET library with HuggingFace auto-download |
| TTS Model | VibeVoice-Realtime-0.5B | Microsoft's text-to-speech model |
| Inference | ONNX Runtime | Native C# model inference |
| Frontend | Blazor (.NET 10) | Interactive web UI |
| Orchestration | .NET Aspire | Service discovery & health checks |
git clone https://github.com/elbruno/ElBruno.VibeVoiceTTS.git
cd ElBruno.VibeVoiceTTS
dotnet build src/ElBruno.VibeVoiceTTS/ElBruno.VibeVoiceTTS.csproj
dotnet test src/ElBruno.VibeVoiceTTS.Tests/ElBruno.VibeVoiceTTS.Tests.csproj
Contributions are welcome! Please:
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)

This project is licensed under the MIT License; see the LICENSE file for details.
Hi! I'm ElBruno 🧡, a passionate developer and content creator exploring AI, .NET, and modern development practices.
Made with ❤️ by ElBruno
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | Compatible: net8.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
Showing the top 1 NuGet packages that depend on ElBruno.VibeVoiceTTS:
| Package | Downloads |
|---|---|
| ElBruno.VibeVoiceTTS.Realtime: Bridge between ElBruno.VibeVoiceTTS and ElBruno.Realtime; provides ITextToSpeechClient adapter and DI extensions for VibeVoiceTTS integration with the real-time conversation pipeline. | |
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.5.1 | 116 | 5/28/2026 |
| 0.5.0 | 134 | 4/30/2026 |
| 0.2.1-preview | 93 | 4/30/2026 |
| 0.2.0 | 134 | 4/10/2026 |
| 0.1.9 | 163 | 2/28/2026 |
| 0.1.8 | 191 | 2/27/2026 |
| 0.1.7-preview | 126 | 2/23/2026 |
| 0.1.6-preview | 119 | 2/22/2026 |
| 0.1.5-preview | 118 | 2/22/2026 |
| 0.1.4-preview | 118 | 2/22/2026 |
| 0.1.2-preview | 112 | 2/22/2026 |
| 0.1.1-preview | 119 | 2/22/2026 |
| 0.1.0-preview | 126 | 2/22/2026 |
| 0.0.1-preview | 117 | 2/22/2026 |