dotnet add package NeuralCodecs --version 0.4.0
NuGet\Install-Package NeuralCodecs -Version 0.4.0
<PackageReference Include="NeuralCodecs" Version="0.4.0" />
Directory.Packages.props: <PackageVersion Include="NeuralCodecs" Version="0.4.0" />
Project file: <PackageReference Include="NeuralCodecs" />
paket add NeuralCodecs --version 0.4.0
#r "nuget: NeuralCodecs, 0.4.0"
#:package NeuralCodecs@0.4.0
Install as a Cake Addin: #addin nuget:?package=NeuralCodecs&version=0.4.0
Install as a Cake Tool: #tool nuget:?package=NeuralCodecs&version=0.4.0
NeuralCodecs is a .NET library for neural audio codec implementations and TTS models written purely in C#. It includes implementations of SNAC, DAC, Encodec, and Dia, along with advanced audio processing tools.
Install the main package from NuGet:
dotnet add package NeuralCodecs
Or the Package Manager Console:
Install-Package NeuralCodecs
Models are downloaded automatically when you specify a Hugging Face user/model ID, or they can be downloaded separately:
SNAC Models - Available from hubertsiuzdak's Hugging Face
DAC Models - Available from Descript's Hugging Face
Encodec Models - Available from Meta's Hugging Face
Dia Model - Available from Nari Labs' Hugging Face
Here's a simple example to get you started:
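using NeuralCodecs;
// Load a SNAC model
var model = await NeuralCodecs.CreateSNACAsync("path/to/model.pt");
// Read the input samples (LoadAudioFile is a placeholder for your own WAV reader)
float[] audioData = LoadAudioFile("input.wav");
// Encode and decode (round-trip) the audio through the codec
var processed = model.ProcessAudio(audioData, sampleRate: 24000);
// Save the result (SaveAudioFile is a placeholder for your own WAV writer)
SaveAudioFile("output.wav", processed);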
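The quick start leaves `LoadAudioFile` to you. One possible implementation is the minimal sketch below, which uses only the .NET base class library. It assumes a standard RIFF/WAVE file containing 16-bit integer PCM and rejects anything else. The `WavReader` class and its methods are hypothetical and are not part of NeuralCodecs.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical WAV reader; not part of NeuralCodecs.
// Assumes a RIFF/WAVE file with 16-bit integer PCM samples and rejects anything else.
public static class WavReader
{
    public static float[] LoadAudioFile(string path, out int sampleRate, out int channels)
    {
        using var stream = File.OpenRead(path);
        return Read(stream, out sampleRate, out channels);
    }

    // Returns samples scaled to [-1, 1); multichannel audio stays interleaved.
    public static float[] Read(Stream stream, out int sampleRate, out int channels)
    {
        sampleRate = 0;
        channels = 0;
        short bitsPerSample = 0;

        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        if (new string(reader.ReadChars(4)) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        reader.ReadInt32(); // total RIFF size, not needed here
        if (new string(reader.ReadChars(4)) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        // Walk the chunks: "fmt " describes the encoding, "data" holds the samples.
        while (stream.Position + 8 <= stream.Length)
        {
            string chunkId = new string(reader.ReadChars(4));
            int chunkSize = reader.ReadInt32();
            if (chunkId == "fmt ")
            {
                short formatTag = reader.ReadInt16(); // 1 = integer PCM
                channels = reader.ReadInt16();
                sampleRate = reader.ReadInt32();
                reader.ReadInt32();                   // byte rate
                reader.ReadInt16();                   // block align
                bitsPerSample = reader.ReadInt16();
                if (formatTag != 1 || bitsPerSample != 16)
                    throw new NotSupportedException("This sketch only handles 16-bit PCM.");
                stream.Seek(chunkSize - 16, SeekOrigin.Current); // skip any format extension
            }
            else if (chunkId == "data")
            {
                if (bitsPerSample != 16) throw new InvalidDataException("data chunk appears before fmt chunk.");
                var samples = new float[chunkSize / 2];
                for (int i = 0; i < samples.Length; i++)
                    samples[i] = reader.ReadInt16() / 32768f;
                return samples;
            }
            else
            {
                stream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current); // chunks are padded to even sizes
            }
        }
        throw new InvalidDataException("No data chunk found.");
    }
}
```

Usage: `float[] audioData = WavReader.LoadAudioFile("input.wav", out int sampleRate, out int channels);`. The sketch does no resampling or downmixing, so check that `sampleRate` and `channels` match what the model expects (24 kHz mono for the SNAC call above) before passing the samples on.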
For more detailed examples, see the examples section below.
There are several ways to load a model:
// Load a SNAC model with the static method provided for built-in models
var model = await NeuralCodecs.CreateSNACAsync("model.pt");
// ...optionally with an explicit config
var snac24Model = await NeuralCodecs.CreateSNACAsync("model.pt", SNACConfig.SNAC24Khz);
// Load a model with its default config from an IModelLoader instance
var torchLoader = NeuralCodecs.CreateTorchLoader();
var loadedModel = await torchLoader.LoadModelAsync<SNAC, SNACConfig>("model.pt");
// For Encodec with custom bandwidth and settings
var encodecConfig = new EncodecConfig {
SampleRate = 48000,
Bandwidth = 12.0f,
Channels = 2, // Stereo audio
Normalize = true
};
var encodecModel = await torchLoader.LoadModelAsync<Encodec, EncodecConfig>("encodec_model.pt", encodecConfig);
// Load a custom model with a factory method (CustomModel/CustomConfig stand for your own types)
var customModel = await torchLoader.LoadModelAsync<CustomModel, CustomConfig>(
"model.pt",
config => new CustomModel(config, ...),
config);
Models can be loaded in PyTorch or Safetensors format.
The AudioTools namespace provides extensive audio processing capabilities:
var audio = new Tensor(...); // Load or create audio tensor
const int sampleRate = 48000;
// Apply effects
var processedAudio = AudioEffects.ApplyCompressor(
audio,
sampleRate: sampleRate,
threshold: -20f,
ratio: 4.0f);
// Compute spectrograms and transforms
var spectrogram = DSP.MelSpectrogram(audio, sampleRate);
var stft = DSP.STFT(audio, windowSize: 1024, hopSize: 512, windowType: "hann");
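The compressor's `threshold` and `ratio` follow the usual dynamic-range-compressor convention, and the STFT's window and hop sizes determine how many frames come back. The sketch below shows the textbook arithmetic behind both. It is a generic hard-knee static curve and frame count, not NeuralCodecs' implementation, which may add a soft knee, attack/release smoothing, or padding. `DspSketch` and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helpers illustrating textbook DSP arithmetic; not NeuralCodecs code.
public static class DspSketch
{
    // Static hard-knee compressor curve (levels in dB): below the threshold the level
    // is unchanged; above it, every `ratio` dB of input yields 1 dB of output.
    public static float CompressedLevelDb(float inputDb, float thresholdDb, float ratio) =>
        inputDb <= thresholdDb ? inputDb : thresholdDb + (inputDb - thresholdDb) / ratio;

    // STFT frame count without padding: one frame for the first full window,
    // plus one for every further hop that still fits a full window.
    public static int StftFrameCount(int length, int windowSize, int hopSize) =>
        length < windowSize ? 0 : 1 + (length - windowSize) / hopSize;

    // Frame count when the signal is padded by windowSize / 2 on both sides with an
    // even window size (the "centered" convention of torch.stft's default center=True).
    public static int CenteredStftFrameCount(int length, int hopSize) =>
        1 + length / hopSize;
}
```

With `threshold: -20f` and `ratio: 4.0f`, a peak at -8 dB comes out at -17 dB: the 12 dB above the threshold shrinks to 3 dB. For one second of 48 kHz audio with a 1024-sample window and 512-sample hop, the unpadded count is 92 frames and the centered count is 94. Whether `DSP.STFT` pads the signal is not stated here.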
There are two main ways to process audio:
// Option 1: encode and decode (round-trip) the audio in one step
var processedAudio = model.ProcessAudio(audioData, sampleRate);
// Option 2: encode audio to discrete codes...
var codes = model.Encode(buffer);
// ...and decode the codes back to audio
var decodedAudio = model.Decode(codes);
Saving the processed audio
Use your preferred library to write the WAV file, for example NAudio:
// using NAudio
await using var writer = new WaveFileWriter(
outputPath,
new WaveFormat(model.Config.SamplingRate, channels: model.Channels)
);
writer.WriteSamples(processedAudio, 0, processedAudio.Length);
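If you would rather not depend on NAudio, writing a 16-bit PCM WAV file needs only the base class library. This is a minimal sketch: the `WavWriter` class is hypothetical, and it assumes interleaved float samples nominally in [-1, 1], clamping anything outside that range.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical dependency-free WAV writer; not part of NeuralCodecs.
// Writes interleaved float samples in [-1, 1] as 16-bit integer PCM.
public static class WavWriter
{
    public static void Write(Stream stream, float[] samples, int sampleRate, int channels)
    {
        const short bitsPerSample = 16;
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int dataSize = samples.Length * 2;

        using var w = new BinaryWriter(stream, Encoding.ASCII, leaveOpen: true);
        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + dataSize);               // RIFF chunk size
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                           // fmt chunk size for plain PCM
        w.Write((short)1);                     // format tag 1 = integer PCM
        w.Write((short)channels);
        w.Write(sampleRate);
        w.Write(sampleRate * blockAlign);      // byte rate
        w.Write(blockAlign);
        w.Write(bitsPerSample);
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(dataSize);
        foreach (float s in samples)
        {
            float clamped = Math.Clamp(s, -1f, 1f); // avoid integer overflow on clipped peaks
            w.Write((short)MathF.Round(clamped * short.MaxValue));
        }
    }

    public static void SaveAudioFile(string path, float[] samples, int sampleRate, int channels = 1)
    {
        using var file = File.Create(path);
        Write(file, samples, sampleRate, channels);
    }
}
```

Assuming `processedAudio` is a `float[]` of mono samples at 24 kHz, `WavWriter.SaveAudioFile("output.wav", processedAudio, 24000);` writes a file that any standard audio player can open.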
Encodec provides additional capabilities:
// Set target bandwidth for compression (supported values depend on model)
encodecModel.SetTargetBandwidth(12.0f); // 12 kbps
// Get available bandwidth options
var availableBandwidths = encodecModel.TargetBandwidths; // e.g. [1.5, 3, 6, 12, 24]
// Use language model for enhanced compression quality
var lm = await encodecModel.GetLanguageModel();
// Apply LM during encoding/decoding for better quality
// Direct file compression
await EncodecCompressor.CompressToFileAsync(encodecModel, audioTensor, "audio.ecdc", useLm: true);
// Decompress from file
var (decompressedAudio, sampleRate) = await EncodecCompressor.DecompressFromFileAsync("audio.ecdc");
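Target bandwidths correspond to how many residual codebooks the quantizer uses: bitrate = frame rate * codebooks * bits per code. The sketch below does that arithmetic. The frame rates (75 Hz for the 24 kHz model, 150 Hz for the 48 kHz model) and the 1024-entry codebooks are those of Meta's published EnCodec checkpoints; treat them as assumptions for other configurations. `EncodecBitrate` is a hypothetical helper, not a NeuralCodecs type.

```csharp
using System;

// Hypothetical helper for EnCodec bitrate arithmetic; not part of NeuralCodecs.
public static class EncodecBitrate
{
    // Bits per second = frames per second * codebooks used * bits per code.
    public static double BitsPerSecond(double frameRateHz, int numCodebooks, int codebookSize) =>
        frameRateHz * numCodebooks * Math.Log2(codebookSize);

    // Inverse: number of codebooks needed to reach a target bandwidth given in kbps.
    public static int CodebooksFor(double targetKbps, double frameRateHz, int codebookSize) =>
        (int)Math.Round(targetKbps * 1000.0 / (frameRateHz * Math.Log2(codebookSize)));

    // Ratio between uncompressed 16-bit PCM and the codec bitrate.
    public static double CompressionRatio(int sampleRate, int channels, double codecBitsPerSecond) =>
        sampleRate * channels * 16.0 / codecBitsPerSecond;
}
```

For the 48 kHz stereo configuration shown earlier, 12 kbps works out to 8 codebooks at 150 Hz, about 128 times smaller than 16-bit PCM (1,536 kbps). On the 24 kHz model, the lowest 1.5 kbps setting uses 2 codebooks at 75 Hz.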
Dia is a 1.6B-parameter text-to-speech model that generates highly realistic dialogue directly from transcripts:
// Load Dia model with optional DAC codec
var diaConfig = new DiaConfig
{
LoadDACModel = true,
SampleRate = 44100
};
var diaModel = await NeuralCodecs.CreateDiaAsync("model.pt", diaConfig);
// or use LoadDACModel = false in config and manually load DAC:
diaModel.LoadDacModel("dac_model.pt");
// Basic text-to-speech generation
var text = "[S1] Hello, how are you today? [S2] I'm doing great, thanks for asking!";
var audioOutput = diaModel.Generate(
text: text,
maxTokens: 1000,
cfgScale: 3.0f,
temperature: 1.2f,
topP: 0.95f);
// Voice cloning with audio prompt
var audioPromptPath = "reference_voice.wav";
var clonedAudio = diaModel.Generate(
text: "[S1] This is my cloned voice speaking new words.",
audioPromptPath: audioPromptPath,
maxTokens: 1000);
// Batch generation for multiple texts
var texts = new List<string>
{
"[S1] First dialogue line.",
"[S2] Second dialogue line with (laughs) non-verbal."
};
var batchResults = diaModel.Generate(texts, maxTokens: 800);
// Save generated audio
Dia.SaveAudio("output.wav", audioOutput);
Audio Speed Correction: Dia includes built-in speed correction to counteract the speed-up that occurs in generated audio for longer inputs:
var diaConfig = new DiaConfig
{
LoadDACModel = true,
SampleRate = 44100,
// Configure speed correction method
SpeedCorrectionMethod = AudioSpeedCorrectionMethod.Hybrid, // Default: best quality
// Configure slowdown mode
SlowdownMode = AudioSlowdownMode.Dynamic // Default: adapts to text length
};
Speed Correction Examples:
// For highest quality output (default)
var highQualityConfig = new DiaConfig
{
SpeedCorrectionMethod = AudioSpeedCorrectionMethod.Hybrid,
SlowdownMode = AudioSlowdownMode.Dynamic
};
// For testing multiple correction methods
var testConfig = new DiaConfig
{
SpeedCorrectionMethod = AudioSpeedCorrectionMethod.All // Generates multiple output variants
};
// For no speed correction (fastest processing)
var fastConfig = new DiaConfig
{
SpeedCorrectionMethod = AudioSpeedCorrectionMethod.None
};
Memory Usage: As with the Python implementation, the Dia model with the DAC codec requires roughly 10-11 GB of GPU memory.
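To see where that memory goes, note that the weights of a 1.6B-parameter model alone take 1.6e9 * 4 bytes = 6.4 GB in 32-bit floats, or 3.2 GB in 16-bit precision. The DAC codec, activations, and caches used during generation account for the rest. The precision NeuralCodecs uses to load Dia is not stated here, so the small sketch below simply takes bytes per parameter as an input. `ModelMemory` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper; estimates weight storage only, not activations or caches.
public static class ModelMemory
{
    // Weight storage in gigabytes (10^9 bytes) = parameter count * bytes per parameter.
    public static double WeightGigabytes(double parameterCount, int bytesPerParameter) =>
        parameterCount * bytesPerParameter / 1e9;
}
```

`ModelMemory.WeightGigabytes(1.6e9, 4)` gives 6.4 for fp32 and `ModelMemory.WeightGigabytes(1.6e9, 2)` gives 3.2 for fp16/bf16, both well below the total GPU memory figure quoted above.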
Text Format Requirements:
Speaker tags: begin the text with the [S1] speaker tag, and use [S1] and [S2] for dialogue (repeating the same speaker tag consecutively may impact generation).
Non-Verbal Communications: Dia supports various non-verbal tags. Some work more consistently than others (laughs, chuckles), but be prepared for occasional unexpected output from some tags (sneezes, applause, coughs, ...).
var textWithNonVerbals = "[S1] I can't believe it! (gasps) [S2] That's amazing! (laughs)";
Supported non-verbals: (laughs), (clears throat), (sighs), (gasps), (coughs), (singing), (sings), (mumbles), (beep), (groans), (sniffs), (claps), (screams), (inhales), (exhales), (applause), (burps), (humming), (sneezes), (chuckle), (whistles)
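Malformed transcripts often show up only after a slow generation run, so it can help to check text before calling `Generate`. The hypothetical `DiaTextCheck` helper below is a sketch based only on the format notes above. It flags missing speaker tags, consecutive repeats of the same speaker tag, and parenthesized tags that are not in the supported non-verbal list. It knows nothing about any other rules Dia may apply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical pre-flight check for Dia transcripts; not part of NeuralCodecs.
public static class DiaTextCheck
{
    // Mirrors the "Supported non-verbals" list in this README.
    static readonly HashSet<string> SupportedNonVerbals = new()
    {
        "laughs", "clears throat", "sighs", "gasps", "coughs", "singing", "sings",
        "mumbles", "beep", "groans", "sniffs", "claps", "screams", "inhales",
        "exhales", "applause", "burps", "humming", "sneezes", "chuckle", "whistles",
    };

    public static List<string> Check(string text)
    {
        var warnings = new List<string>();

        // Speaker tags [S1]/[S2]: warn when none exist or the same speaker appears twice in a row.
        var speakers = Regex.Matches(text, @"\[S([12])\]").Select(m => m.Groups[1].Value).ToList();
        if (speakers.Count == 0)
            warnings.Add("No [S1]/[S2] speaker tags found.");
        for (int i = 1; i < speakers.Count; i++)
            if (speakers[i] == speakers[i - 1])
                warnings.Add($"Speaker tag [S{speakers[i]}] repeated consecutively (turn {i + 1}).");

        // Non-verbal tags: any parenthesized text that is not in the supported list.
        foreach (Match m in Regex.Matches(text, @"\(([^()]+)\)"))
            if (!SupportedNonVerbals.Contains(m.Groups[1].Value))
                warnings.Add($"Unrecognized non-verbal tag ({m.Groups[1].Value}).");

        return warnings;
    }
}
```

`DiaTextCheck.Check(textWithNonVerbals)` returns no warnings for the example above. Because the helper treats any parenthesized text as a non-verbal tag, its warnings are advisory rather than hard errors.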
Voice Cloning Best Practices:
// Voice cloning: prepend the transcript of the reference audio to the new text
var referenceTranscript = "[S1] This is the reference voice speaking clearly.";
var newText = "[S1] Now I will say something completely different.";
var clonedOutput = diaModel.Generate(
text: referenceTranscript + " " + newText,
audioPromptPath: "reference.wav");
Check out the Example project for a complete implementation.
The example includes tools for visualizing and comparing audio spectrograms:
Audio before and after compression with DAC Codec 24kHz
<img src="Docs/Images/spectrogram_DAC_24k.png" width="500" height="300">
Suggestions and contributions are welcome!
This project is licensed under the Apache-2.0 License, see the LICENSE file for more information.
This project uses libraries under several different licenses, see THIRD-PARTY-NOTICES for more information.
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | Compatible: net8.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |