URL: https://huggingface.co/ZzWater/ViiTorVoice-NAR


ViiTorVoice-NAR Local Models

This directory contains the local model files used by viitor-ai/viitor-voice-nar.

ViiTorVoice-NAR is a non-autoregressive speech generation model for voice cloning, local speech editing, and emotion / paralinguistic speech control. The files in this directory are split by function so each model component can be loaded independently.

Directory

local_models/
├── aligner/
│   └── Qwen3-ForcedAligner-0.6B/
├── assets/
│   └── dualcodec_silence_2s.pt
├── dualcodec/
│   ├── dualcodec_ckpts/
│   └── w2v-bert-2.0/
└── llm/
    └── 0p6_emotion/
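
Because the loading code depends on this layout, it can help to confirm that a downloaded copy is complete before loading anything. The following is a minimal sketch using only the Python standard library. The names `EXPECTED_ENTRIES` and `missing_entries` are hypothetical helpers written for this illustration and are not part of viitor-voice-nar; the relative paths are copied from the listing above.

```python
from pathlib import Path

# Relative paths copied from the directory listing above.
EXPECTED_ENTRIES = (
    "aligner/Qwen3-ForcedAligner-0.6B",
    "assets/dualcodec_silence_2s.pt",
    "dualcodec/dualcodec_ckpts",
    "dualcodec/w2v-bert-2.0",
    "llm/0p6_emotion",
)


def missing_entries(root):
    """Return the expected relative paths that do not exist under root.

    Hypothetical helper: it only checks that each file or directory is
    present, not that the weights inside are complete or valid.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

Calling `missing_entries("local_models")` returns an empty list when every listed entry is present, and otherwise the relative paths that still need to be downloaded or restored.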

Model Components

| Component | Path | Purpose |
| --- | --- | --- |
| ViiTorVoice-NAR LLM | llm/0p6_emotion/ | Generates target speech tokens from text, prompt speech tokens, edit masks, duration conditions, and emotion or non-verbal tags. |
| DualCodec | dualcodec/dualcodec_ckpts/ | Converts waveform audio into discrete speech codebook tokens and decodes generated tokens back into waveform audio. |
| W2V-BERT 2.0 | dualcodec/w2v-bert-2.0/ | Extracts semantic speech features used by the DualCodec encoder. |
| Qwen3 Forced Aligner | aligner/Qwen3-ForcedAligner-0.6B/ | Aligns speech audio with text and provides timestamps for local speech editing. |
| Runtime Assets | assets/ | Stores small auxiliary files, such as precomputed silence tokens used during generation or padding. |

Main Uses

  • Voice cloning: synthesize new speech from target text while preserving the speaker characteristics of prompt audio.
  • Local speech editing: replace only the changed region of an utterance while keeping the rest of the audio stable.
  • Emotion and paralinguistic control: condition generation with tags such as emotion labels or non-verbal vocal events.
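
To illustrate how aligner timestamps could feed local speech editing, the sketch below turns word-level time spans into a boolean mask over codec token frames, marking only the frames that overlap the words being replaced. This is an illustration under stated assumptions, not the project's actual implementation: the function name `edit_mask`, the shape of `word_spans`, and the `frames_per_second` token rate are all hypothetical, and the real token rate depends on the DualCodec checkpoint in use.

```python
import math


def edit_mask(word_spans, edited_words, frames_per_second, total_seconds):
    """Mark the token frames that overlap any edited word.

    word_spans: list of (start_sec, end_sec) per word, in the form an
        aligner might provide (assumed format).
    edited_words: indices into word_spans of the words to regenerate.
    frames_per_second: assumed codec token rate (hypothetical parameter).
    total_seconds: duration of the utterance.
    """
    n_frames = math.ceil(total_seconds * frames_per_second)
    mask = [False] * n_frames
    for i in edited_words:
        start, end = word_spans[i]
        # Round outward so a frame partly covered by the word is included.
        first = max(0, math.floor(start * frames_per_second))
        last = min(n_frames, math.ceil(end * frames_per_second))
        for frame in range(first, last):
            mask[frame] = True
    return mask


# Example: three words, regenerate only the second one.
spans = [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0)]
mask = edit_mask(spans, edited_words=[1], frames_per_second=10, total_seconds=2.0)
```

Frames where the mask is `False` would be kept from the original audio, which is what keeps the rest of the utterance stable; frames where it is `True` are the region a model would regenerate.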

Notes

  • Keep the directory structure unchanged unless the loading code is updated as well.
  • Model weights are large binary files and are usually stored outside normal git tracking.
  • Check the upstream project and each submodel for license and usage terms.