
URL: https://huggingface.co/datasets/linyueqian/higgs-audio-v3-voice-bank



Dataset viewer preview (30 rows; audio column omitted)

Column summary:
  • audio - clip duration 3.88-6.68 s
  • ref_text - string, 61-91 characters
  • asr_text - string, 61-91 characters
  • wer - float64, 0-0
  • duration_s - float64, 3.88-6.68
  • pitch_hz - int64, 83-223
  • pitch_band - string, 3 classes

In every row shown, asr_text is identical to ref_text, so the two share one column below.

ref_text / asr_text | wer | duration_s | pitch_hz | pitch_band
The morning fog rolled across the harbor as the fishing boats prepared to set out. | 0 | 4.28 | 98 | low
She measured each ingredient carefully before folding the batter into the pan. | 0 | 5.12 | 117 | low
Quantum computers exploit superposition to evaluate many possibilities at once. | 0 | 6.68 | 170 | high
After the storm finally passed, the children rushed outside to splash in the puddles. | 0 | 5.44 | 187 | high
The newest exhibit features artifacts recovered from a sunken merchant ship. | 0 | 4.24 | 156 | mid
He tightened the last bolt, wiped his hands, and stepped back to admire the engine. | 0 | 5 | 83 | low
Migrating geese traced a wide arc across the pale autumn sky. | 0 | 3.96 | 99 | low
Our flight was delayed three hours because of unexpected weather over the mountains. | 0 | 5.32 | 210 | high
The professor explained how coral reefs support nearly a quarter of all marine species. | 0 | 6.28 | 175 | high
A gentle melody drifted from the open window of the corner cafe. | 0 | 4.04 | 131 | mid
Investors grew nervous as the currency slipped against the dollar for a third straight day. | 0 | 5 | 180 | high
The hikers reached the summit just as the sun broke through the clouds. | 0 | 4.4 | 223 | high
Please remember to water the tomatoes and lock the gate before you leave. | 0 | 3.92 | 182 | high
The ancient manuscript had survived two fires and a flood over six centuries. | 0 | 5 | 106 | low
Engineers tested the new bridge by sending many loaded trucks across it together. | 0 | 4.72 | 123 | low
My grandmother often told stories about the village where she grew up. | 0 | 4.32 | 188 | high
The robot navigated the crowded warehouse, pausing whenever a worker crossed its path. | 0 | 6.08 | 174 | high
Fresh snow blanketed the rooftops, muffling the usual noise of the city. | 0 | 4.64 | 98 | low
The committee voted unanimously to fund the new community library. | 0 | 4.16 | 191 | high
She practiced the violin every evening until her fingers knew the melody by heart. | 0 | 5.08 | 173 | high
Volcanic ash can disrupt air travel across an entire continent for weeks. | 0 | 4.92 | 115 | low
The detective examined the muddy footprints leading away from the garden shed. | 0 | 4 | 170 | high
We watched dolphins racing alongside the ferry as it crossed the bay. | 0 | 3.88 | 143 | mid
The startup doubled its revenue after launching the new subscription service. | 0 | 4.12 | 163 | mid
A single candle flickered in the window of the old lighthouse cottage. | 0 | 4.24 | 184 | high
The chef garnished each plate with a sprig of mint and a dusting of cocoa. | 0 | 5.04 | 152 | mid
Heavy rain is expected throughout the weekend, so bring an umbrella to the game. | 0 | 3.96 | 166 | high
Astronomers discovered a faint galaxy that formed shortly after the universe began. | 0 | 5.28 | 125 | low
The puppy chased its tail in circles until it collapsed in a happy heap. | 0 | 4.6 | 219 | high
Negotiations continued late into the night before the two sides reached an agreement. | 0 | 5.44 | 105 | low

Higgs-Audio-V3 Voice Bank (30 synthetic reference voices)

30 reference voices for voice-clone TTS benchmarking, generated with bosonai/higgs-audio-v3-tts-4b in default-voice ("smart voice") mode: one clip per random seed (1001-1030), each speaking a distinct enrollment sentence. Use them as a controlled, repeatable voice pool instead of per-row dataset reference audio.

Each audio clip ships with its corresponding reference text (ASR-verified)

Every clip is paired with the transcript of what it actually says. The reference text was verified by transcribing each clip with openai/whisper-small and comparing the transcript to the generation prompt: 28/30 clips have WER = 0.000, and the mean WER is 0.5%. The two non-zero clips differ only in number-word normalization (e.g. "forty" vs "40"), not in real errors. No clip needed its reference text replaced.
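The comparison described above can be sketched as a word-level WER computation. This is a minimal illustration, not the script used to build the dataset: the `normalize` and `word_error_rate` helpers are hypothetical names, and the normalization (lowercasing, stripping punctuation) is an assumption, since the card does not state which text normalization was applied.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into word tokens (assumed normalization, not from the card)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


# Identical text (up to case and punctuation) scores 0.0.
print(word_error_rate("Migrating geese traced a wide arc.", "migrating geese traced a wide arc"))
# A number written as a digit counts as one substitution under this normalization.
print(word_error_rate("forty loaded trucks", "40 loaded trucks"))
```

Without a number-word normalizer, "forty" vs "40" registers as a substitution, which is the kind of non-zero score the card attributes to normalization rather than to a genuine mismatch.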

Layout

  • prompt-wavs/voice_XXXX.wav - 30 reference clips (24 kHz, about 3.9-6.7 s each)
  • metadata.csv - audiofolder metadata that drives the HF data viewer: file_name, ref_text, asr_text, wer, duration_s, pitch_hz, pitch_band
  • meta.lst - voice_id|ref_text (pipe-delimited; convenience for seed-tts-style loaders)
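As a rough sketch of how the pipe-delimited meta.lst could be read, the snippet below parses `voice_id|ref_text` lines into a dictionary. The `parse_meta_lst` helper is a hypothetical name, and the sample voice ids (`voice_1001`, `voice_1002`) and their pairing with these sentences are assumptions made for illustration; check the actual file for the real ids.

```python
def parse_meta_lst(text: str) -> dict[str, str]:
    """Parse `voice_id|ref_text` lines into a {voice_id: ref_text} mapping."""
    voices: dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if "|" not in line:
            raise ValueError(f"line {line_number}: expected 'voice_id|ref_text'")
        # Split on the first pipe only, so a pipe inside the text is preserved.
        voice_id, ref_text = line.split("|", 1)
        voices[voice_id.strip()] = ref_text.strip()
    return voices


# Hypothetical sample content; the ids and pairing are illustrative only.
sample = (
    "voice_1001|The morning fog rolled across the harbor as the fishing boats prepared to set out.\n"
    "voice_1002|She measured each ingredient carefully before folding the batter into the pan.\n"
)
voices = parse_meta_lst(sample)
print(len(voices), "voices parsed")
```

In practice the text would come from reading the downloaded file, for example `open("meta.lst", encoding="utf-8").read()`.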

Voice diversity

Distinct sentence per voice; median pitch spans 83-223 Hz: 10 low (<130 Hz), 5 mid (130-165 Hz), 15 high (>165 Hz), counted from the pitch_band column. pitch_band in metadata.csv is a coarse voice tag.
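The banding rule can be written as a small function. This is a sketch using the thresholds stated above; the `pitch_band` function name is hypothetical, and treating both 130 Hz and 165 Hz as "mid" is an assumption about how the boundary values are handled.

```python
def pitch_band(pitch_hz: float) -> str:
    """Map a median pitch in Hz to the coarse band used in the metadata."""
    if pitch_hz < 130:
        return "low"
    if pitch_hz <= 165:
        return "mid"  # boundaries 130 and 165 assumed to fall in "mid"
    return "high"


# Spot checks against pitch_hz / pitch_band pairs from the data viewer.
for hz in (83, 125, 131, 163, 166, 223):
    print(hz, pitch_band(hz))
```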

Use for voice-clone eval (round-robin)

Pair each target text in your TTS eval set with one of these 30 voices, cycling through the pool in order. For each request, feed prompt-wavs/voice_XXXX.wav as ref_audio and that voice's ref_text (from meta.lst) as ref_text.
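The round-robin pairing can be sketched as follows. The `assign_voices` helper and the `text` field name are hypothetical; `ref_audio` and `ref_text` follow the naming above, and the voice ids in the example are placeholders rather than ids confirmed from the dataset.

```python
from itertools import cycle


def assign_voices(target_texts: list[str], voices: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Pair each target text with a (voice_id, ref_text) entry, cycling through the pool."""
    if not voices:
        raise ValueError("voice pool is empty")
    requests = []
    # zip stops at the end of target_texts; cycle repeats the pool as needed.
    for text, (voice_id, ref_text) in zip(target_texts, cycle(voices)):
        requests.append(
            {
                "text": text,  # hypothetical field name for the target text
                "ref_audio": f"prompt-wavs/{voice_id}.wav",
                "ref_text": ref_text,
            }
        )
    return requests


# Placeholder pool of two voices; the real pool has 30 entries from meta.lst.
pool = [("voice_a", "Reference sentence A."), ("voice_b", "Reference sentence B.")]
for request in assign_voices(["one", "two", "three"], pool):
    print(request["text"], "->", request["ref_audio"])
```

Because the assignment depends only on the order of the eval set and the voice pool, rerunning it yields the same text-to-voice pairing, which keeps comparisons across TTS systems repeatable.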
