
URL: https://huggingface.co/african-low-resource/omnivoice-amharic



OmniVoice Amharic: Open Voice AI for 60M Speakers

Part of Voices For All, an open initiative to build speech AI for every language, starting with those left behind by Big Tech.

We believe this is the highest-quality open Amharic TTS model available today; formal evaluation is still in progress. It generates natural, expressive speech from text and can clone a speaker's voice from a 10-second audio sample.


🚀 Quick Try (No Install)

Live Demo: Try it in your browser →


📊 At a Glance

Languages | Amharic (primary), English, Chinese (base model)
Architecture | Non-autoregressive discrete diffusion
Parameters | 612.6M (Qwen3-0.6B + HiggsAudioV2, 8 codebooks)
Training data | ~81,731 samples / ~331 hours
Best loss | 3.9518 (step 10,000 of 12,000)
License | Apache 2.0
Inference cost | Runs on free Google Colab T4 (~3 GB VRAM)
Voice cloning | Zero-shot, 10-second reference audio
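
As a rough sanity check on the ~3 GB VRAM figure, the weight footprint can be estimated from the parameter count and the dtype. The sketch below assumes all 612.6M parameters are held in float16 (2 bytes each), matching the dtype used in the Quick Start; it does not model activations, the audio tokenizer's working memory, or CUDA overhead, which would account for the rest.

```python
# Back-of-the-envelope weight memory for the figures in the table above.
# Assumption: every parameter is stored in a single dtype.
PARAMS = 612.6e6
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params: float, dtype: str = "float16") -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float16", "float32"):
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```

Under these assumptions the float16 weights take a little over 1.2 GB, which leaves headroom within the quoted ~3 GB.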

🎯 What Makes This Special

1. Actually Sounds Like Amharic

Most "multilingual" TTS models (MMS, XTTS) produce Amharic that sounds robotic or mispronounces ejective consonants (ጠ, ጰ, ጸ, ፀ, ቸ, ጨ). This model was fine-tuned exclusively on Amharic audio and preserves:

  • Correct ejective / glottalic consonant articulation
  • Natural prosody and rhythm (not English rhythm overlaid on Amharic words)
  • Gemination (double consonants: ሀበተ vs ሀብቴ)
  • Pitch patterns for questions vs statements

2. Voice Cloning Works

Give it a 10-second recording of any Amharic speaker and it will synthesize new sentences in that voice. Tested on:

  • Male/female voices
  • Formal news-reading style
  • Casual conversational style
  • Different Ethiopian dialects (Addis Ababa, Gondar, Wollo)

3. Open Everything

  • ✅ Open weights (Apache 2.0)
  • ✅ Open training code
  • ✅ Open datasets (or documented sources)
  • ✅ Open benchmarks (MOS scores will be published)
  • ✅ No API keys, no cloud lock-in

🛠️ Quick Start: Colab


# Cell 1: Install
!pip install -q omnivoice soundfile

# Cell 2: Load model
import torch
from omnivoice import OmniVoice, OmniVoiceGenerationConfig

model = OmniVoice.from_pretrained(
    "african-low-resource/omnivoice-amharic",
    device_map="cuda:0",
    dtype=torch.float16,
)

# Cell 3: Generate speech
# English: "Hello, welcome. This is an Amharic speech test."
text = "ሰላም፣ እንኳን ደህና መጣችሁ። ይህ የአማርኛ ንግግር ሙከራ ነው።"
audio = model.generate(
    text=text,
    language="Amharic",
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)

import soundfile as sf
sf.write("output.wav", audio[0], 24000)
print("✅ Saved to output.wav")

Voice Cloning

# Upload a 10-second reference WAV
prompt = model.create_voice_clone_prompt(ref_audio="speaker.wav", ref_text=None)

audio = model.generate(
    text="ዛሬ ቀን ጥሩ ነው።",  # English: "Today is a good day."
    language="Amharic",
    voice_clone_prompt=prompt,
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)
sf.write("cloned.wav", audio[0], 24000)
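
If the reference recording is longer than 10 seconds, it can be trimmed before it is passed to create_voice_clone_prompt. The helper below is a minimal sketch using only Python's standard wave module, so it assumes an uncompressed PCM WAV file; trim_wav and the file names are hypothetical and not part of the omnivoice package.

```python
import wave


def trim_wav(src_path: str, dst_path: str, max_seconds: float = 10.0) -> float:
    """Copy at most `max_seconds` of audio from src_path to dst_path.

    Returns the duration, in seconds, of the file that was written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        n_keep = min(src.getnframes(), int(max_seconds * rate))
        frames = src.readframes(n_keep)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and sample rate
        dst.writeframes(frames)  # the header's frame count is corrected on close
    return n_keep / rate
```

For example, trim_wav("speaker_long.wav", "speaker.wav") would produce a file that can be used as ref_audio in the snippet above.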

📈 Training Details

Parameter | Value
Base model | k2-fsa/OmniVoice
Backbone | Qwen3-0.6B (636M params)
Audio tokenizer | HiggsAudioV2 (8 codebooks, 1025 vocab)
Learning rate | 2e-5
LR schedule | Cosine
Max steps | 12,000
Epochs | ~10
Batch tokens | 28,672
Precision | bf16
Codebook weights | [8, 8, 6, 6, 4, 4, 2, 2]
Best loss | 3.9518 @ step 10,000
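
To make the learning-rate and codebook-weight rows concrete, here is a minimal sketch of how they could be applied. The table does not state a warmup length, a minimum learning rate, or how the codebook weights are normalised, so the code assumes no warmup, decay to zero at the final step, and a weighted mean over per-codebook losses. This illustrates the settings and is not the project's training code.

```python
import math

PEAK_LR = 2e-5
MAX_STEPS = 12_000
CODEBOOK_WEIGHTS = [8, 8, 6, 6, 4, 4, 2, 2]


def cosine_lr(step: int, peak_lr: float = PEAK_LR, max_steps: int = MAX_STEPS) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at max_steps (assumed: no warmup)."""
    progress = min(max(step, 0), max_steps) / max_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def weighted_codebook_loss(per_codebook_loss, weights=CODEBOOK_WEIGHTS) -> float:
    """Weighted mean of per-codebook losses (assumed: normalised by the weight sum)."""
    if len(per_codebook_loss) != len(weights):
        raise ValueError("expected one loss value per codebook")
    total = sum(w * loss for w, loss in zip(weights, per_codebook_loss))
    return total / sum(weights)
```

With these weights the first two codebooks contribute four times as much to the total as the last two, which is one plausible reading of why the weights decrease.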

Datasets

Dataset | Hours | Role
google/WaxalNLP | ~200h | Core corpus
gheero-Leyu/leyu-amharic-addis-ababa-dialect | ~50h | Dialect diversity
surafelabebe/amharic_clear_audio_tts | ~40h | Clean TTS data
chappM/amharic-bdu-asr | ~41h | ASR-aligned quality
Total | ~331h |
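
The figures in this table can be cross-checked against the sample count given in At a Glance. The short calculation below uses only numbers stated on this page; since the hours are approximate, the derived mean clip length and corpus shares are rough estimates as well.

```python
# Approximate hours per dataset, as listed in the table above.
DATASET_HOURS = {
    "google/WaxalNLP": 200,
    "gheero-Leyu/leyu-amharic-addis-ababa-dialect": 50,
    "surafelabebe/amharic_clear_audio_tts": 40,
    "chappM/amharic-bdu-asr": 41,
}
NUM_SAMPLES = 81_731  # from "At a Glance"

total_hours = sum(DATASET_HOURS.values())
mean_clip_seconds = total_hours * 3600 / NUM_SAMPLES
shares = {name: hours / total_hours for name, hours in DATASET_HOURS.items()}

print(f"total: {total_hours} h, mean clip length: {mean_clip_seconds:.1f} s")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the corpus")
```

The per-dataset hours add up to the stated total of 331, and the average sample works out to roughly 14.6 seconds of audio.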

Training History

Run | Steps | Best Loss | Notes
1 | 0→1,500 | ~4.15 | Init from v3
2 | 1,500→6,000 | 3.9994 (step 4,190) | Storage issue lost checkpoints
3 | 2,700→12,000 | 3.9518 (step 10,000) | Final best

🧪 Evaluation

We evaluate on a held-out test set (10% of combined data, never seen in training).

Objective Metrics

Metric | Value | Comparison (MMS-TTS-amh)
Mel-Cepstral Distortion (MCD) | TBD | TBD
F0 RMSE | TBD | TBD
Character Error Rate (ASR-back) | TBD | TBD
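
For readers unfamiliar with these metrics, the sketch below gives their common textbook definitions in plain Python. It is not the project's evaluation code: feature extraction, time alignment between reference and synthesized audio (for example with DTW), and the ASR system used for the round-trip CER are all left to the caller, and the function names are illustrative.

```python
import math


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def mcd(ref_frames, syn_frames) -> float:
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients with the 0th
    (energy) coefficient already removed by the caller.
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need equal, non-zero numbers of aligned frames")
    k = 10.0 / math.log(10.0)
    per_frame = [
        k * math.sqrt(2.0 * sum((a - b) ** 2 for a, b in zip(r, s)))
        for r, s in zip(ref_frames, syn_frames)
    ]
    return sum(per_frame) / len(per_frame)


def f0_rmse(ref_f0, syn_f0) -> float:
    """RMSE in Hz over frames that are voiced (F0 > 0) in both tracks."""
    pairs = [(a, b) for a, b in zip(ref_f0, syn_f0) if a > 0 and b > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both tracks")
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))
```

Each Ethiopic fidel is a single Unicode code point, so iterating over a Python string counts one unit per written syllable, and CER on Amharic text measures errors at that level.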

Subjective Metrics (MOS)

Criterion | Score (1-5) | N evaluators
Naturalness | TBD | TBD
Speaker similarity (cloning) | TBD | TBD
Ejective consonant accuracy | TBD | TBD
Prosody / rhythm | TBD | TBD
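
Once ratings are collected, each row of this table is typically reported as a mean opinion score with a confidence interval. The sketch below shows one common way to compute it, using a normal-approximation 95% interval; the function name is illustrative, and the example ratings in the usage note are made up for demonstration and are not results for this model.

```python
import math
import statistics


def mos_with_ci(ratings, z: float = 1.96):
    """Return (mean, half_width) for a normal-approximation 95% interval."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width
```

For instance, mos_with_ci([4, 4, 5, 3, 4]) gives a mean with its interval half-width, which would be reported as "mean ± half-width". With few evaluators the normal approximation is optimistic, so a t-based interval may be preferable.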

Subjective evaluation in progress. Results will be published here and in our benchmark repo.


🔮 Roadmap

This model is Phase 1 of a larger pan-African initiative:

  • Amharic (East Africa, 60M speakers): TTS + voice cloning ✅
  • Wolof (West Africa, 12M speakers): TTS + voice cloning (Q3 2026)
  • Hausa (West Africa, 90M speakers): TTS (Q4 2026)
  • Swahili (East Africa, 200M speakers): TTS + ASR (Q1 2027)
  • Somali (Horn of Africa, 20M speakers): TTS (Q2 2027)
  • Self-service fine-tuning toolkit for any language with 50h+ audio

Follow Voices For All for updates.


⚠️ Limitations & Biases

  1. Gender representation: Training data skews male (65%). Female voices may sound less natural.
  2. Dialect coverage: Heavy Addis Ababa bias. Rural Ethiopian accents (Tigray, Harar, Sidama) are underrepresented.
  3. Code-mixing: Output is unpredictable when the text switches between Amharic and English mid-sentence.
  4. Numerals/dates: Amharic calendar dates and large numbers are sometimes mispronounced.
  5. Emotional range: Neutral/news-reading style only. No whisper, shouting, or singing.

We actively seek more diverse training data. If you have Amharic audio recordings (any dialect, any speaker), contact us.
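
Limitations 3 and 4 above can be screened for before synthesis. The function below is a hypothetical pre-check, not part of the omnivoice package: it flags input that mixes Ethiopic and Latin letters, or that contains ASCII or Ethiopic numerals, so that such text can be rewritten or reviewed first. The character ranges are taken from the Unicode Ethiopic block.

```python
import re

LATIN = re.compile(r"[A-Za-z]")
ETHIOPIC_LETTER = re.compile(r"[\u1200-\u135A]")
# ASCII digits plus Ethiopic digits and numbers (U+1369 to U+137C).
NUMERAL = re.compile(r"[0-9\u1369-\u137C]")


def risky_features(text: str) -> list[str]:
    """List input features that the limitations above describe as unreliable."""
    found = []
    if ETHIOPIC_LETTER.search(text) and LATIN.search(text):
        found.append("code-mixing")  # Ethiopic and Latin letters in one input
    if NUMERAL.search(text):
        found.append("numerals")     # dates or numbers written with digits
    return found
```

A caller might spell out numbers in words, or split mixed-language text into per-language segments, whenever this returns a non-empty list.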


🤝 Citation

@software{omnivoice_amharic_2026,
 author = {demeleww and Voices For All},
 title = {OmniVoice Amharic: Open Voice AI for 60M Speakers},
 year = {2026},
 url = {https://huggingface.co/african-low-resource/omnivoice-amharic},
 license = {Apache-2.0}
}

Base model:

@article{omnivoice2026,
 title={OmniVoice: High-Quality Voice Cloning TTS for 600+ Languages},
 journal={arXiv preprint arXiv:2604.00688},
 year={2026}
}

📬 Contact


Built with ❤️ for the 60M+ Amharic speakers who deserve a voice in AI.
