URL: https://huggingface.co/biodatlab/ThonburianTTS

Model Checkpoints | Gradio Demo | ThonburianTTS Paper | Colab Notebook | GitHub

Thonburian TTS

Thonburian TTS is a Thai Text-to-Speech (TTS) engine built on top of F5-TTS.
It generates natural, expressive Thai speech using flow-matching diffusion techniques and can mimic a reference voice from a short audio sample. The system supports:

  • Thai language generation (language="th")
  • Reference-based voice cloning using short audio clips
  • High-quality synthesis with controllable speed and silence trimming

Model Checkpoints

Model Component | Description | URL
F5-TTS Thai | Flow Matching-based Thai TTS models | Link
F5-TTS IPA | Flow Matching-based Thai-IPA TTS models | Link

Quick Usage

Installation

Install dependencies:

pip install torch cached-path librosa transformers f5-tts
sudo apt install ffmpeg
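
Before loading a model, it can help to confirm that the dependencies above are actually visible to Python. The following is a minimal sketch, not part of the ThonburianTTS repository. The import names in `REQUIRED_MODULES` are assumptions derived from the pip package names (for example, `cached-path` is assumed to import as `cached_path`).

```python
import importlib.util
import shutil

# Import names for the pip packages listed above. The mapping from package
# name to import name is an assumption (e.g. "cached-path" -> "cached_path").
REQUIRED_MODULES = ["torch", "cached_path", "librosa", "transformers", "f5_tts"]


def missing_modules(modules):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python modules:", missing or "none")
    print("ffmpeg found:", ffmpeg_available())
```

The check only looks modules up without importing them, so it runs quickly even when `torch` is installed.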

Clone the GitHub repository:

git clone https://github.com/biodatlab/thonburian-tts.git
cd thonburian-tts

Loading Thai-Script-Based Models

from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig
import torch

# Configure F5-TTS model
model_config = ModelConfig(
    language="th",
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaF5/mega_f5_last.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaF5/mega_vocab.txt",
    vocoder="vocos",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0
)

pipeline = FlowTTSPipeline(model_config, audio_config)
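
The `checkpoint` and `vocab_file` values use `hf://` URIs. The pipeline presumably resolves these itself, so the sketch below is only for inspecting or pre-fetching a file by hand. `split_hf_uri` and `download` are hypothetical helpers, not part of `flowtts`, and they assume the first two path segments of the URI form the repository id, which holds for the URIs in this card. `hf_hub_download` is the standard download function from the `huggingface_hub` package.

```python
def split_hf_uri(uri):
    """Split 'hf://<org>/<repo>/<path/in/repo>' into (repo_id, filename).

    Assumes the first two path segments form the repository id, which is
    true for the URIs in this card but not for every Hub repository.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://<org>/<repo>/<file>, got {uri!r}")
    return "/".join(parts[:2]), "/".join(parts[2:])


def download(uri):
    """Fetch the file into the local Hugging Face cache and return its path."""
    # Imported lazily so split_hf_uri works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = split_hf_uri(uri)
    return hf_hub_download(repo_id=repo_id, filename=filename)


repo_id, filename = split_hf_uri(
    "hf://biodatlab/ThonburianTTS/megaF5/mega_vocab.txt"
)
print(repo_id, filename)
```

Calling `download(...)` requires network access and the `huggingface_hub` package. Splitting the URI does not.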

Loading IPA-Based Models

from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig
import torch

# Configure F5-TTS model
model_config = ModelConfig(
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaIPA/model_last_prune.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaIPA/mega_vocab_ipa.txt",
    vocoder="vocos",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0
)

pipeline = FlowTTSPipeline(model_config, audio_config)
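
To give a feel for what a setting like `silence_threshold=-45` means, the sketch below trims leading and trailing silence from a list of samples. It assumes the threshold is a level in decibels relative to full scale, which this card does not state. `db_to_amplitude` and `trim_silence` are illustrative helpers, not the pipeline's own implementation, which may work on frames rather than single samples.

```python
import math


def db_to_amplitude(db):
    """Convert a level in dB (relative to full scale 1.0) to linear amplitude."""
    return 10.0 ** (db / 20.0)


def trim_silence(samples, threshold_db=-45.0):
    """Drop leading and trailing samples whose magnitude is below the threshold.

    `samples` is a sequence of floats in [-1.0, 1.0]. This is a per-sample
    illustration; real trimmers usually measure energy over short frames.
    """
    limit = db_to_amplitude(threshold_db)
    loud = [i for i, s in enumerate(samples) if abs(s) >= limit]
    if not loud:
        return []
    return list(samples[loud[0]:loud[-1] + 1])


# A short 440 Hz tone at 16 kHz, padded with a quiet noise floor on both sides.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1, 160)]
padded = [0.001] * 100 + tone + [0.001] * 100
trimmed = trim_silence(padded, threshold_db=-45.0)
print(len(padded), len(trimmed))
```

Under this assumption, -45 dB corresponds to a linear amplitude of roughly 0.0056, so the 0.001 padding is removed while the tone is kept.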

Example Outputs

๐Ÿ‘ Image

๐ŸŽต Sample 1 โ€“ Single-speaker Thai Normal Text
๐Ÿ‘ Image

๐ŸŽต Sample 2 โ€“ Single-Speaker Thai Code-mixed Text
๐Ÿ‘ Image

๐ŸŽต Sample 3 โ€“ Multi-Speaker Conversational Speech

Citation

If you use ThonburianTTS in your research, please cite:

@INPROCEEDINGS{11320472,
 author={Aung, Thura and Sriwirote, Panyut and Thavornmongkol, Thanachot and Pipatsrisawat, Knot and Achakulvisut, Titipat and Aung, Zaw Htet},
 booktitle={2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)}, 
 title={ThonburianTTS: Enhancing Neural Flow Matching Models for Authentic Thai Text-to-Speech}, 
 year={2025},
 volume={},
 number={},
 pages={1-6},
 keywords={Adaptation models;Codes;Accuracy;Error analysis;Phonetics;Robustness;Natural language processing;Text to speech;Noise measurement;Research and development;Thai text-to-speech;Flow matching;F5-TTS},
 doi={10.1109/iSAI-NLP66160.2025.11320472}}
Thura Aung, Panyut Sriwirote, Thanachot Thavornmongkol, Knot Pipatsrisawat, Titipat Achakulvisut, Zaw Htet Aung, "ThonburianTTS: Enhancing Neural Flow Matching Models for Authentic Thai Text-to-Speech", 2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Phuket, Thailand, 2025, pp. 1-6, doi: 10.1109/iSAI-NLP66160.2025.11320472.

License

The models are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Acknowledgement

We would like to acknowledge the NSTDA Supercomputer Center (ThaiSC), project #pv824003, for providing computing resources for this work.


Model tree for biodatlab/ThonburianTTS

Base model: SWivid/F5-TTS (this model is a fine-tune)
