ChunkFormer-Large-Vie: Large-Scale Pretrained ChunkFormer for Vietnamese Automatic Speech Recognition

Ranked #1: Speech Recognition on Common Voice Vi
Ranked #1: Speech Recognition on VIVOS

License: CC BY-NC 4.0

Citation

If you use this work in your research, please cite:

@INPROCEEDINGS{10888640,
 author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
 booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
 title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription}, 
 year={2025},
 volume={},
 number={},
 pages={1-5},
 keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
 doi={10.1109/ICASSP49660.2025.10888640}
}

Contact

Safetensors
Model size: 0.2B params
Tensor type: F32

Datasets used to train songlindotiot/chunkformer-large-vie

Paper for songlindotiot/chunkformer-large-vie

Paper 2502.14673, published Feb 20, 2025

Evaluation results

Test WER on common-voice-vietnamese (Common Voice Vi Leaderboard): 6.660
Test WER on VIVOS (Vivos Leaderboard): 4.180
Test WER on VLSP - Task 1 (self-reported): 14.090
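
For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation script behind the figures above. The function name `word_error_rate`, the whitespace tokenization, and the absence of any text normalization are assumptions made here; this page does not describe the scoring setup or state whether the reported values are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (substitutions + deletions + insertions) / reference words.

    Assumption: words are separated by whitespace and compared exactly,
    with no case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing from the hypothesis.
    print(word_error_rate("xin chào các bạn", "xin chào bạn"))
```

Because insertions count as errors, a WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.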

URL: https://huggingface.co/songlindotiot/chunkformer-large-vie
