Whisper Large-v3-Turbo (Galician Fine-Tuned)
This model is a fine-tuned version of openai/whisper-large-v3-turbo for automatic speech recognition (ASR) in Galician (gl).
It improves Galician transcription quality over the base model, especially on in-domain and low-resource speech. Per-corpus WER and CER for the fine-tuned model are reported under Evaluation Results.
Training Data
The model was fine-tuned on a combined Galician ASR dataset built from multiple public and curated corpora.
All audio was resampled to 16 kHz, and every corpus was converted to a common (audio, text) format.
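The resampling and record-building step might look like the minimal sketch below. The card does not say which tool was used. This sketch assumes SciPy's polyphase resampler, and both the `to_record` helper and its dict layout are hypothetical, chosen only for illustration.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate stated in the card


def to_record(audio: np.ndarray, orig_sr: int, text: str) -> dict:
    """Hypothetical helper: resample mono audio to 16 kHz and pair it with its transcript."""
    if orig_sr != TARGET_SR:
        g = gcd(orig_sr, TARGET_SR)
        # Polyphase resampling by the rational factor TARGET_SR / orig_sr,
        # which applies a low-pass (anti-aliasing) FIR filter internally.
        audio = resample_poly(audio, TARGET_SR // g, orig_sr // g)
    return {"audio": np.asarray(audio, dtype=np.float32), "sampling_rate": TARGET_SR, "text": text}


# One second of 44.1 kHz silence becomes 16,000 samples at 16 kHz.
rec = to_record(np.zeros(44_100), 44_100, "bos días")
print(len(rec["audio"]), rec["sampling_rate"])
```

A filter-based resampler is preferable to plain interpolation here, because naive decimation would alias content above 8 kHz into the speech band.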
Datasets Included
- Common Voice v23 (Galician)
- OpenSLR Speech Translation GL-EN (Galician side)
- FLEURS GL-EN (Galician side)
- FalAI (20% of the validated split)
- Transcrispeech (Galician)
These datasets cover clean read speech, semi-spontaneous speech and more challenging acoustic conditions.
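The sketch below shows one way to merge the corpora and take the 20% FalAI subset. The card does not describe how that subset was drawn, so the seeded random sample is an assumption. The `subsample` and `combine` helpers and the `corpus` field are hypothetical names. Tagging each record with its source corpus is what allows results to be broken down per corpus, as the evaluation tables do.

```python
import random


def subsample(records: list, fraction: float, seed: int = 0) -> list:
    """Reproducibly draw round(fraction * len(records)) records without replacement."""
    k = round(len(records) * fraction)
    return random.Random(seed).sample(records, k)


def combine(corpora: dict) -> list:
    """Concatenate corpora, tagging every record with its source corpus name."""
    return [{**rec, "corpus": name} for name, recs in corpora.items() for rec in recs]


# Toy stand-ins for the real corpora.
falai_validated = [{"text": f"frase {i}"} for i in range(1_000)]
common_voice = [{"text": "bo día"}]

train = combine({
    "FalAI": subsample(falai_validated, 0.20, seed=42),
    "CommonVoice": common_voice,
})
print(len(train))
```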
Dataset Statistics
- Train: 171,619 utterances
- Validation: 22,209 utterances
- Test: 21,542 utterances
Training Procedure
Fine-tuning was performed using the 🤗 Transformers Seq2SeqTrainer.
- Effective batch size: 16
- Learning rate: 1e-5
- Warmup steps: 800
- Max training steps: 53,630
- Precision: FP16
- Evaluation metric: Word Error Rate (WER)
- Model selection: Best checkpoint selected based on validation WER
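The listed hyperparameters imply a learning-rate curve and a number of training epochs. The card does not name the scheduler. This sketch assumes the Trainer default (`lr_scheduler_type="linear"`): the rate warms up linearly to its peak and then decays linearly to zero at the last step. `lr_at` is a hypothetical helper.

```python
# Hyperparameters from the card.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 800
MAX_STEPS = 53_630
EFFECTIVE_BATCH = 16
TRAIN_UTTERANCES = 171_619


def lr_at(step: int) -> float:
    """Assumed linear schedule: warm up from 0 to the peak, then decay linearly to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, remaining)


# Number of passes over the training set implied by the step budget.
epochs = MAX_STEPS * EFFECTIVE_BATCH / TRAIN_UTTERANCES

for step in (0, 400, 800, 27_215, 53_630):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
print(f"epochs ~ {epochs:.2f}")
```

The arithmetic shows that 53,630 steps at 16 utterances per step cover the 171,619 training utterances almost exactly five times.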
Audio features were extracted using WhisperFeatureExtractor, and text was tokenized with WhisperTokenizer configured for Galician transcription.
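The equation below is a worked example of the input size this produces. It assumes `WhisperFeatureExtractor`'s default settings (a 30 s window and a 160-sample hop at 16 kHz) and the 128 mel bins used by the large-v3 model family. The card does not state these values explicitly.

$$
N_{\text{samples}} = 30\ \text{s} \times 16\,000\ \text{Hz} = 480\,000,
\qquad
T = \frac{N_{\text{samples}}}{160} = 3\,000\ \text{frames}
$$

Each utterance therefore becomes a 128 × 3000 log-Mel matrix. By default, shorter clips are padded to this window and longer ones are truncated.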
Evaluation Results
Evaluation was performed on held-out test splits for each corpus and on a combined test set.
Metrics are reported as WER (Word Error Rate) and CER (Character Error Rate).
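Both metrics are edit distances normalised by reference length: WER counts word substitutions, deletions and insertions, and CER counts the same operations over characters. The sketch below is a minimal pure-Python version. The card does not name the toolkit used, and toolkits differ in details (for example, whether spaces count toward CER).

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            ))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate for one utterance."""
    words = ref.split()
    return edit_distance(words, hyp.split()) / len(words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate for one utterance (spaces included)."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


print(wer("o can ladra", "o can non ladra"))  # one inserted word over three reference words
print(cer("casa", "cara"))                    # one substituted character over four
```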
Fine-Tuned Model (No Text Normalization)
Per-corpus results
| Corpus | N | WER | CER |
|---|---|---|---|
| FalAI | 4776 | 0.0097 | 0.0034 |
| CommonVoice | 14563 | 0.0688 | 0.0153 |
| OpenSLR | 282 | 0.0808 | 0.0378 |
| FLEURS | 212 | 0.1980 | 0.0730 |
| Transcrispeech | 1710 | 0.2097 | 0.0770 |
Combined test set
| Dataset | N | WER | CER |
|---|---|---|---|
| TOTAL | 21543 | 0.0962 | 0.0296 |
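The card does not state how the TOTAL row is aggregated. A combined score is commonly computed by pooling all word errors and all reference words, not by averaging per-utterance or per-corpus scores. Under pooling, corpora with longer utterances weigh more. As a check on the tables above, the utterance-weighted mean of the per-corpus WERs is about 0.068, well below the reported 0.0962. This gap would be consistent with pooling. The helper names below are hypothetical.

```python
def pooled_wer(counts: list) -> float:
    """Corpus-level WER from (word_errors, reference_words) pairs: total errors / total words."""
    return sum(e for e, _ in counts) / sum(n for _, n in counts)


def mean_wer(counts: list) -> float:
    """Unweighted mean of per-utterance WERs, shown for contrast."""
    return sum(e / n for e, n in counts) / len(counts)


# (word errors, reference words) for two toy utterances: one short, one long.
toy = [(0, 4), (6, 30)]
print(f"pooled {pooled_wer(toy):.3f} vs mean {mean_wer(toy):.3f}")
```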
Fine-Tuned Model (With Text Normalization)
A lightweight text normalization step was applied during evaluation to reduce superficial mismatches (punctuation, casing, spacing).
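The card does not specify the exact normalisation rules. The following is a minimal sketch of such a step under that assumption: lowercase, replace every Unicode punctuation character (including Galician marks such as ¿ ¡ « ») with a space, and collapse whitespace. Accented letters are kept. The `normalize` helper is hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    # Unicode general categories starting with "P" cover all punctuation classes.
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize("Ola, ¿que tal?  Moi ben."))
```

Applied to both reference and hypothesis before scoring, this removes casing and punctuation mismatches. Mapping punctuation to spaces also splits hyphenated forms into separate tokens, which is itself a scoring choice.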
Per-corpus results
| Corpus | N | WER | CER |
|---|---|---|---|
| FalAI | 4776 | 0.0093 | 0.0033 |
| OpenSLR | 282 | 0.0648 | 0.0341 |
| FLEURS | 212 | 0.1709 | 0.0661 |
| Transcrispeech | 1710 | 0.1836 | 0.0697 |
Combined test set
| Dataset | N | WER | CER |
|---|---|---|---|
| TOTAL | 21543 | 0.0795 | 0.0256 |
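The effect of normalisation can be expressed as a relative WER reduction, computed directly from the two evaluation tables. CommonVoice is omitted because it has no row in the normalised table.

```python
# WER values copied from the evaluation tables: (without, with) text normalisation.
WER = {
    "FalAI": (0.0097, 0.0093),
    "OpenSLR": (0.0808, 0.0648),
    "FLEURS": (0.1980, 0.1709),
    "Transcrispeech": (0.2097, 0.1836),
    "TOTAL": (0.0962, 0.0795),
}


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original WER removed by normalisation."""
    return (before - after) / before


for name, (raw, norm) in WER.items():
    print(f"{name:15s} {relative_reduction(raw, norm):.1%}")
```

On the combined test set, normalisation removes about 17.4% of the measured word errors.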
Intended Use and Limitations
This model is intended for Galician ASR research and transcription workflows.
Performance may degrade on highly spontaneous speech or very noisy audio. As with all Whisper models, hallucinations may occur in low-signal segments.
The model is specialised for Galician transcription and is not intended for multilingual use or speech translation.
Contact information
For further information, send an email to proxecto.nos@usc.gal
Licensing information
Acknowledgements
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA. (This publication of the project Desarrollo de Modelos ALIA is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia – Funded by the European Union – NextGenerationEU.)
Thanks also to Balidea for the technical development of this model.
Citation
@misc{proxectenos2026whisper-large-v3-turbo-gl-v1.0,
author = {{Proxecto Nós}},
title = {{Whisper Large-v3-Turbo} (Galician Fine-Tuned) },
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/proxectonos/whisper-large-v3-turbo-gl-v1.0/}},
}