Whisper Large-v3-Turbo (Galician Fine-Tuned)

This model is a fine-tuned version of openai/whisper-large-v3-turbo for automatic speech recognition (ASR) in Galician (gl).

It improves Galician transcription quality over the base model, particularly on in-domain and low-resource speech data.

Training Data

The model was fine-tuned on a combined Galician ASR dataset built from multiple public and curated corpora.
All audio was resampled to 16 kHz, and every corpus was converted to a uniform (audio, text) format.
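
As a rough sketch of the 16 kHz step, the snippet below resamples a mono waveform and wraps it in an (audio, text) record. The card does not say which tool was used, so the use of SciPy's resample_poly, the record layout, and the helper names (resample_factors, to_16khz, make_example) are assumptions for illustration only.

```python
from math import gcd

TARGET_SR = 16_000  # sampling rate stated in this card


def resample_factors(orig_sr: int, target_sr: int = TARGET_SR) -> tuple[int, int]:
    """Return the smallest integer (up, down) pair with orig_sr * up / down == target_sr."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    g = gcd(orig_sr, target_sr)
    return target_sr // g, orig_sr // g


def to_16khz(waveform, orig_sr: int):
    """Resample a 1-D waveform to 16 kHz (hypothetical helper)."""
    up, down = resample_factors(orig_sr)
    if up == down:  # both are 1: the audio is already at the target rate
        return waveform
    # Imported lazily so the rest of the module works without SciPy installed.
    from scipy.signal import resample_poly

    return resample_poly(waveform, up, down)


def make_example(waveform, orig_sr: int, text: str) -> dict:
    """Build one record in an assumed (audio, text) layout."""
    return {
        "audio": {"array": to_16khz(waveform, orig_sr), "sampling_rate": TARGET_SR},
        "text": text.strip(),
    }
```

For a 48 kHz recording, resample_factors returns (1, 3), so the waveform is low-pass filtered and decimated by three.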

Datasets Included

Common Voice v23 (Galician)
OpenSLR Speech Translation GL-EN (Galician side)
FLEURS GL-EN (Galician side)
FalAI (20% of validated split)
Transcrispeech (Galician)

These datasets cover clean read speech, semi-spontaneous speech and more challenging acoustic conditions.

Dataset Statistics

Train: 171,619 utterances
Validation: 22,209 utterances
Test: 21,543 utterances

Training Procedure

Fine-tuning was performed using the 🤗 Transformers Seq2SeqTrainer.

Effective batch size: 16
Learning rate: 1e-5
Warmup steps: 800
Max training steps: 53,630
Precision: FP16
Evaluation metric: Word Error Rate (WER)
Model selection: Best checkpoint selected based on validation WER
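
The hyperparameters above could be passed to Seq2SeqTrainingArguments roughly as follows. This is a sketch, not the project's training script: the output directory and the split of the effective batch size into 8 per device with 2 accumulation steps (on a single device) are assumptions, since the card only gives the product. The helper approx_epochs derives the number of passes over the training set from the stated figures.

```python
TRAIN_UTTERANCES = 171_619  # from "Dataset Statistics"
EFFECTIVE_BATCH = 16
MAX_STEPS = 53_630

# Assumed split of the effective batch on one device; only the product is documented.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2

HPARAMS = {
    "output_dir": "whisper-large-v3-turbo-gl",  # hypothetical path
    "per_device_train_batch_size": PER_DEVICE_BATCH,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    "learning_rate": 1e-5,
    "warmup_steps": 800,
    "max_steps": MAX_STEPS,
    "fp16": True,                     # needs a CUDA device at construction time
    "predict_with_generate": True,    # decode during evaluation so WER can be computed
    "metric_for_best_model": "wer",
    "greater_is_better": False,       # lower WER is better
}


def approx_epochs(steps: int = MAX_STEPS, batch: int = EFFECTIVE_BATCH,
                  n_train: int = TRAIN_UTTERANCES) -> float:
    """Number of passes over the training set implied by steps * batch size."""
    return steps * batch / n_train


def build_training_args():
    """Create the arguments object (requires transformers and a GPU for fp16)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**HPARAMS)
```

With these figures, approx_epochs() comes out very close to 5, which suggests about five passes over the training split. Selecting the best checkpoint also needs matching evaluation and save schedules plus load_best_model_at_end=True; those are left out here because the evaluation interval is not given in the card.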

Audio features were extracted using WhisperFeatureExtractor, and text was tokenized with WhisperTokenizer configured for Galician transcription.
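
A minimal sketch of this preprocessing, assuming each record has an "audio" dict (array and sampling_rate) and a "text" field. That record layout and the helper names load_processors and prepare_example are illustrative assumptions, not the project's actual code.

```python
BASE_MODEL = "openai/whisper-large-v3-turbo"


def load_processors(model_id: str = BASE_MODEL):
    """Load the feature extractor and a tokenizer set up for Galician transcription."""
    from transformers import WhisperFeatureExtractor, WhisperTokenizer

    feature_extractor = WhisperFeatureExtractor.from_pretrained(model_id)
    tokenizer = WhisperTokenizer.from_pretrained(
        model_id, language="galician", task="transcribe"
    )
    return feature_extractor, tokenizer


def prepare_example(example: dict, feature_extractor, tokenizer) -> dict:
    """Turn one (audio, text) record into model inputs and label ids."""
    audio = example["audio"]
    # Log-mel spectrogram features computed from the raw waveform.
    features = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"])
    return {
        "input_features": features.input_features[0],
        # Token ids of the reference transcript, including the language/task prefix.
        "labels": tokenizer(example["text"]).input_ids,
    }
```

With a 🤗 Datasets object, a function like this would typically be applied with dataset.map, passing the two processors via fn_kwargs.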

Evaluation Results

Evaluation was performed on held-out test splits for each corpus and on a combined test set.
Metrics are reported as WER (Word Error Rate) and CER (Character Error Rate).
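
For reference, both metrics are edit-distance ratios: the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length in words (WER) or characters (CER). The sketch below pools edits over the whole corpus and counts spaces as characters for CER. Whether the reported figures were pooled this way or computed with a library such as jiwer or evaluate is not stated in the card, so treat those choices as assumptions.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(refs: list[str], hyps: list[str], unit: str = "word") -> float:
    """Corpus-level error rate: total edits divided by total reference units."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    split = str.split if unit == "word" else list  # words for WER, characters for CER
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(ref), split(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references contain no units")
    return edits / total
```

For example, the reference "o gato come peixe" against the hypothesis "o gato comeu peixe" has one substituted word out of four, so error_rate(..., unit="word") is 0.25.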

Fine-Tuned Model (No Text Normalization)

Per-corpus results

Corpus	N	WER	CER
FalAI	4776	0.0097	0.0034
CommonVoice	14563	0.0688	0.0153
OpenSLR	282	0.0808	0.0378
FLEURS	212	0.1980	0.0730
Transcrispeech	1710	0.2097	0.0770

Combined test set

Dataset	N	WER	CER
TOTAL	21543	0.0962	0.0296

Fine-Tuned Model (With Text Normalization)

A lightweight text normalization step was applied during evaluation to reduce superficial mismatches (punctuation, casing, spacing).
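
The exact normalizer is not described beyond punctuation, casing and spacing, so the following is only one plausible reading of such a step. The function name normalize_text and the choice to replace every non-alphanumeric character with a space are assumptions.

```python
import re
import unicodedata

_PUNCT = re.compile(r"[^\w\s]")  # anything that is neither a word character nor whitespace
_SPACES = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    # NFC keeps accented letters such as "ñ" or "á" as single code points,
    # so the punctuation pattern does not strip their diacritics.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Note: this also splits hyphenated forms, which a real normalizer may prefer to keep.
    text = _PUNCT.sub(" ", text)
    return _SPACES.sub(" ", text).strip()
```

Applied to both reference and hypothesis before scoring, normalize_text("  Ola,   Mundo! ") yields "ola mundo", so differences in casing or punctuation no longer count as errors.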

Per-corpus results

Corpus	N	WER	CER
FalAI	4776	0.0093	0.0033
OpenSLR	282	0.0648	0.0341
FLEURS	212	0.1709	0.0661
Transcrispeech	1710	0.1836	0.0697

Combined test set

Dataset	N	WER	CER
TOTAL	21543	0.0795	0.0256
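
To make the effect of normalization easier to read, the relative WER reduction can be computed directly from the two sets of tables. The snippet below only re-uses figures printed in this card. CommonVoice is left out because no normalized row is listed for it.

```python
# (WER without normalization, WER with normalization), copied from the tables above.
WER = {
    "FalAI": (0.0097, 0.0093),
    "OpenSLR": (0.0808, 0.0648),
    "FLEURS": (0.1980, 0.1709),
    "Transcrispeech": (0.2097, 0.1836),
    "TOTAL": (0.0962, 0.0795),
}


def relative_reduction(raw: float, normalized: float) -> float:
    """Fraction of the raw error rate removed by normalization."""
    return (raw - normalized) / raw


REDUCTIONS = {name: relative_reduction(*pair) for name, pair in WER.items()}

if __name__ == "__main__":
    for name, reduction in REDUCTIONS.items():
        print(f"{name:<15} {100 * reduction:5.1f}%")
```

On the combined test set this works out to a relative reduction of roughly 17%, which gives a sense of how much of the raw WER comes from surface differences rather than recognition errors.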

Intended Use and Limitations

This model is intended for Galician ASR research and transcription workflows.
Performance may degrade on highly spontaneous speech or very noisy audio. As with all Whisper models, hallucinations may occur in low-signal segments.

The model is specialised for Galician transcription and is not intended for multilingual use or speech translation.
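
A minimal inference sketch with the 🤗 Transformers pipeline is shown below. The model identifier is taken from the citation URL in this card. The 30-second chunking, the generation arguments and the helper names load_asr and transcribe are assumptions rather than recommended settings from the authors.

```python
MODEL_ID = "proxectonos/whisper-large-v3-turbo-gl-v1.0"  # from the citation URL

# Pin the language and task so the decoder does not try to detect or translate.
GENERATE_KWARGS = {"language": "galician", "task": "transcribe"}


def load_asr(model_id: str = MODEL_ID, device: int = -1):
    """Build an ASR pipeline; device=-1 runs on CPU, 0 selects the first GPU."""
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long recordings into 30 s windows
        device=device,
    )


def transcribe(asr, audio_path: str) -> str:
    """Transcribe one audio file and return the plain text."""
    result = asr(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"].strip()


if __name__ == "__main__":
    asr = load_asr()
    print(transcribe(asr, "exemplo.wav"))  # hypothetical file name
```

Reading audio from a file path relies on ffmpeg being installed. Passing a 16 kHz NumPy array instead of a path avoids that dependency.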

Contact information

For further information, send an email to proxecto.nos@usc.gal

Licensing information

Apache License, Version 2.0

Acknowledgements

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union (NextGenerationEU), within the framework of the project Desarrollo de Modelos ALIA.

Thanks also to Balidea for the technical development of this model.

Citation

@misc{proxectenos2026whisper-large-v3-turbo-gl-v1.0,
 author = {{Proxecto Nós}},
 title = {{Whisper Large-v3-Turbo} (Galician Fine-Tuned)},
 year = {2026},
 publisher = {Hugging Face},
 howpublished = {\url{https://huggingface.co/proxectonos/whisper-large-v3-turbo-gl-v1.0/}},
}
