URL: https://huggingface.co/proxectonos/whisper-large-v3-turbo-gl-v1.0

Whisper Large-v3-Turbo (Galician Fine-Tuned)

This model is a fine-tuned version of openai/whisper-large-v3-turbo for automatic speech recognition (ASR) in Galician (gl).

It significantly improves transcription quality in Galician compared to the base model, especially on in-domain and low-resource speech data.
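A minimal usage sketch, assuming the checkpoint loads through the standard 🤗 Transformers `automatic-speech-recognition` pipeline, as Whisper fine-tunes normally do. The helper names and the file name `sample.wav` are illustrative and not part of the model card.

```python
MODEL_ID = "proxectonos/whisper-large-v3-turbo-gl-v1.0"

# Decoding options forwarded to Whisper's generate(); forcing the language
# avoids relying on automatic language detection.
GENERATE_KWARGS = {"language": "galician", "task": "transcribe"}


def build_asr_pipeline(model_id: str = MODEL_ID, device: int = -1):
    """Create an ASR pipeline (device=-1 is CPU, 0 is the first GPU)."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id, device=device)


def transcribe(asr, audio_path: str) -> str:
    """Transcribe one audio file and return the recognised text."""
    result = asr(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]


# Example (downloads the model on first use):
#   asr = build_asr_pipeline()
#   print(transcribe(asr, "sample.wav"))  # hypothetical local file
```

Whisper processes audio in 30-second windows, so longer recordings usually need chunking or timestamp-based long-form decoding on top of this sketch.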

Training Data

The model was fine-tuned on a combined Galician ASR dataset built from multiple public and curated corpora.
All audio was resampled to 16 kHz and stored in a uniform (audio, text) format.
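The card does not describe the preprocessing code, so the following is only a sketch of what bringing a clip to 16 kHz and a common (audio, text) layout could look like. The record fields (`array`, `sampling_rate`, `text`) and both function names are assumptions. The linear interpolation is a stand-in chosen to keep the example dependency-free; a real pipeline would normally use a library resampler with proper low-pass filtering.

```python
TARGET_RATE = 16_000  # Hz, the sampling rate stated in the card


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate or len(samples) == 0:
        return list(samples)
    n_out = int(round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional index into the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def make_record(samples, src_rate, text):
    """Build one (audio, text) record in a uniform layout (assumed field names)."""
    return {
        "audio": {
            "array": resample_linear(samples, src_rate),
            "sampling_rate": TARGET_RATE,
        },
        "text": text.strip(),
    }
```

For instance, `make_record(clip, 48_000, " Ola ")` would return a record whose audio array is one third the original length and whose text is `"Ola"`.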

Datasets Included

  • Common Voice v23 (Galician)
  • OpenSLR Speech Translation GL-EN (Galician side)
  • FLEURS GL-EN (Galician side)
  • FalAI (20% of validated split)
  • Transcrispeech (Galician)

These datasets cover clean read speech, semi-spontaneous speech and more challenging acoustic conditions.

Dataset Statistics

  • Train: 171,619 utterances
  • Validation: 22,209 utterances
  • Test: 21,542 utterances
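As a quick check on the split sizes, the counts above work out to roughly an 80/10/10 partition. The snippet only does arithmetic on the three numbers listed.

```python
SPLITS = {"train": 171_619, "validation": 22_209, "test": 21_542}

total = sum(SPLITS.values())  # 215,370 utterances overall
shares = {name: count / total for name, count in SPLITS.items()}

for name, share in shares.items():
    # Print each split with its utterance count and share of the total.
    print(f"{name:<10} {SPLITS[name]:>7,} {share:6.1%}")
```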

Training Procedure

Fine-tuning was performed using the 🤗 Transformers Seq2SeqTrainer.

  • Effective batch size: 16
  • Learning rate: 1e-5
  • Warmup steps: 800
  • Max training steps: 53,630
  • Precision: FP16
  • Evaluation metric: Word Error Rate (WER)
  • Model selection: Best checkpoint selected based on validation WER
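The hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as sketched below. The card gives only the effective batch size, so the split into per-device batch size and gradient accumulation, the evaluation/save interval, and `predict_with_generate` are assumptions, marked as such in the comments. The small calculation at the top shows that 53,630 steps at batch size 16 is consistent with about five passes over the 171,619 training utterances, although the card does not state an epoch count.

```python
TRAIN_UTTERANCES = 171_619  # from the dataset statistics
EFFECTIVE_BATCH = 16
MAX_STEPS = 53_630

steps_per_epoch = TRAIN_UTTERANCES // EFFECTIVE_BATCH  # 10,726 full batches
approx_epochs = MAX_STEPS / steps_per_epoch            # 5.0

TRAINING_KWARGS = {
    "per_device_train_batch_size": 8,  # assumption
    "gradient_accumulation_steps": 2,  # assumption: 8 * 2 = 16 on one device
    "learning_rate": 1e-5,
    "warmup_steps": 800,
    "max_steps": MAX_STEPS,
    "fp16": True,                      # requires a GPU at run time
    "predict_with_generate": True,     # assumption: needed to score WER on decoded text
    "eval_strategy": "steps",          # called evaluation_strategy in older releases
    "save_strategy": "steps",
    "eval_steps": 1000,                # assumption
    "save_steps": 1000,                # assumption
    "load_best_model_at_end": True,    # keep the best checkpoint...
    "metric_for_best_model": "wer",    # ...as judged by validation WER
    "greater_is_better": False,        # lower WER is better
}


def make_training_args(output_dir: str):
    """Instantiate the arguments; imported lazily to keep the sketch lightweight."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```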

Audio features were extracted using WhisperFeatureExtractor, and text was tokenized with WhisperTokenizer configured for Galician transcription.
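The step above can be sketched as follows, assuming the processors are loaded from the base checkpoint and that each example follows an `{"audio": {"array", "sampling_rate"}, "text"}` layout. The function names and the example layout are illustrative; the card does not show the actual preprocessing code.

```python
def load_processors(base_model: str = "openai/whisper-large-v3-turbo"):
    """Load the feature extractor and a tokenizer set up for Galician transcription."""
    from transformers import WhisperFeatureExtractor, WhisperTokenizer

    feature_extractor = WhisperFeatureExtractor.from_pretrained(base_model)
    tokenizer = WhisperTokenizer.from_pretrained(
        base_model, language="galician", task="transcribe"
    )
    return feature_extractor, tokenizer


def prepare_example(example, feature_extractor, tokenizer):
    """Turn one (audio, text) example into model inputs and label ids."""
    audio = example["audio"]
    # Log-mel spectrogram features computed from the raw waveform.
    features = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"])
    return {
        "input_features": features.input_features[0],
        # Token ids of the reference transcript, used as training labels.
        "labels": tokenizer(example["text"]).input_ids,
    }
```

With a 🤗 Datasets object, a function like `prepare_example` would typically be applied through `dataset.map`.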

Evaluation Results

Evaluation was performed on held-out test splits for each corpus and on a combined test set.
Metrics are reported as WER (Word Error Rate) and CER (Character Error Rate), expressed as fractions rather than percentages (for example, 0.0962 corresponds to 9.62%).
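For reference, both metrics are edit-distance ratios: the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length (counted in words for WER, in characters for CER). The following is a plain-Python sketch of those definitions, not the evaluation script behind the reported numbers, and it scores a single utterance rather than a whole corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("o can ladra", "o gato ladra")` is 1/3, since one of the three reference words is substituted.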

Fine-Tuned Model (No Text Normalization)

Per-corpus results

Corpus            N        WER      CER
FalAI             4776     0.0097   0.0034
CommonVoice       14563    0.0688   0.0153
OpenSLR           282      0.0808   0.0378
FLEURS            212      0.1980   0.0730
Transcrispeech    1710     0.2097   0.0770

Combined test set

Dataset           N        WER      CER
TOTAL             21543    0.0962   0.0296

Fine-Tuned Model (With Text Normalization)

A lightweight text normalization step was applied during evaluation to reduce superficial mismatches (punctuation, casing, spacing).
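The card does not list the exact normalization rules. A minimal sketch covering the three categories it names (punctuation, casing, spacing) might look like this; the function name and the choice to replace punctuation with spaces are assumptions.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace.

    Accented letters and ñ are kept, since they carry meaning in Galician.
    """
    text = text.lower()
    # Replace every Unicode punctuation character (category P*) with a space.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to both reference and hypothesis before scoring means that a pair such as `"Ola, mundo!"` and `"ola mundo"` counts as an exact match.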

Per-corpus results

Corpus            N        WER      CER
FalAI             4776     0.0093   0.0033
OpenSLR           282      0.0648   0.0341
FLEURS            212      0.1709   0.0661
Transcrispeech    1710     0.1836   0.0697

Combined test set

Dataset           N        WER      CER
TOTAL             21543    0.0795   0.0256
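To put the effect of normalization in perspective, the combined-set figures can be compared directly. This is simple arithmetic on the two TOTAL rows: normalization lowers WER by about 1.7 points absolute (roughly 17% relative) and CER by about 0.4 points (roughly 14% relative).

```python
RAW = {"wer": 0.0962, "cer": 0.0296}         # combined test set, no normalization
NORMALIZED = {"wer": 0.0795, "cer": 0.0256}  # combined test set, with normalization


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error removed."""
    return (before - after) / before


for metric in ("wer", "cer"):
    absolute = RAW[metric] - NORMALIZED[metric]
    relative = relative_reduction(RAW[metric], NORMALIZED[metric])
    print(f"{metric.upper()}: -{absolute:.4f} absolute, -{relative:.1%} relative")
```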

Intended Use and Limitations

This model is intended for Galician ASR research and transcription workflows.
Performance may degrade on highly spontaneous speech or very noisy audio. As with all Whisper models, hallucinations may occur in low-signal segments.
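One simple mitigation, not described in the card, is to skip near-silent segments before they reach the model. The sketch below flags a segment by its RMS energy; the threshold value is hypothetical and would need tuning on real recordings, and samples are assumed to be floats in the range [-1, 1].

```python
import math


def rms(samples) -> float:
    """Root-mean-square amplitude of a segment (0.0 for an empty segment)."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_low_signal(samples, threshold: float = 0.01) -> bool:
    """True when the segment is quiet enough that a transcript is likely unreliable."""
    # The default threshold is a placeholder, not a recommended value.
    return rms(samples) < threshold
```

Segments for which `is_low_signal` returns `True` could be dropped or transcribed as empty instead of being passed to the model.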

The model is specialised for Galician transcription and is not intended for multilingual use or speech translation.

Contact information

For further information, send an email to proxecto.nos@usc.gal

Licensing information

Apache License, Version 2.0

Acknowledgements

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA. (This publication of the project Desarrollo de Modelos ALIA is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union (NextGenerationEU).)

Thanks also to Balidea for the technical development of this model.

Citation

@misc{proxectenos2026whisper-large-v3-turbo-gl-v1.0,
 author = {{Proxecto Nós}},
 title = {{Whisper Large-v3-Turbo} (Galician Fine-Tuned)},
 year = {2026},
 publisher = {Hugging Face},
 howpublished = {\url{https://huggingface.co/proxectonos/whisper-large-v3-turbo-gl-v1.0/}},
}

Model size: 0.8B params (Safetensors, tensor type F32)