Whisper-medium-vaani-hindi
This is a fine-tuned version of OpenAI's Whisper-Medium, trained on approximately 718 hours of transcribed Hindi speech from multiple datasets.
Usage
The model can be used with the `pipeline` function from the Hugging Face Transformers library:
```python
import torch
from transformers import pipeline

audio = "path/to/audio.wav"  # path to the audio file to be transcribed
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "ARTPARK-IISc/whisper-medium-vaani-hindi"

# chunk_length_s=30 splits long recordings into 30-second windows
transcribe = pipeline(task="automatic-speech-recognition", model=model_id, chunk_length_s=30, device=device)

# Force Hindi transcription instead of relying on automatic language detection
result = transcribe(audio, generate_kwargs={"language": "hi", "task": "transcribe"})
print("Transcription:", result["text"])
```
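The `chunk_length_s=30` argument lets the pipeline handle recordings longer than Whisper's 30-second input window by cutting them into chunks. The sketch below illustrates that idea in plain Python by computing overlapping 30-second windows over a 16 kHz signal. The helper name `chunk_spans` and the 5-second overlap on each side (`stride_s`) are illustrative assumptions; this is not the pipeline's internal code, which additionally merges the transcripts of overlapping chunks.

```python
SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz audio


def chunk_spans(num_samples: int, chunk_s: float = 30.0, stride_s: float = 5.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows (hypothetical helper).

    Each window is at most chunk_s seconds long; consecutive windows overlap by
    2 * stride_s seconds so that a word cut at one boundary appears whole in a
    neighbouring window.
    """
    chunk = int(chunk_s * SAMPLE_RATE)
    step = chunk - 2 * int(stride_s * SAMPLE_RATE)
    if step <= 0:
        raise ValueError("stride_s is too large for the chosen chunk_s")
    spans = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        spans.append((start, end))
        if end == num_samples:  # last window reaches the end of the signal
            break
        start += step
    return spans


# 70 seconds of audio yields three overlapping 30-second windows
print(chunk_spans(70 * SAMPLE_RATE))
```

Each `(start, end)` pair can be used to slice a 1-D sample array before passing the slice to the model.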
Training and Evaluation
The model was fine-tuned on the following datasets: Vaani, Gramvaani, IndicVoices, Fleurs, IndicTTS, and Commonvoice.
The model was evaluated on the test sets below; the word error rate (WER, in percent; lower is better) is reported for each.
| Dataset | WER (%) |
|---|---|
| Gramvaani | 27.64 |
| Fleurs | 14.34 |
| IndicTTS | 7.78 |
| MUCS | 23.46 |
| Commonvoice | 19.90 |
| Kathbath | 14.29 |
| Kathbath Noisy | 16.03 |
| Vaani | 25.48 |
| RESPIN | 8.79 |
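WER is the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference transcript, divided by the number of reference words and expressed here as a percentage. A minimal pure-Python sketch of that definition follows. It splits on whitespace and applies no text normalization; that is an assumption, since the card does not say how transcripts were normalized before scoring, and normalization choices can shift the reported numbers. `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words * 100."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (starts as the empty-reference row)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


print(word_error_rate("a b c d", "a x c d"))  # one substitution out of four words
```

For real evaluations a maintained library is usually preferable, but the dynamic program above is the quantity such libraries compute.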
Model details
Base model: openai/whisper-medium
Model size: 0.8B parameters, stored as Safetensors with F32 tensors
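The listed size (about 0.8B parameters stored as F32) gives a quick estimate of how much memory the weights alone occupy: parameters times bytes per parameter. The sketch below does that arithmetic for the stored F32 weights and, for comparison, for a hypothetical half-precision (F16) copy. It ignores activations, the decoder cache, and framework overhead, so actual memory use during inference is higher; `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 0.8e9  # approximate parameter count reported for this model
print(f"F32: {weight_memory_gb(PARAMS, 4):.1f} GB")  # 4 bytes per float32 weight
print(f"F16: {weight_memory_gb(PARAMS, 2):.1f} GB")  # 2 bytes per float16 weight
```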