Whisper-medium-vaani-hindi

This is a fine-tuned version of OpenAI's Whisper-Medium, trained on approximately 718 hours of transcribed Hindi speech from multiple datasets.

Usage

The model can be used with the pipeline function from the Transformers library.


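import torch
from transformers import pipeline

# Path to the audio file to be transcribed
audio = "path to the audio file to be transcribed"
# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "ARTPARK-IISc/whisper-medium-vaani-hindi"

# Build the speech recognition pipeline; long audio is processed in 30-second chunks
transcribe = pipeline(task="automatic-speech-recognition", model=model_id, chunk_length_s=30, device=device)
# Force Hindi transcription instead of relying on automatic language detection
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")

print("Transcription:", transcribe(audio)["text"])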
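
To transcribe several recordings in one go, the pipeline object can be wrapped in a small helper. The sketch below is only an illustration: the helper name transcribe_files and the "recordings" directory are hypothetical, and it assumes only that the callable passed in behaves like the transcribe pipeline above, that is, it takes a file path and returns a dict with a "text" key.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(transcribe: Callable[[str], dict], paths: Iterable) -> Dict[str, str]:
    """Run an ASR callable over several audio files and collect the text.

    `transcribe` is assumed to behave like the pipeline object above:
    called with a file path, it returns a dict with a "text" key.
    """
    results = {}
    for path in paths:
        output = transcribe(str(path))
        # The returned text may carry leading or trailing whitespace
        results[str(path)] = output["text"].strip()
    return results


# Hypothetical usage with the pipeline built earlier
# ("recordings" is a placeholder directory name):
#
#   from pathlib import Path
#   wav_files = sorted(Path("recordings").glob("*.wav"))
#   for path, text in transcribe_files(transcribe, wav_files).items():
#       print(path, text)
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded, so the same function works on CPU or GPU.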

Training and Evaluation

The model was fine-tuned on the following datasets: Vaani, Gramvaani, IndicVoices, Fleurs, IndicTTS, and Commonvoice.

The model was evaluated on several datasets; the word error rate (WER) on each is given below.

Dataset	WER
Gramvaani	27.64
Fleurs	14.34
IndicTTS	7.78
MUCS	23.46
Commonvoice	19.90
Kathbath	14.29
Kathbath Noisy	16.03
Vaani	25.48
RESPIN	8.79
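
For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below is a minimal illustration, not the evaluation script used for the table. It assumes the reported values are percentages and uses plain whitespace tokenization; the text normalization applied in the actual evaluation is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of five reference words
print(word_error_rate("मैं घर जा रहा हूँ", "मैं घर जा रही हूँ"))  # 20.0
```

A dataset-level score is usually obtained by summing the errors and the reference word counts over all utterances before dividing, rather than by averaging per-utterance values.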

Model size: 0.8B params (Safetensors, tensor type F32)

Base model: openai/whisper-medium
