whisper-large-v3-te
Telugu ASR model fine-tuned from openai/whisper-large-v3 by Liodon AI.
Note: This is the checkpoint from the end of epoch 1 of the initial training run, which is complete. Further training with additional data is in progress, and this model will be updated with improved checkpoints as that training continues.
Training Data
~119K Telugu audio samples from three datasets:
| Dataset | Split | Size |
|---|---|---|
| ai4bharat/Kathbath | train | ~70K |
| ai4bharat/indicvoices_r | train | ~47K |
| google/fleurs (te_in) | train | ~2K |
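To make the composition of the training mix concrete, the short sketch below computes each dataset's share of the total. It uses only the rounded counts from the table above, so the percentages are approximate; the variable names are illustrative.

```python
# Approximate sample counts copied from the table above (rounded to thousands).
approx_counts = {
    "ai4bharat/Kathbath": 70_000,
    "ai4bharat/indicvoices_r": 47_000,
    "google/fleurs (te_in)": 2_000,
}

total = sum(approx_counts.values())

# Share of the training mix contributed by each dataset, in percent.
shares = {name: 100 * n / total for name, n in approx_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of ~{total // 1000}K samples")
```

Kathbath contributes the majority of the samples, so the Kathbath validation WER reported below is an in-domain measurement for most of the training data.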
Training Details
- Base model: openai/whisper-large-v3
- Hardware: NVIDIA GB10 (Grace Blackwell), 128 GB unified memory
- Batch size: 16
- Learning rate: 1e-5
- Precision: bf16
- Epochs: 1
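As a rough guide to the scale of the run, the sketch below estimates the number of optimizer steps from the figures listed above. It assumes that the batch size of 16 is the effective batch size (no gradient accumulation), which this card does not state, and it uses the rounded sample count of ~119K, so the result is an estimate only.

```python
import math

approx_samples = 119_000   # "~119K" from the Training Data section
batch_size = 16            # from Training Details
epochs = 1                 # from Training Details
grad_accum_steps = 1       # ASSUMPTION: not stated in this card

effective_batch = batch_size * grad_accum_steps
steps_per_epoch = math.ceil(approx_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# The WER table reports one evaluation every 0.1 epoch.
steps_between_evals = steps_per_epoch // 10

print(steps_per_epoch, total_steps, steps_between_evals)
```

Under these assumptions the run is on the order of 7,400 optimizer steps, with roughly 740 steps between consecutive evaluations.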
WER Progress (Kathbath valid)
| Epoch | WER |
|---|---|
| 0.10 | 48.79% |
| 0.20 | 43.35% |
| 0.30 | 40.78% |
| 0.40 | 39.61% |
| 0.50 | 38.96% |
| 0.60 | 38.51% |
| 0.70 | 38.31% |
| 0.80 | 38.17% |
| 0.90 | 38.16% |
| 1.00 | 38.15% |
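For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the number of reference words. The function below is a minimal sketch of that definition, not the evaluation script used for this model. The card does not state which text normalization was applied before scoring, so none is applied here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy example with placeholder tokens (not real Telugu data):
# one substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Corpus-level WER is usually computed by summing edits and reference words over all utterances before dividing, rather than averaging per-utterance scores. Because insertions count as errors, WER can exceed 100%.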
Usage
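```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the fine-tuned model in half precision.
model = WhisperForConditionalGeneration.from_pretrained(
    "liodon-ai/whisper-large-v3-te",
    torch_dtype=torch.float16,
)
processor = WhisperProcessor.from_pretrained(
    "liodon-ai/whisper-large-v3-te",
    language="Telugu",
    task="transcribe",
)

# Load your audio (must be 16kHz mono). "audio.wav" is a placeholder path;
# any loader that yields a 16 kHz mono float array works.
audio_array, _ = librosa.load("audio.wav", sr=16000, mono=True)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Cast the features to the model's dtype; float32 features fail against float16 weights.
input_features = inputs["input_features"].to(model.dtype)

with torch.no_grad():
    predicted_ids = model.generate(input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```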
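Whisper operates on 30-second input windows, and the feature extractor pads or truncates each input to that length by default, so the snippet above only transcribes the first 30 seconds of a longer recording. One simple workaround, sketched below, is to split the audio into consecutive windows and transcribe each one separately. The helper name `split_into_windows` is hypothetical, and hard cuts like these can split a word at a boundary, so treat this as a starting point.

```python
SAMPLE_RATE = 16_000   # Hz, as required by the usage snippet above
WINDOW_SECONDS = 30    # Whisper's fixed input window


def split_into_windows(audio, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Split a 1-D sequence of samples into consecutive windows of at most window_seconds."""
    window = sample_rate * window_seconds
    return [audio[start:start + window] for start in range(0, len(audio), window)]


# 70 seconds of silence as a stand-in for real audio.
dummy_audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_windows(dummy_audio)

# Duration of each chunk in seconds.
print([len(c) / SAMPLE_RATE for c in chunks])
```

Each chunk can then be passed through the processor and `model.generate` as in the usage snippet, and the resulting texts joined. The `transformers` automatic speech recognition pipeline also offers a `chunk_length_s` option that handles chunk boundaries with overlap, which may be preferable for production use.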
License
Apache 2.0
Evaluation results
- WER on Kathbath (Telugu validation), self-reported: 38.15%
