URL: https://huggingface.co/NbAiLab/nb-sami-asr-north-nemotron-lr030-step239000

NbAiLab/nb-sami-asr-north-nemotron-lr030-step239000

Streaming North Sami (sme) automatic speech recognition model fine-tuned from nvidia/nemotron-3.5-asr-streaming-0.6b with NVIDIA NeMo.

Model details

  • Architecture: cache-aware streaming FastConformer-RNNT with language prompting
  • Base model: nvidia/nemotron-3.5-asr-streaming-0.6b
  • Training data: nb-asr-parakeet/data_v5
  • Spoken language: North Sami (sme)
  • Manifest target language: se-NO
  • NeMo artifact: model.nemo
  • Training run: olivia_nemotron35_asr_streaming_sme_lr0p30_steps2000000_h200x1-full100-bs64-bd420-bucket-val1k-seNO-fi26-noam030-wu1000_1230820
  • Selected checkpoint step: 239000
  • SHA256: debbd95ac34cb6e80f6bc849586a30a13b125b2b1903dd47e15c51d374dee08c
  • W&B run: main_1230820
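
Before loading the archive, the digest listed above can be compared against a local copy. The following is a minimal sketch, assuming the SHA256 value refers to the model.nemo file in this repository. The names sha256_of_file, download_and_verify, REPO_ID, and EXPECTED_SHA256 are introduced here for illustration; hf_hub_download comes from the huggingface_hub package and needs network access.

```python
import hashlib
from pathlib import Path

REPO_ID = "NbAiLab/nb-sami-asr-north-nemotron-lr030-step239000"
# Digest copied from the model details list (assumed to describe model.nemo).
EXPECTED_SHA256 = "debbd95ac34cb6e80f6bc849586a30a13b125b2b1903dd47e15c51d374dee08c"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_verify():
    """Download model.nemo from the Hub and raise if its digest differs."""
    # Imported lazily so the hashing helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=REPO_ID, filename="model.nemo")
    actual = sha256_of_file(local_path)
    if actual != EXPECTED_SHA256:
        raise ValueError(f"SHA256 mismatch: got {actual}")
    return local_path
```

Calling download_and_verify() returns the cached local path, which can then be passed to restore_from in place of a bare "model.nemo".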

Language prompt

In this checkpoint, the se-NO language tag is aliased to the pretrained Finnish prompt embedding (prompt ID 26) rather than to a newly initialized embedding.

This distinction matters when comparing this model with other checkpoints in the experiment: Finnish-alias runs such as this one reuse a pretrained prompt embedding, while the fresh se-NO run learns a new prompt embedding from the North Sami fine-tuning data.

Tokenizer adaptation

The base tokenizer did not directly cover four North Sami characters. Existing token IDs were repurposed before fine-tuning:

Original token    North Sami token    Token ID
η                 ŋ                   252
θ                 ŧ                   684
Θ                 Ŧ                   776
Η                 Ŋ                   781

Use this packaged .nemo model directly so the matching tokenizer is restored with the acoustic model. Do not pair the checkpoint with the unmodified base tokenizer.
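
To illustrate why the tokenizer pairing matters: if the four repurposed IDs were decoded with the unmodified base vocabulary, they would presumably surface as Greek letters in the transcript. The sketch below is a diagnostic only, not a substitute for using the packaged tokenizer. REPURPOSED_TOKENS, count_leaked_greek, and remap_greek_to_sami are hypothetical names introduced for this example, and the remapping would also alter any genuine Greek text.

```python
# (original token, North Sami token, token ID), copied from the table above.
REPURPOSED_TOKENS = [
    ("\u03b7", "\u014b", 252),  # η -> ŋ
    ("\u03b8", "\u0167", 684),  # θ -> ŧ
    ("\u0398", "\u0166", 776),  # Θ -> Ŧ
    ("\u0397", "\u014a", 781),  # Η -> Ŋ
]

# Character-level translation table built from the pairs above.
GREEK_TO_SAMI = str.maketrans({old: new for old, new, _ in REPURPOSED_TOKENS})


def count_leaked_greek(text):
    """Count characters that suggest decoding used the base tokenizer."""
    leaked = {old for old, _, _ in REPURPOSED_TOKENS}
    return sum(1 for char in text if char in leaked)


def remap_greek_to_sami(text):
    """Replace the four Greek letters with their North Sami counterparts."""
    return text.translate(GREEK_TO_SAMI)
```

A nonzero count_leaked_greek result on model output is a hint that the wrong tokenizer was loaded; the cleaner fix is to restore the model from the packaged .nemo archive.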

Training data

All accepted recordings were consumed at their original duration; no 30- or 40-second training cutoff was applied.

Split         Samples    Audio hours    Longest sample
Train         244,296    390.107        38.430 s
Validation    856        1.399          28.000 s

The dataset may contain original and normalized variants of the same underlying recording. Consult the dataset documentation before using these figures for cross-corpus comparisons.
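
The figures in the table imply a mean utterance length of a little under six seconds in both splits. The short calculation below derives this from the reported sample counts and audio hours; SPLITS and mean_duration_seconds are names introduced for illustration, and the result is only as accurate as the rounded hours in the table.

```python
# Values copied from the training data table.
SPLITS = {
    "train": {"samples": 244_296, "hours": 390.107, "longest_s": 38.430},
    "validation": {"samples": 856, "hours": 1.399, "longest_s": 28.000},
}


def mean_duration_seconds(samples, hours):
    """Average utterance length in seconds from a sample count and total hours."""
    return hours * 3600.0 / samples


for name, split in SPLITS.items():
    mean_s = mean_duration_seconds(split["samples"], split["hours"])
    print(f"{name}: mean {mean_s:.2f} s, longest {split['longest_s']:.2f} s")
```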

Fine-tuning configuration

Setting                         Value
Optimizer                       AdamW
Peak learning-rate parameter    0.3
Noam warmup                     1000 steps
Requested epochs                100
Steps per full epoch            6288
Maximum batch size              64
Lhotse batch-duration budget    420.0
Quadratic-duration factor       15.0
Validation interval             1000 steps
Precision                       BF16
Hardware per run                1 NVIDIA GH200 GPU
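
For readers unfamiliar with Noam scheduling, the sketch below shows the relative shape of the schedule with the 1000-step warmup from the table: a linear ramp up to the warmup step, then inverse-square-root decay. It assumes the conventional Noam form. How the 0.3 parameter maps to an absolute learning rate (for example, any scaling by model dimension) is not stated in this card, so the function returns only a multiplier normalized to 1.0 at the peak. noam_relative_lr is a name introduced here.

```python
def noam_relative_lr(step, warmup_steps=1000):
    """Noam multiplier normalized so the peak (step == warmup_steps) is 1.0."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup branch versus inverse-square-root decay branch.
    raw = min(step ** -0.5, step * warmup_steps ** -1.5)
    # Divide by the value at the warmup step to normalize the peak to 1.0.
    return raw / warmup_steps ** -0.5


for step in (500, 1000, 4000, 239_000):
    print(step, round(noam_relative_lr(step), 4))
```

Under this assumption, the multiplier at the selected checkpoint step is a small fraction of the peak, because the decay continues for as long as training runs.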

max_steps was configured as a nonbinding safety ceiling. Epoch completion, rather than the step ceiling, defines the intended 100-pass training schedule.
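
The relationship between the epoch schedule and the step ceiling can be checked with simple arithmetic. In the sketch below, MAX_STEPS is read from the "steps2000000" fragment of the training run name, which is an assumption about how that name encodes the setting; the other constants come from the configuration table and the selected checkpoint step.

```python
STEPS_PER_EPOCH = 6288
REQUESTED_EPOCHS = 100
SELECTED_STEP = 239_000
MAX_STEPS = 2_000_000  # assumption: taken from "steps2000000" in the run name

# Steps needed to complete the intended 100 passes over the training data.
planned_steps = STEPS_PER_EPOCH * REQUESTED_EPOCHS

# Position of the selected checkpoint, measured in full epochs.
epochs_at_selected_step = SELECTED_STEP / STEPS_PER_EPOCH

print(f"planned steps: {planned_steps}")
print(f"ceiling is nonbinding: {planned_steps < MAX_STEPS}")
print(f"epochs completed at selected step: {epochs_at_selected_step:.2f}")
```

The planned schedule of 628,800 steps sits well below the assumed ceiling, and step 239000 falls just past the 38th full pass, which is consistent with the epoch reported in the evaluation section.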

Evaluation

The best validation checkpoint observed during this run had:

  • Validation WER: 0.288764 (about 28.9%)
  • Epoch: 38
  • Global checkpoint step: 239000

WER is reported on the local North Sami validation split (856 utterances). It should not be compared directly with results using different normalization, segmentation, decoding, or validation data. Test-set evaluation is not claimed here.
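
As a reminder of what the metric measures, the following is a generic sketch of word error rate for a single reference/hypothesis pair: the word-level edit distance divided by the number of reference words. It is not the evaluation code used for this run, and it applies no text normalization, which this card does not describe. word_error_rate is a name introduced here, and the example strings are placeholders.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the distance between the processed reference
    # prefix and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("a b c", "a x c"))
```

Corpus-level figures are usually computed by summing errors and reference words over all utterances before dividing, so averaging per-utterance values from this function would not necessarily reproduce a reported corpus WER.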

Usage

This repository contains a NeMo .nemo archive rather than a Transformers model. Load it with a NeMo version compatible with Nemotron 3.5 ASR streaming models:

import json
import tempfile
from pathlib import Path

import soundfile as sf
from nemo.collections.asr.models.rnnt_bpe_models_prompt import EncDecRNNTBPEModelWithPrompt

# Restore the acoustic model together with its adapted tokenizer.
model = EncDecRNNTBPEModelWithPrompt.restore_from("model.nemo")
model.eval()
model.set_inference_prompt("se-NO")

audio_path = Path("audio.wav").resolve()
with tempfile.TemporaryDirectory() as temporary_dir:
    # Write a one-line JSONL manifest describing the input file.
    manifest = Path(temporary_dir) / "input.jsonl"
    manifest.write_text(json.dumps({
        "audio_filepath": str(audio_path),
        "duration": sf.info(str(audio_path)).duration,
        "text": "",
        "target_lang": "se-NO",
        "lang": "se-NO",
    }) + "\n", encoding="utf-8")
    # Transcribe while the temporary manifest still exists.
    transcriptions = model.transcribe(
        [str(manifest)],
        batch_size=1,
        target_lang="se-NO",
    )
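
The single-file manifest above extends naturally to several recordings. The helper below is a minimal sketch that writes one JSON line per file using the same fields as the example; write_manifest is a hypothetical name, and durations are passed in by the caller (for instance from sf.info) so that the helper itself has no audio dependency.

```python
import json
from pathlib import Path


def write_manifest(entries, manifest_path, lang="se-NO"):
    """Write a JSONL manifest from (audio_path, duration_seconds) pairs."""
    lines = []
    for audio_path, duration in entries:
        lines.append(json.dumps({
            "audio_filepath": str(Path(audio_path).resolve()),
            "duration": float(duration),
            "text": "",
            "target_lang": lang,
            "lang": lang,
        }))
    manifest_path = Path(manifest_path)
    # One JSON object per line, with a trailing newline.
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest_path
```

The returned path can be passed to model.transcribe in the same way as the single-file manifest, with batch_size raised as memory allows.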

For streaming inference, retain the cache-aware streaming configuration embedded in the archive and follow the NeMo streaming ASR inference APIs.

Limitations

  • Intended for North Sami speech; behavior on other languages is not established.
  • The model may inherit errors and biases from the base model and fine-tuning data.
  • Accuracy can degrade with domain shift, noise, overlapping speakers, or uncommon dialects and orthography.
  • Validate outputs before use in high-impact or fully automated workflows.

License

The checkpoint is a derivative of nvidia/nemotron-3.5-asr-streaming-0.6b. Review the base model's license and terms together with the licenses and access conditions of the training data before redistribution or deployment.
