NbAiLab/nb-sami-asr-north-nemotron-lr030-step239000

Streaming North Sami (sme) automatic speech recognition model fine-tuned from nvidia/nemotron-3.5-asr-streaming-0.6b with NVIDIA NeMo.

Model details

Architecture: cache-aware streaming FastConformer-RNNT with language prompting
Base model: nvidia/nemotron-3.5-asr-streaming-0.6b
Training data: nb-asr-parakeet/data_v5
Spoken language: North Sami (sme)
Manifest target language: se-NO
NeMo artifact: model.nemo
Training run: olivia_nemotron35_asr_streaming_sme_lr0p30_steps2000000_h200x1-full100-bs64-bd420-bucket-val1k-seNO-fi26-noam030-wu1000_1230820
Selected checkpoint step: 239000
SHA256: debbd95ac34cb6e80f6bc849586a30a13b125b2b1903dd47e15c51d374dee08c
W&B run: main_1230820
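
A downloaded archive can be checked against the digest listed above before loading it. The following is a minimal sketch that assumes the SHA256 field refers to model.nemo itself; the helper names `sha256_of` and `matches_card` are illustrative and not part of NeMo.

```python
import hashlib
from pathlib import Path

# Digest copied from the "SHA256" field of this model card.
# Assumption: it was computed over the model.nemo archive.
EXPECTED_SHA256 = "debbd95ac34cb6e80f6bc849586a30a13b125b2b1903dd47e15c51d374dee08c"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_card(path: Path) -> bool:
    """True if the file's digest equals the one listed in the card."""
    return sha256_of(path) == EXPECTED_SHA256
```

Calling `matches_card(Path("model.nemo"))` before `restore_from` catches truncated or mismatched downloads early.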

Language prompt

se-NO is aliased to the pretrained Finnish prompt embedding (prompt ID 26).

The choice of prompt matters when comparing this model with other checkpoints in the experiment: Finnish-alias runs such as this one reuse a pretrained prompt embedding, while the fresh se-NO run learns a new prompt embedding from the North Sami fine-tuning data.
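
The difference between the two setups can be pictured with a toy prompt table. This is only a conceptual sketch: the `fi-FI` key, the helper names, and the way a fresh ID is chosen are hypothetical, and only the Finnish prompt ID 26 comes from this card. The actual NeMo prompt configuration may be structured differently.

```python
# Hypothetical prompt table. Only the ID 26 for Finnish comes from this card;
# the "fi-FI" key is an illustrative name.
prompt_ids = {"fi-FI": 26}


def alias_prompt(table, new_key, existing_key):
    """Point new_key at the same prompt ID (and thus embedding row) as existing_key."""
    updated = dict(table)
    updated[new_key] = table[existing_key]
    return updated


def add_fresh_prompt(table, new_key):
    """Give new_key an unused ID, which implies an embedding row trained from scratch.

    Picking max + 1 is purely illustrative.
    """
    updated = dict(table)
    updated[new_key] = max(table.values()) + 1
    return updated


aliased = alias_prompt(prompt_ids, "se-NO", "fi-FI")   # this model's setup
fresh = add_fresh_prompt(prompt_ids, "se-NO")          # the fresh se-NO variant
```

In the aliased table both keys resolve to ID 26, so fine-tuning starts from the pretrained Finnish embedding; in the fresh table `se-NO` has its own ID with no pretrained counterpart.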

Tokenizer adaptation

The base tokenizer did not directly cover four North Sami characters. Existing token IDs were repurposed before fine-tuning:

Original token	North Sami token	Token ID
`η`	`ŋ`	`252`
`θ`	`ŧ`	`684`
`Θ`	`Ŧ`	`776`
`Η`	`Ŋ`	`781`

Use this packaged .nemo model directly so the matching tokenizer is restored with the acoustic model. Do not pair the checkpoint with the unmodified base tokenizer.
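
The repurposing described above amounts to swapping the surface form stored at four token IDs while leaving every other ID untouched. The sketch below illustrates this on a plain ID-to-token dictionary; it assumes each affected entry is a single-character piece and does not reproduce the actual tokenizer-editing procedure used for this model. `repurpose` and `toy_vocab` are hypothetical names.

```python
# Token IDs and characters as listed in the table above:
# token ID -> (original token, North Sami token).
REPURPOSED_TOKENS = {
    252: ("η", "ŋ"),
    684: ("θ", "ŧ"),
    776: ("Θ", "Ŧ"),
    781: ("Η", "Ŋ"),
}


def repurpose(vocab: dict[int, str]) -> dict[int, str]:
    """Return a copy of an ID-to-token vocabulary with the four tokens swapped."""
    adapted = dict(vocab)
    for token_id, (original, replacement) in REPURPOSED_TOKENS.items():
        # Refuse to overwrite an ID that does not hold the expected original token.
        if adapted.get(token_id) != original:
            raise ValueError(f"ID {token_id} does not hold {original!r}")
        adapted[token_id] = replacement
    return adapted


# Toy vocabulary containing only the affected IDs plus one unrelated entry.
toy_vocab = {token_id: original for token_id, (original, _) in REPURPOSED_TOKENS.items()}
toy_vocab[5] = "a"
adapted_vocab = repurpose(toy_vocab)
```

Because the IDs are unchanged, the acoustic model emits the same indices as before; only the decoding to text differs, which is why the checkpoint and the modified tokenizer must stay together.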

Training data

All accepted recordings were consumed at their original duration; no 30- or 40-second training cutoff was applied.

Split	Samples	Audio hours	Longest sample
Train	244,296	390.107	38.430 s
Validation	856	1.399	28.000 s

The dataset may contain original and normalized variants of the same underlying recording. Consult the dataset documentation before using these figures for cross-corpus comparisons.
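
As a quick sanity check on the table above, the average sample length per split follows directly from the sample counts and audio hours. The sketch below uses only the figures listed in this card; the function name is illustrative.

```python
# Sample counts and audio hours copied from the training-data table.
splits = {
    "train": {"samples": 244_296, "hours": 390.107},
    "validation": {"samples": 856, "hours": 1.399},
}


def mean_duration_seconds(samples: int, hours: float) -> float:
    """Average sample length in seconds for a split."""
    return hours * 3600.0 / samples


for name, stats in splits.items():
    print(f"{name}: {mean_duration_seconds(**stats):.2f} s per sample on average")
```

Both splits average a little under 6 seconds per sample (about 5.75 s for training and 5.88 s for validation), so the longest samples listed in the table are far from typical.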

Fine-tuning configuration

Setting	Value
Optimizer	AdamW
Peak learning-rate parameter	0.3
Noam warmup	1000 steps
Requested epochs	100
Steps per full epoch	6288
Maximum batch size	64
Lhotse batch-duration budget	420.0 s
Quadratic-duration factor	15.0
Validation interval	1000 steps
Precision	BF16
Hardware per run	1 NVIDIA GH200 GPU
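
The interaction between the batch-duration budget, the quadratic-duration factor, and the maximum batch size can be sketched as follows. This assumes the penalty form used by Lhotse's dynamic samplers as I understand it, where a sample of duration d counts as d + d²/factor seconds against the budget, and it simplifies by treating all samples in a batch as equally long. The function names and the floor of one sample per batch are illustrative assumptions, not NeMo or Lhotse API.

```python
BATCH_DURATION_BUDGET = 420.0  # seconds, "Lhotse batch-duration budget"
QUADRATIC_FACTOR = 15.0        # "Quadratic-duration factor"
MAX_BATCH_SIZE = 64            # "Maximum batch size"


def effective_duration(duration_s: float, quadratic_factor: float = QUADRATIC_FACTOR) -> float:
    """Duration charged against the budget, with a quadratic penalty for long samples."""
    return duration_s + duration_s ** 2 / quadratic_factor


def max_samples_per_batch(duration_s: float) -> int:
    """Approximate batch size if every sample in the batch had this duration."""
    by_duration = int(BATCH_DURATION_BUDGET // effective_duration(duration_s))
    # Assumption: a batch always holds at least one sample.
    return max(1, min(MAX_BATCH_SIZE, by_duration))
```

Under these assumptions, a batch of 38.43 s samples (the longest training sample) holds 3 items, a batch of 5.75 s samples holds 52, and batches of 2 s samples are capped at 64 by the maximum batch size.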

max_steps was configured as a nonbinding safety ceiling. Epoch completion, rather than the step ceiling, defines the intended 100-pass training schedule.
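
The relationship between epochs, steps, and the selected checkpoint can be verified with a few lines of arithmetic. The step ceiling below is an assumption read from the `steps2000000` fragment of the training run name; the other values come from the configuration table and the selected checkpoint step.

```python
STEPS_PER_EPOCH = 6288
REQUESTED_EPOCHS = 100
SELECTED_STEP = 239_000
STEP_CEILING = 2_000_000  # assumption: taken from "steps2000000" in the run name

# Steps needed to complete all requested epochs.
total_scheduled_steps = STEPS_PER_EPOCH * REQUESTED_EPOCHS

# Zero-based epoch index of the selected checkpoint and its offset within that epoch.
epoch_index, steps_into_epoch = divmod(SELECTED_STEP, STEPS_PER_EPOCH)

print(total_scheduled_steps)                 # 628800
print(epoch_index, steps_into_epoch)         # 38 56
print(total_scheduled_steps < STEP_CEILING)  # True
```

The full 100-epoch schedule needs 628,800 steps, well under the assumed ceiling, and step 239000 falls 56 steps into the epoch with zero-based index 38.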

Evaluation

The best validation checkpoint observed during this run had:

Validation WER: 0.288764 (28.88%)
Epoch: 38
Global checkpoint step: 239000

WER is reported on the local North Sami validation split (856 utterances). It should not be compared directly with results using different normalization, segmentation, decoding, or validation data. Test-set evaluation is not claimed here.
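
For reference, word error rate is the number of word-level substitutions, deletions, and insertions divided by the number of reference words. The sketch below is a generic corpus-level implementation using whitespace tokenization; it is not the evaluation code used for this run, and it applies no text normalization, which is one reason figures from different pipelines are not directly comparable.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_words = reference.split()
        hyp_words = hypothesis.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words
```

A value such as 0.288764 is this ratio expressed as a fraction, meaning roughly 29 word edits per 100 reference words.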

Usage

This repository contains a NeMo .nemo archive rather than a Transformers model. Load it with a NeMo version that supports Nemotron 3.5 ASR streaming models, for example:

import json
import tempfile
from pathlib import Path

import soundfile as sf
from nemo.collections.asr.models.rnnt_bpe_models_prompt import EncDecRNNTBPEModelWithPrompt

# Restore the acoustic model together with its adapted tokenizer.
model = EncDecRNNTBPEModelWithPrompt.restore_from("model.nemo")
model.eval()
# Select the language prompt used during fine-tuning.
model.set_inference_prompt("se-NO")

audio_path = Path("audio.wav").resolve()
with tempfile.TemporaryDirectory() as temporary_dir:
    # Write a one-line JSONL manifest describing the audio file.
    manifest = Path(temporary_dir) / "input.jsonl"
    manifest.write_text(json.dumps({
        "audio_filepath": str(audio_path),
        "duration": sf.info(audio_path).duration,
        "text": "",
        "target_lang": "se-NO",
        "lang": "se-NO",
    }) + "\n")
    # Transcribe while the temporary manifest still exists.
    transcriptions = model.transcribe(
        [str(manifest)],
        batch_size=1,
        target_lang="se-NO",
    )
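
The shape of the value returned by `transcribe` can vary between NeMo versions: it may be a list of strings, a list of hypothesis objects with a `text` attribute, or a tuple whose first element holds the best hypotheses. The helper below is a hedged sketch for normalizing these cases to plain strings; `hypothesis_texts` is a hypothetical name, and the handled cases are assumptions to verify against the installed NeMo version.

```python
def hypothesis_texts(result):
    """Normalize a transcribe() return value to a list of plain strings."""
    # Assumption: some RNNT versions return a (best, all) tuple; keep the best hypotheses.
    if isinstance(result, tuple):
        result = result[0]
    # Items may be plain strings or objects exposing a `.text` attribute.
    return [item if isinstance(item, str) else item.text for item in result]
```

With the example above, `hypothesis_texts(transcriptions)` would yield one string per manifest entry.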

For streaming inference, retain the cache-aware streaming configuration embedded in the archive and follow the NeMo streaming ASR inference APIs.

Limitations

Intended for North Sami speech; behavior on other languages is not established.
The model may inherit errors and biases from the base model and fine-tuning data.
Accuracy can degrade with domain shift, noise, overlapping speakers, or uncommon dialects and orthography.
Validate outputs before use in high-impact or fully automated workflows.

License

The checkpoint is a derivative of nvidia/nemotron-3.5-asr-streaming-0.6b. Review the base model's license and terms together with the licenses and access conditions of the training data before redistribution or deployment.

URL: https://huggingface.co/NbAiLab/nb-sami-asr-north-nemotron-lr030-step239000