NbAiLab/nb-sami-asr-north-nemotron-lr030-step239000
Streaming North Sami (sme) automatic speech recognition model fine-tuned from
nvidia/nemotron-3.5-asr-streaming-0.6b with NVIDIA NeMo.
Model details
- Architecture: cache-aware streaming FastConformer-RNNT with language prompting
- Base model: nvidia/nemotron-3.5-asr-streaming-0.6b
- Training data: nb-asr-parakeet/data_v5
- Spoken language: North Sami (sme)
- Manifest target language: se-NO
- NeMo artifact: model.nemo
- Training run: olivia_nemotron35_asr_streaming_sme_lr0p30_steps2000000_h200x1-full100-bs64-bd420-bucket-val1k-seNO-fi26-noam030-wu1000_1230820
- Selected checkpoint step: 239000
- SHA256: debbd95ac34cb6e80f6bc849586a30a13b125b2b1903dd47e15c51d374dee08c
- W&B run: main_1230820
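The SHA256 value listed above can be used to check a downloaded archive before loading it. The following is a minimal sketch using only the Python standard library. It assumes the digest refers to the model.nemo file itself, which the card implies but does not state outright; the helper names sha256_of and matches_expected are illustrative.

```python
import hashlib

# Digest copied from the model details list; assumed to describe model.nemo.
EXPECTED_SHA256 = "debbd95ac34cb6e80f6bc849586a30a13b125b2b1903dd47e15c51d374dee08c"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected value, ignoring letter case."""
    return sha256_of(path) == expected.lower()
```

Calling matches_expected("model.nemo") after download should return True if the file is intact and the assumption about the digest holds.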
Language prompt
In this run, se-NO is aliased to the pretrained Finnish prompt embedding (prompt ID 26) instead of being given a newly initialized embedding.
This choice matters when comparing this model with other checkpoints in the
experiment: the Finnish-alias runs reuse a pretrained prompt embedding, while the
fresh se-NO run learns a new prompt embedding from the North Sami fine-tuning data.
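The difference between the two setups can be pictured as two ways of extending a tag-to-ID table. The sketch below is purely illustrative and does not use NeMo: only the ID 26 comes from this card, while the "fi-FI" tag, the helper names, and the way a fresh ID is allocated are assumptions made for the example.

```python
def alias_prompt(prompt_ids, new_tag, existing_tag):
    """Point new_tag at the ID (and thus the embedding row) of existing_tag."""
    updated = dict(prompt_ids)
    updated[new_tag] = updated[existing_tag]
    return updated


def add_fresh_prompt(prompt_ids, new_tag):
    """Give new_tag its own unused ID, so its embedding is learned from scratch."""
    updated = dict(prompt_ids)
    updated[new_tag] = max(updated.values(), default=-1) + 1
    return updated


# Hypothetical starting table; "fi-FI" is an assumed tag name for Finnish.
base_prompts = {"fi-FI": 26}

aliased = alias_prompt(base_prompts, "se-NO", "fi-FI")  # se-NO shares ID 26
fresh = add_fresh_prompt(base_prompts, "se-NO")         # se-NO gets a new ID
```

In the aliased table, se-NO starts from whatever the Finnish embedding already encodes; in the fresh table, it starts from a new row.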
Tokenizer adaptation
The base tokenizer did not directly cover four North Sami characters. Existing token IDs were repurposed before fine-tuning:
| Original token | North Sami token | Token ID |
|---|---|---|
| η | ŋ | 252 |
| θ | ŧ | 684 |
| Θ | Ŧ | 776 |
| Η | Ŋ | 781 |
Use this packaged .nemo model directly so the matching tokenizer is restored with
the acoustic model. Do not pair the checkpoint with the unmodified base tokenizer.
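The substitution table can also serve as a quick diagnostic. If Greek letters show up in a transcript, the unmodified base tokenizer was probably used for decoding. The sketch below encodes the four pairs from the table with explicit Unicode escapes; the helper names are hypothetical and the functions are not part of NeMo.

```python
# Greek character -> North Sami character, as listed in the table above.
GREEK_TO_SAMI = {
    "\u03b7": "\u014b",  # Greek small eta -> Latin small eng, token ID 252
    "\u03b8": "\u0167",  # Greek small theta -> Latin small t with stroke, token ID 684
    "\u0398": "\u0166",  # Greek capital theta -> Latin capital T with stroke, token ID 776
    "\u0397": "\u014a",  # Greek capital eta -> Latin capital eng, token ID 781
}

_TRANSLATION = str.maketrans(GREEK_TO_SAMI)


def contains_repurposed_greek(text):
    """Return True if any of the four repurposed Greek characters appears in text."""
    return any(char in GREEK_TO_SAMI for char in text)


def greek_to_sami(text):
    """Replace the four Greek characters with their North Sami counterparts."""
    return text.translate(_TRANSLATION)
```

This is meant for spotting a tokenizer mismatch, not as a substitute for loading the packaged archive.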
Training data
All accepted recordings were consumed at their original duration; no 30- or 40-second training cutoff was applied.
| Split | Samples | Audio hours | Longest sample |
|---|---|---|---|
| Train | 244,296 | 390.107 | 38.430 s |
| Validation | 856 | 1.399 | 28.000 s |
The dataset may contain original and normalized variants of the same underlying recording. Consult the dataset documentation before using these figures for cross-corpus comparisons.
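As a rough sanity check on the table, the mean utterance length per split follows from the sample counts and audio hours. The short sketch below does that arithmetic; the figures are copied from the table and the function name is illustrative.

```python
# (samples, audio hours) per split, copied from the table above.
SPLITS = {
    "train": (244_296, 390.107),
    "validation": (856, 1.399),
}


def mean_duration_seconds(samples, hours):
    """Average utterance length in seconds for a split."""
    return hours * 3600.0 / samples


mean_durations = {
    name: mean_duration_seconds(samples, hours)
    for name, (samples, hours) in SPLITS.items()
}
```

Both splits average a little under six seconds per utterance, well below the longest samples listed in the table.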
Fine-tuning configuration
| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Peak learning-rate parameter | 0.3 |
| Noam warmup | 1000 steps |
| Requested epochs | 100 |
| Steps per full epoch | 6288 |
| Maximum batch size | 64 |
| Lhotse batch-duration budget | 420.0 |
| Quadratic-duration factor | 15.0 |
| Validation interval | 1000 steps |
| Precision | BF16 |
| Hardware per run | 1 NVIDIA GH200 GPU |
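For readers unfamiliar with Noam scheduling, the sketch below implements the textbook formula: linear warmup followed by inverse-square-root decay. It is not taken from this run's configuration. The d_model value of 1024 is a placeholder assumption, and NeMo's scheduler may differ in details such as step offsets or a minimum learning rate. Under this formulation the 0.3 setting acts as a scale factor, so the learning rate actually reached at the end of warmup is much smaller than 0.3.

```python
def noam_lr(step, scale=0.3, warmup_steps=1000, d_model=1024):
    """Textbook Noam schedule: rises linearly until warmup_steps, then decays as step**-0.5.

    scale and warmup_steps mirror the table above; d_model is an assumed placeholder.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Learning rate at a few points, including the selected checkpoint step.
schedule_samples = {step: noam_lr(step) for step in (1, 500, 1000, 10_000, 239_000)}
```

The maximum of this curve falls at the last warmup step, after which the rate shrinks slowly over the remaining training.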
max_steps was configured as a nonbinding safety ceiling. Epoch completion, rather
than the step ceiling, defines the intended 100-pass training schedule.
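The relationship between epochs, steps, and the ceiling can be checked with a few lines of arithmetic. The sketch below uses the values from the configuration table; the max_steps value of 2,000,000 is read from the "steps2000000" fragment of the training run name, which is an interpretation and not a figure stated in the table.

```python
STEPS_PER_EPOCH = 6288
REQUESTED_EPOCHS = 100
SELECTED_STEP = 239_000
MAX_STEPS = 2_000_000  # assumed from "steps2000000" in the run name

# Steps needed to finish all requested epochs.
total_scheduled_steps = STEPS_PER_EPOCH * REQUESTED_EPOCHS

# Full epochs completed at the selected checkpoint, and steps into the next one.
completed_epochs, steps_into_epoch = divmod(SELECTED_STEP, STEPS_PER_EPOCH)

# The ceiling is nonbinding if the epoch schedule ends before it is reached.
ceiling_is_nonbinding = total_scheduled_steps < MAX_STEPS
```

The selected step falls just past the 38th full pass over the training data, a little over a third of the way through the 100-epoch schedule.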
Evaluation
The best validation checkpoint observed during this run had:
- Validation WER: 0.288764
- Epoch: 38
- Global checkpoint step: 239000
WER is reported on the local North Sami validation split (856 utterances). It should not be compared directly with results using different normalization, segmentation, decoding, or validation data. Test-set evaluation is not claimed here.
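For reference, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. The function below is a minimal single-utterance sketch of that definition. It is not the evaluation code used for this run: it applies no text normalization, and a corpus-level figure is usually computed by summing errors and reference words over all utterances.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref_words)
```

Because the score depends on how words are split and normalized, two systems are only comparable when both sides use the same text processing.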
Usage
This repository contains a NeMo .nemo archive rather than a Transformers model.
Use the NeMo version compatible with Nemotron 3.5 ASR streaming models:
```python
import json
import tempfile
from pathlib import Path

import soundfile as sf
from nemo.collections.asr.models.rnnt_bpe_models_prompt import EncDecRNNTBPEModelWithPrompt

# Restore the packaged archive so the adapted tokenizer is loaded with the model.
model = EncDecRNNTBPEModelWithPrompt.restore_from("model.nemo")
model.eval()
model.set_inference_prompt("se-NO")

audio_path = Path("audio.wav").resolve()

# The manifest lives in a temporary directory, so transcription has to run
# before the `with` block exits and the directory is removed.
with tempfile.TemporaryDirectory() as temporary_dir:
    manifest = Path(temporary_dir) / "input.jsonl"
    manifest.write_text(json.dumps({
        "audio_filepath": str(audio_path),
        "duration": sf.info(str(audio_path)).duration,
        "text": "",
        "target_lang": "se-NO",
        "lang": "se-NO",
    }) + "\n", encoding="utf-8")
    transcriptions = model.transcribe(
        [str(manifest)],
        batch_size=1,
        target_lang="se-NO",
    )
```
For streaming inference, retain the cache-aware streaming configuration embedded in the archive and follow the NeMo streaming ASR inference APIs.
Limitations
- Intended for North Sami speech; behavior on other languages is not established.
- The model may inherit errors and biases from the base model and fine-tuning data.
- Accuracy can degrade with domain shift, noise, overlapping speakers, or uncommon dialects and orthography.
- Validate outputs before use in high-impact or fully automated workflows.
License
The checkpoint is a derivative of nvidia/nemotron-3.5-asr-streaming-0.6b. Review the base model's
license and terms together with the licenses and access conditions of the training
data before redistribution or deployment.