URL: https://huggingface.co/Muno459/fastconformer-quran-coreml-offline

FastConformer-Quran CoreML — Offline

Offline (full-utterance) Quranic ASR for iOS/macOS. There are two variants; pick the one that fits your needs:

| Variant | File | Precision | Device | WER* | When to use |
|---|---|---|---|---|---|
| Max accuracy | fastconformer-quran-offline.mlpackage | fp32 | GPU/CPU | ~3% | best transcript, latency not critical |
| ANE (real-time) | fastconformer-quran-offline-ane.mlpackage | fp16 | Neural Engine | ~6% | on-device, low-latency, battery-friendly |

*WER on a leakage-free held-out Quran set (EveryAyah reciters never seen in training + QUL + real-phone tlog). For reference, the #1 model on the public Arabic ASR leaderboard (NVIDIA FastConformer) scores ~5.7% on the same clips; both of our variants are competitive, and the fp32 one is better.

Why two variants (the ANE precision story)

The Neural Engine is fp16-only, and the full-attention encoder produces blank output on the ANE in fp16: rounding error accumulates over the whole utterance and erodes thin CTC margins, which yields empty output on inputs such as the Basmala. Precision or quantization tricks can't fix that. The fix is architectural: the ANE variant uses limited windowed attention, att_context_size=[32,32] (each frame attends ±32 frames ≈ 6.5 s), which bounds the accumulation so that it survives fp16 on the ANE. The model is fine-tuned at that window size. It costs a few WER points relative to full attention, but it runs correctly and in real time on the ANE, and it beats our cache-aware streaming model (~6% vs ~10% WER on the same held-out set).
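To make the "bounded accumulation" point concrete, here is a minimal pure-Python sketch of a limited-context attention mask. It is an illustration only, not the NeMo or Core ML implementation: the function names are hypothetical, and the only value taken from this card is the window of 32 frames on each side. It shows that the number of frames any single frame attends to is capped at left + right + 1, no matter how long the utterance is.

```python
def windowed_attention_mask(num_frames, left=32, right=32):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Hypothetical helper for illustration; mirrors the idea behind
    att_context_size=[left, right], not any library's actual code.
    """
    mask = []
    for i in range(num_frames):
        lo = max(0, i - left)                # first visible frame
        hi = min(num_frames - 1, i + right)  # last visible frame
        mask.append([lo <= j <= hi for j in range(num_frames)])
    return mask


def max_context(mask):
    """Largest number of frames that any single frame attends to."""
    return max(sum(row) for row in mask)


if __name__ == "__main__":
    short = windowed_attention_mask(100)
    long = windowed_attention_mask(600)
    # With full attention these would be 100 and 600; with the window
    # both are capped at 32 + 32 + 1 = 65.
    print(max_context(short), max_context(long))  # prints "65 65"
```

With full attention the per-frame sum grows with utterance length; with the window it stays fixed, which is the property the paragraph above relies on.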

In short: for maximum accuracy, use fp32 (GPU); for ANE acceleration, use the windowed fp16 variant. Full attention does not work on the ANE (fp16 + whole-utterance accumulation → blank).
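The "blank" failure mode can be illustrated with a small greedy CTC decoder. This is a sketch under stated assumptions, not the decoder shipped with the model: the function name is hypothetical, the toy vocabulary has 3 tokens plus a blank, and the blank index is passed in explicitly because this card does not say which of the 1025 output classes is the blank. The example shows how a tiny numerical shift in log-probabilities can flip a frame from a token to blank when the margin is thin.

```python
def ctc_greedy_decode(frame_logprobs, blank_id):
    """Greedy CTC decode: argmax per frame, collapse repeats, drop blanks.

    frame_logprobs: list of per-frame lists of log-probabilities.
    blank_id: index of the CTC blank class (an assumption of the caller).
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_logprobs]
    tokens = []
    prev = None
    for idx in best:
        # Emit a token only when it differs from the previous frame's
        # argmax and is not the blank symbol.
        if idx != prev and idx != blank_id:
            tokens.append(idx)
        prev = idx
    return tokens


if __name__ == "__main__":
    BLANK = 3  # toy vocabulary: tokens 0..2, blank at index 3

    # Token 1 beats blank by a margin of only 0.01.
    clean = [[-5.0, -0.69, -5.0, -0.70]]
    # The same frame after a small numerical error of 0.02.
    noisy = [[-5.0, -0.70, -5.0, -0.69]]

    print(ctc_greedy_decode(clean, BLANK))  # prints "[1]"
    print(ctc_greedy_decode(noisy, BLANK))  # prints "[]"
```

If every frame flips this way, the decoded transcript is empty, which matches the symptom described above.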

Both are multi-function (7 ANE entry points)

predict_T80 … predict_T4800 (0.8 s … 48 s); pick one by padded audio length. Input (1,80,T) → logprobs (1,T/8,1025) + encoder_output. The pos_enc clamp is baked in.
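A minimal sketch of how a caller might choose an entry point by padded length. The helper names are hypothetical, and only the two bucket sizes named above (80 and 4800) are used; the intermediate sizes are not listed on this card and should be read from the model package itself. The output shape follows the (1, T/8, 1025) relation stated above.

```python
def pick_bucket(num_frames, buckets):
    """Return the smallest bucket size T that can hold num_frames.

    num_frames: number of feature frames in the utterance.
    buckets: iterable of available T values (supplied by the caller).
    """
    for size in sorted(buckets):
        if num_frames <= size:
            return size
    raise ValueError(f"{num_frames} frames exceed the largest bucket")


def entry_point_name(bucket):
    """Entry-point name following the predict_T<size> pattern."""
    return f"predict_T{bucket}"


def logprobs_shape(bucket):
    """Expected logprobs shape for an input of shape (1, 80, bucket)."""
    return (1, bucket // 8, 1025)


if __name__ == "__main__":
    # Only the endpoints are known from this card; a real list would
    # contain all seven sizes.
    known_buckets = (80, 4800)

    for frames in (50, 1200):
        bucket = pick_bucket(frames, known_buckets)
        pad = bucket - frames  # frames of padding to append
        print(entry_point_name(bucket), pad, logprobs_shape(bucket))
    # prints:
    # predict_T80 30 (1, 10, 1025)
    # predict_T4800 3600 (1, 600, 1025)
```

With the full list of seven sizes, the same function would pick a tighter bucket for mid-length audio and waste less padding.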

Streaming variant

For live/low-latency recitation use the cache-aware Muno459/fastconformer-quran-coreml-streaming (validated on ANE, 5–8 ms/chunk).

License

Apache 2.0 (NVIDIA FastConformer-Hybrid + Muno459/fastconformer-quran upstream).

Benchmark

Leakage-free held-out WER vs nvidia / whisper / seamless / mms / omniASR / Tarteel: Quranic ASR Leaderboard.
