FastConformer-Quran CoreML — Offline
Offline (full-utterance) Quranic ASR for iOS/macOS. There are two variants; pick the one that fits your needs:
| Variant | File | Precision | Device | WER* | When to use |
|---|---|---|---|---|---|
| Max accuracy | `fastconformer-quran-offline.mlpackage` | fp32 | GPU/CPU | ~3% | Best transcript; latency not critical |
| ANE (real-time) | `fastconformer-quran-offline-ane.mlpackage` | fp16 | Neural Engine | ~6% | On-device, low-latency, battery-friendly |
*WER on a leakage-free held-out Quran set (EveryAyah reciters never seen in training, plus QUL and real-phone tlog clips). For reference, the #1 model on the public Arabic ASR leaderboard (nvidia FastConformer) scores ~5.7% on the same clips: our fp32 variant beats it, and the ANE variant is close behind.
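For readers less familiar with the metric, here is a minimal sketch of corpus-level word error rate: word-level edit distance (substitutions + deletions + insertions) summed over all clips and divided by the total number of reference words. It assumes plain whitespace tokenization and applies no Arabic normalization (diacritics, alef variants, etc.). The leaderboard's exact normalization and aggregation are not described in this card, so treat this as illustrative only.

```python
def word_edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over whitespace-separated words."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distance from an empty ref prefix to each hyp prefix
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(
                prev[j] + 1,               # deletion: ref word missing from hyp
                cur[j - 1] + 1,            # insertion: extra hyp word
                prev[j - 1] + (rw != hw),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: list[str], hyps: list[str]) -> float:
    """Total word errors divided by total reference words (no text normalization)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned one-to-one")
    errors = sum(word_edit_distance(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


# A blank hypothesis (the ANE failure mode discussed in this card) deletes every word.
basmala = "بسم الله الرحمن الرحيم"
print(corpus_wer([basmala], [""]))                        # 1.0
print(corpus_wer([basmala], ["بسم الله الرحمن الرحم"]))   # 0.25
```

Summing errors over the corpus (rather than averaging per-clip WER) keeps short clips from dominating the score; whether the leaderboard does the same is an assumption.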
Why two variants (the ANE precision story)
The Neural Engine computes only in fp16, and in fp16 the full-attention encoder produces blank output on ANE: rounding error accumulates over the whole utterance and erodes the thin CTC margins, so inputs such as the Basmala come back empty. Precision or quantization tricks cannot fix this. The fix is architectural: the ANE variant uses limited (windowed) attention, `att_context_size=[32,32]`, so each frame attends to 32 frames on either side (a 65-frame window of about 5.2 s at the encoder's 80 ms frame rate). Bounding the window bounds the accumulation, which lets the encoder survive fp16 on ANE. The model is fine-tuned at this window. It costs a few WER points versus full attention, but it runs correctly and in real time on ANE, and it beats our cache-aware streaming model (~6% vs ~10% WER on the same held-out set).
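To make the windowed attention concrete, here is a sketch of the mask that `att_context_size=[left, right]` implies: query frame i may attend to key frame j only when i - left ≤ j ≤ i + right, and (following the NeMo convention) -1 means unlimited context on that side, i.e. full attention. It assumes 80 ms encoder frames, derived from the card's 10 ms mel frames and T → T/8 output mapping. This illustrates the masking rule only, not the exported Core ML graph; the window also applies per layer, so stacked layers see a wider overall receptive field.

```python
ENCODER_FRAME_S = 0.08  # assumption: 10 ms mel hop x 8x subsampling


def limited_context_mask(n_frames: int, left: int = 32, right: int = 32) -> list[list[bool]]:
    """mask[i][j] is True when query frame i may attend to key frame j.

    left/right follow the att_context_size=[left, right] convention;
    -1 means unlimited context on that side (full attention).
    """
    def allowed(i: int, j: int) -> bool:
        ok_left = left < 0 or j >= i - left
        ok_right = right < 0 or j <= i + right
        return ok_left and ok_right

    return [[allowed(i, j) for j in range(n_frames)] for i in range(n_frames)]


def max_keys_per_query(n_frames: int, left: int, right: int) -> int:
    """Largest number of keys that any single attention softmax row sums over."""
    return max(sum(row) for row in limited_context_mask(n_frames, left, right))


n = 4800 // 8  # longest entry point (48 s) -> 600 encoder frames
print(max_keys_per_query(n, -1, -1))    # 600: full attention grows with the utterance
print(max_keys_per_query(n, 32, 32))    # 65: windowed attention stays bounded
print((32 + 32 + 1) * ENCODER_FRAME_S)  # window span in seconds, about 5.2
```

The design point is the second line of output: with the window, no softmax row ever sums over more than 65 keys, however long the recitation is.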
So: for maximum accuracy, use fp32 (GPU); for ANE acceleration, use windowed fp16. Full attention does not work on ANE: fp16 plus whole-utterance accumulation yields blank output.
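As a toy illustration of why long fp16 accumulations are fragile, Python's `struct` module can round a value to IEEE 754 half precision (format code `'e'`), which lets us emulate a running sum whose accumulator is rounded to fp16 after every addition. This is a generic demonstration of fp16 rounding, not a model of the encoder or of how ANE kernels accumulate internally.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def running_sum(values, fp16: bool) -> float:
    """Sum values left to right, optionally rounding the accumulator to fp16 each step."""
    total = 0.0
    for v in values:
        total = total + v
        if fp16:
            total = to_fp16(total)  # emulate an fp16 accumulator
    return total


values = [0.01] * 10_000                 # exact sum: 100
print(running_sum(values, fp16=False))   # close to 100 (tiny float64 error)
print(running_sum(values, fp16=True))    # 32.0: each +0.01 rounds away once fp16 spacing exceeds 0.02
```

Above 32 the gap between adjacent fp16 values is 0.03125, so adding 0.01 rounds back to the same number and the sum stops growing. The longer the accumulation, the more such small contributions are lost.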
Both are multi-function models (7 entry points)
The functions are `predict_T80` … `predict_T4800` (0.8 s … 48 s); call the smallest one that covers the padded audio length. Input `(1, 80, T)` → outputs `logprobs` `(1, T/8, 1025)` and `encoder_output`. The `pos_enc` clamp is baked in.
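A minimal host-side sketch of driving these entry points, written in plain Python so the logic ports easily to Swift. It picks the smallest `predict_T*` function whose T covers the utterance, right-pads the 80 × T features, and greedy-decodes the log-probabilities. Several parts are assumptions, not documented behavior: only `predict_T80` and `predict_T4800` are named in this card, so the bucket list must be completed from the model's function names; zero is used as the pad value; the CTC blank is taken to be the last of the 1025 classes (index 1024, the usual NeMo convention); and the number of valid output frames is taken as ceil(n / 8). Actually running a chosen function would go through `MLModelConfiguration.functionName` on device (iOS 18 / macOS 15) or `coremltools.models.MLModel(..., function_name=...)` on a Mac; verify both against your toolchain.

```python
import math

# Only the two endpoints are documented; replace with all seven T values
# read from the model's function names (predict_T80 ... predict_T4800).
KNOWN_BUCKETS = (80, 4800)
BLANK_ID = 1024  # assumption: blank is the last of 1025 classes (NeMo convention)


def pick_function(n_frames: int, buckets=KNOWN_BUCKETS) -> tuple[str, int]:
    """Return the smallest predict_T<T> entry point with T >= n_frames (10 ms mel frames)."""
    for t in sorted(buckets):
        if t >= n_frames:
            return f"predict_T{t}", t
    raise ValueError(f"{n_frames} frames exceeds the largest bucket; split the audio")


def pad_features(features: list[list[float]], target_t: int, pad_value: float = 0.0) -> list[list[float]]:
    """Right-pad each of the 80 feature rows to target_t frames (batch dim omitted)."""
    return [row + [pad_value] * (target_t - len(row)) for row in features]


def greedy_ctc_decode(logprobs: list[list[float]], n_valid: int, blank_id: int = BLANK_ID) -> list[int]:
    """Per-frame argmax, collapse repeats, drop blanks; frames past n_valid are padding."""
    tokens, prev = [], None
    for frame in logprobs[:n_valid]:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != prev and best != blank_id:
            tokens.append(best)
        prev = best
    return tokens


n = 57                                    # mel frames in a 0.57 s clip
name, t = pick_function(n)                # ("predict_T80", 80)
feats = pad_features([[0.1] * n for _ in range(80)], t)
n_valid = math.ceil(n / 8)                # assumption about the subsampled length: 8

# Toy 3-class example with blank_id=2: argmaxes 0,0,2,0,1,2 decode to [0, 0, 1];
# the blank between the two 0s keeps them as separate tokens.
toy = [[0, -5, -5], [0, -5, -5], [-5, -5, 0], [0, -5, -5], [-5, 0, -5], [-5, -5, 0]]
print(greedy_ctc_decode(toy, len(toy), blank_id=2))            # [0, 0, 1]
print(greedy_ctc_decode([[-5, -5, 0]] * 4, 4, blank_id=2))     # []: blank wins every frame
```

The last line is the failure mode described for full attention in fp16: if blank out-scores every token on every frame, the transcript is empty.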
Streaming variant
For live, low-latency recitation, use the cache-aware streaming model Muno459/fastconformer-quran-coreml-streaming (validated on ANE, 5–8 ms per chunk).
License
Apache 2.0 (NVIDIA FastConformer-Hybrid + Muno459/fastconformer-quran upstream).
Benchmark
Leakage-free held-out WER vs nvidia / whisper / seamless / mms / omniASR / Tarteel: Quranic ASR Leaderboard.
