FastConformer-Quran CoreML — Offline

Offline (full-utterance) Quranic ASR for iOS/macOS. Two variants are provided; pick the one that matches your needs:

| Variant | File | Precision | Device | WER* | When to use |
|---|---|---|---|---|---|
| Max accuracy | `fastconformer-quran-offline.mlpackage` | fp32 | GPU/CPU | ~3% | best transcript, latency not critical |
| ANE (real-time) | `fastconformer-quran-offline-ane.mlpackage` | fp16 | Neural Engine | ~6% | on-device, low-latency, battery-friendly |

*WER on a leakage-free held-out Quran set (EveryAyah reciters never seen in training, plus QUL and real-phone tlog). For reference, the #1 model on the public Arabic ASR leaderboard (NVIDIA FastConformer) scores ~5.7% on the same clips, so both of our variants are competitive and the fp32 one is better.

Why two variants (the ANE precision story)

The Neural Engine is fp16-only, and the full-attention encoder produces blank output on ANE in fp16: rounding error accumulates over the whole utterance and erodes thin CTC margins, which gives empty output on inputs like the Basmala. Precision or quantization tricks do not fix this. The fix is architectural: the ANE variant uses limited windowed attention, att_context_size=[32,32] (each frame attends ±32 frames ≈ 6.5 s), which bounds the accumulation so that the model survives fp16 on ANE. It is fine-tuned at that window and costs a few WER points versus full attention, but it runs correctly and in real time on ANE, and it beats our cache-aware streaming model (~6% vs ~10% WER on the same held-out set).
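To make the limited window concrete, here is a minimal sketch of the attention pattern it implies, assuming att_context_size=[32,32] is read as 32 frames of left context and 32 frames of right context. The helper names `window_mask` and `attended_counts` are hypothetical and exist only for illustration; the real model applies this pattern inside its encoder, not through this code.

```python
def window_mask(num_frames, left=32, right=32):
    """Return a num_frames x num_frames boolean matrix.

    mask[i][j] is True when query frame i may attend to key frame j,
    i.e. when j lies within [i - left, i + right].
    """
    return [
        [(i - left) <= j <= (i + right) for j in range(num_frames)]
        for i in range(num_frames)
    ]


def attended_counts(mask):
    """Number of key frames each query frame can see."""
    return [sum(row) for row in mask]


if __name__ == "__main__":
    # 600 encoder frames corresponds to T/8 for the largest entry point (4800 / 8).
    counts = attended_counts(window_mask(600))
    # Edge frames see fewer neighbours than interior frames.
    print(counts[0], counts[300], counts[599])
```

However long the utterance is, no frame ever mixes more than 65 frames (itself plus 32 on each side), whereas full attention mixes all of them. That fixed upper bound is the property the paragraph above relies on.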

So: for maximum accuracy, use fp32 (GPU). For ANE acceleration, use windowed fp16. Full attention on ANE is not an option for this model (fp16 + whole-utterance accumulation → blank).
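For readers unfamiliar with what "blank" means here, the sketch below shows plain greedy CTC decoding: when the blank class narrowly wins every frame, the transcript comes out empty. This is a generic illustration, not the decoder shipped with this model. The function name `ctc_greedy_decode` and the toy 3-class scores are made up, and the blank index of the real 1025-class output is an assumption you should check against the model's tokenizer (many CTC models put blank last).

```python
def ctc_greedy_decode(frame_scores, blank_id):
    """Greedy CTC decoding over per-frame score vectors.

    Takes the best class per frame, collapses consecutive repeats,
    then drops blanks. Returns a list of token ids.
    """
    tokens = []
    previous = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the class changes and is not the blank symbol.
        if best != previous and best != blank_id:
            tokens.append(best)
        previous = best
    return tokens


if __name__ == "__main__":
    BLANK = 2  # toy vocabulary: classes 0 and 1 are tokens, 2 is blank

    # Clear margins: tokens win their frames comfortably.
    healthy = [[-0.1, -3.0, -2.5], [-0.1, -3.0, -2.5],
               [-3.0, -3.0, -0.1], [-3.0, -0.2, -2.0]]
    # Thin margins: blank edges out the token on every frame.
    eroded = [[-1.1, -3.0, -1.0], [-1.1, -3.0, -1.0],
              [-3.0, -3.0, -0.1], [-3.0, -1.1, -1.0]]

    print(ctc_greedy_decode(healthy, BLANK))
    print(ctc_greedy_decode(eroded, BLANK))
```

In the second input the token scores lose to blank by only 0.1 per frame, so a small numerical shift is enough to turn a valid transcript into an empty one.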

Both are multi-function (7 ANE entry points)

Each package exposes the functions predict_T80 … predict_T4800 (0.8 s … 48 s of audio); pick one by padded audio length. Input is (1,80,T); outputs are logprobs (1,T/8,1025) and encoder_output. The pos_enc clamp is baked in.
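Choosing an entry point by padded length can be sketched as follows. The 10 ms frame step is inferred from predict_T80 covering 0.8 s, and the factor 8 from the logprobs shape (1,T/8,1025). The helper `pick_entry_point` is hypothetical, and the demo lists only the two bucket sizes named above; the other five sizes are not stated here and should be read from the model package rather than guessed.

```python
import math

FRAMES_PER_SECOND = 100  # inferred: predict_T80 covers 0.8 s, so 10 ms per input frame
SUBSAMPLING = 8          # inferred from the logprobs shape (1, T/8, 1025)


def pick_entry_point(duration_s, bucket_sizes):
    """Return (function_name, padded_T, output_frames) for an utterance.

    Chooses the smallest bucket T that holds the whole utterance; the
    caller is expected to pad the features up to padded_T frames.
    Raises ValueError if the audio is longer than the largest bucket.
    """
    # round() guards against float noise such as 81.00000000000001.
    needed = math.ceil(round(duration_s * FRAMES_PER_SECOND, 6))
    for t in sorted(bucket_sizes):
        if t >= needed:
            return f"predict_T{t}", t, t // SUBSAMPLING
    raise ValueError(f"{duration_s} s exceeds the largest bucket")


if __name__ == "__main__":
    # Only the endpoints are known from the model card; the real package has 7 sizes.
    known_buckets = [80, 4800]
    for seconds in (0.5, 0.81, 48.0):
        print(seconds, pick_entry_point(seconds, known_buckets))
```

With the full list of seven sizes the same function pads each utterance to the nearest larger bucket, so short clips do not pay for the 48 s graph.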

Streaming variant

For live/low-latency recitation use the cache-aware Muno459/fastconformer-quran-coreml-streaming (validated on ANE, 5–8 ms/chunk).

License

Apache 2.0 (NVIDIA FastConformer-Hybrid + Muno459/fastconformer-quran upstream).

Benchmark

Leakage-free held-out WER compared against nvidia / whisper / seamless / mms / omniASR / Tarteel: see the Quranic ASR Leaderboard.

Model tree for Muno459/fastconformer-quran-coreml-offline

Base model: nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0
Quantized: Muno459/fastconformer-quran
Quantized (3): this model

URL: https://huggingface.co/Muno459/fastconformer-quran-coreml-offline
