Paper • 2206.08317 • Published
⭐ Powered by FunASR — please give us a GitHub Star!
This model is part of the FunASR ecosystem, an industrial-grade open-source toolkit for ASR, VAD, punctuation restoration, speaker diarization, emotion/event detection, and LLM-ASR. A GitHub Star helps the project and keeps you updated on new releases.
Paraformer-zh
Non-autoregressive end-to-end speech recognition — 120x realtime on GPU, production-ready for Mandarin Chinese.
Paraformer is a non-autoregressive (NAR) ASR model that generates the entire output in parallel, achieving significant speedups over autoregressive models like Whisper while maintaining competitive accuracy.
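To make the parallel-decoding idea concrete, here is a toy sketch contrasting the two decoding styles by counting decoder calls. It is not Paraformer's actual decoder: `toy_step`, `toy_predict_all`, and the `REFERENCE` tokens are hypothetical stand-ins, and the output length is simply handed to the parallel path (in the real model the length has to be predicted from the audio).

```python
from typing import Callable, List

def autoregressive_decode(step: Callable[[List[str]], str], max_len: int) -> List[str]:
    """Emit one token per decoder call; each call depends on the tokens so far."""
    tokens: List[str] = []
    for _ in range(max_len):
        nxt = step(tokens)
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens

def parallel_decode(predict_all: Callable[[int], List[str]], num_tokens: int) -> List[str]:
    """Emit every token in a single decoder call, given the output length."""
    return predict_all(num_tokens)

# Hypothetical stand-ins for real decoders: they just replay a fixed transcript.
REFERENCE = ["jin", "tian", "tian", "qi", "hao"]
calls = {"ar": 0, "nar": 0}

def toy_step(prefix: List[str]) -> str:
    calls["ar"] += 1
    return REFERENCE[len(prefix)] if len(prefix) < len(REFERENCE) else "<eos>"

def toy_predict_all(n: int) -> List[str]:
    calls["nar"] += 1
    return REFERENCE[:n]

ar_out = autoregressive_decode(toy_step, max_len=32)
nar_out = parallel_decode(toy_predict_all, num_tokens=len(REFERENCE))
print(ar_out == nar_out, calls)
```

The autoregressive loop needs one sequential call per token plus one for the end marker, while the parallel path needs a single call regardless of transcript length. That difference in sequential steps is where the speedup comes from.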
Quick Start
```python
from funasr import AutoModel

# Basic recognition
model = AutoModel(model="funasr/paraformer-zh", hub="hf", device="cuda")
result = model.generate(input="audio.wav")
print(result[0]["text"])
```
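For more than one file, the same `generate` call can be wrapped in a small loop. The sketch below assumes only what the snippet above shows: `generate(input=...)` returns a list whose first item has a `"text"` key. The helper name `transcribe_files` is hypothetical and not part of FunASR.

```python
from typing import Dict, Iterable

def transcribe_files(model, paths: Iterable[str]) -> Dict[str, str]:
    """Run recognition on each path and collect the transcripts by file name.

    `model` is expected to be an object with the same `generate(input=...)`
    method used in Quick Start; an empty result maps to an empty string.
    """
    transcripts: Dict[str, str] = {}
    for path in paths:
        result = model.generate(input=str(path))
        transcripts[str(path)] = result[0]["text"] if result else ""
    return transcripts

# Usage with the Quick Start model (requires funasr and the model weights):
#   from funasr import AutoModel
#   model = AutoModel(model="funasr/paraformer-zh", hub="hf", device="cuda")
#   print(transcribe_files(model, ["a.wav", "b.wav"]))
```

Loading the model once and reusing it across files avoids paying the initialization cost for every recording.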
Full Pipeline (VAD + ASR + Punctuation + Speaker Diarization)
```python
from funasr import AutoModel

# Chain VAD, ASR, punctuation, and speaker models in one pipeline
model = AutoModel(
    model="funasr/paraformer-zh",
    hub="hf",
    vad_model="funasr/fsmn-vad",
    punc_model="funasr/ct-punc",
    spk_model="funasr/campplus",
    device="cuda",
)
result = model.generate(input="meeting.wav")

# Output includes timestamps, punctuation, and speaker labels
for sentence in result[0]["sentence_info"]:
    print(f"[Speaker {sentence['spk']}] {sentence['text']}")
```
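The per-sentence output above can be folded into speaker turns for a more readable transcript. This is a minimal sketch that relies only on the `spk` and `text` keys shown in the pipeline example. The helper `merge_speaker_turns` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def merge_speaker_turns(
    sentence_info: Iterable[Dict], joiner: str = ""
) -> List[Tuple[object, str]]:
    """Merge consecutive sentences from the same speaker into one turn.

    `joiner` is placed between merged sentences: the empty default suits
    Chinese text, while a space reads better for English.
    """
    turns: List[Tuple[object, str]] = []
    for sentence in sentence_info:
        spk, text = sentence["spk"], sentence["text"]
        if turns and turns[-1][0] == spk:
            turns[-1] = (spk, turns[-1][1] + joiner + text)
        else:
            turns.append((spk, text))
    return turns

# Hypothetical sample shaped like result[0]["sentence_info"]
sample = [
    {"spk": 0, "text": "Hello everyone."},
    {"spk": 0, "text": "Let's begin."},
    {"spk": 1, "text": "Sounds good."},
    {"spk": 0, "text": "First item."},
]
for spk, text in merge_speaker_turns(sample, joiner=" "):
    print(f"[Speaker {spk}] {text}")
```

Only adjacent sentences are merged, so a speaker who talks again later starts a new turn and the conversation order is preserved.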
Features
- 120x realtime on GPU (non-autoregressive parallel decoding)
- Chinese + English mixed recognition
- Built-in VAD (voice activity detection) for long audio
- Punctuation restoration with ct-punc model
- Speaker diarization with cam++ model
- Offline recognition, with a streaming variant available as funasr/paraformer-zh-streaming
- ONNX export supported
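A speed figure such as "120x realtime" is usually read as audio duration divided by processing time. The sketch below works under that assumption, which this card does not state explicitly. The function names are illustrative, and the numbers are arithmetic examples, not measurements.

```python
def speedup(processing_s: float, audio_s: float) -> float:
    """How many seconds of audio are processed per second of compute."""
    if processing_s <= 0:
        raise ValueError("processing_s must be positive")
    return audio_s / processing_s

def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF: compute seconds spent per second of audio (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s

# Arithmetic example: at 120x, one hour of audio would take 30 seconds.
audio_s = 3600.0
processing_s = audio_s / 120
print(speedup(processing_s, audio_s))
print(round(real_time_factor(processing_s, audio_s), 4))
```

To measure this on your own hardware, time the `generate` call with `time.perf_counter()` and pass the elapsed seconds together with the recording's duration.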
Model Details
| Property | Value |
|---|---|
| Architecture | Paraformer (Non-autoregressive) |
| Parameters | 220M |
| Languages | Chinese, English |
| Sample Rate | 16 kHz |
| Training Data | 60,000+ hours |
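Since the table lists a 16 kHz sample rate, it can be useful to inspect a WAV file before sending it to the model. The sketch below uses only Python's standard `wave` module. Whether the toolkit resamples other rates automatically is not stated here, so treat the check as a precaution. The helper names are hypothetical.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # taken from the Model Details table

def wav_info(path: str) -> dict:
    """Read sample rate, channel count, and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }

def needs_resampling(path: str) -> bool:
    """True when the file's sample rate differs from the expected 16 kHz."""
    return wav_info(path)["sample_rate"] != EXPECTED_RATE_HZ

# Usage:
#   if needs_resampling("audio.wav"):
#       print("Resample to 16 kHz first:", wav_info("audio.wav"))
```

The `wave` module only reads uncompressed PCM WAV files, so other formats need a different reader or a conversion step first.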
Related Models
| Model | Description | Link |
|---|---|---|
| funasr/fsmn-vad | Voice Activity Detection | HF |
| funasr/ct-punc | Punctuation Restoration | HF |
| funasr/campplus | Speaker Verification | HF |
| funasr/paraformer-zh-streaming | Streaming version | HF |
Links
- GitHub: FunASR
- Docs: modelscope.github.io/FunASR
- Paper: Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Citation
@inproceedings{gao2022paraformer,
title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
booktitle={INTERSPEECH},
year={2022}
}