TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT is 7.5x smaller and 9.4x faster at inference than BERT-base, and it achieves competitive performance on natural language understanding tasks. It performs a novel Transformer distillation at both the pre-training and the task-specific learning stages. In general distillation, we use the original BERT-base without fine-tuning as the teacher and a large-scale text corpus as the training data. By performing Transformer distillation on general-domain text, we obtain a general TinyBERT that provides a good initialization for task-specific distillation. Here we provide the general TinyBERT for your downstream tasks.
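The Transformer-layer distillation described above can be illustrated with a short sketch. It assumes the uniform layer mapping g(m) = 3m that the paper uses for a 4-layer student and a 12-layer teacher, plus two layer-wise objectives: an MSE between student and teacher attention score matrices, averaged over heads, and an MSE between the student hidden states projected by a learnable matrix W_h and the teacher hidden states. The embedding-layer loss has the same form, with W_e in place of W_h. All function names, the toy sizes, and the fixed W_h are hypothetical and chosen only for illustration. This is not the released training code. Judging by the checkpoint name (4L_312D) and BERT-base's hidden size of 768, W_h here would be 312 x 768.

```python
from typing import List

# Hypothetical helper type: a matrix as a list of rows.
Matrix = List[List[float]]


def layer_map(m: int, teacher_layers: int = 12, student_layers: int = 4) -> int:
    """Uniform mapping g(m): student layer m learns from teacher layer g(m).

    m = 0 maps the student embedding layer to the teacher embedding layer.
    """
    return m * (teacher_layers // student_layers)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all elements of two same-shaped matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def hidden_loss(h_student: Matrix, w_h: Matrix, h_teacher: Matrix) -> float:
    """L_hidn = MSE(H^S W_h, H^T); W_h maps student width to teacher width."""
    return mse(matmul(h_student, w_h), h_teacher)


def attention_loss(a_student: List[Matrix], a_teacher: List[Matrix]) -> float:
    """L_attn = (1/h) * sum over the h heads of MSE(A_i^S, A_i^T)."""
    return sum(mse(s, t) for s, t in zip(a_student, a_teacher)) / len(a_student)


if __name__ == "__main__":
    # Student layers 1..4 learn from teacher layers 3, 6, 9, 12.
    print([layer_map(m) for m in range(1, 5)])  # [3, 6, 9, 12]

    # Toy example: 2 tokens, student width 2, teacher width 3 (hypothetical sizes).
    h_s = [[1.0, 0.0], [0.0, 1.0]]
    w_h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # fixed here; learned in practice
    h_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    print(hidden_loss(h_s, w_h, h_t))  # 0.16666666666666666

    # One attention head over 2 tokens (toy score values).
    a_s = [[[0.0, 1.0], [2.0, 0.0]]]
    a_t = [[[1.0, 1.0], [2.0, 1.0]]]
    print(attention_loss(a_s, a_t))  # 0.5
```

In actual training, W_h is learned jointly with the student, and the per-layer losses are summed (optionally weighted) over all mapped layers.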

For more details about the techniques behind TinyBERT, refer to our paper, TinyBERT: Distilling BERT for Natural Language Understanding.

Citation

If you find TinyBERT useful in your research, please cite the following paper:

@article{jiao2019tinybert,
 title={Tinybert: Distilling bert for natural language understanding},
 author={Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun},
 journal={arXiv preprint arXiv:1909.10351},
 year={2019}
}

Paper: arXiv:1909.10351 (published Sep 23, 2019)

URL: https://huggingface.co/huawei-noah/TinyBERT_General_4L_312D