URL: https://huggingface.co/datasets/MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5

MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5 · Datasets at Hugging Face

This repository contains the refined pre-training corpus from the paper "MiniPLM: Knowledge Distillation for Pre-Training Language Models" (arXiv:2410.17215).

Code: https://github.com/thu-coai/MiniPLM
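
To work with the corpus locally, one option is to download the whole repository with the `huggingface_hub` client. The sketch below is only an illustration, not the authors' documented procedure: the helper names `download_kwargs` and `download_corpus` and the default directory `./miniplm_corpus` are assumptions made for this example, and the file layout inside the repository is not described on this page, so the code fetches everything rather than selecting specific files.

```python
# Minimal sketch: fetch the full dataset repository from the Hugging Face Hub.
# Requires `pip install huggingface_hub`. Helper names and the default
# directory are hypothetical, chosen only for this example.

REPO_ID = "MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5"


def download_kwargs(local_dir):
    """Build the keyword arguments passed to snapshot_download."""
    return {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # this is a dataset repo, not a model repo
        "local_dir": local_dir,  # where the files are placed on disk
    }


def download_corpus(local_dir="./miniplm_corpus"):
    """Download every file in the repository and return the local path."""
    # Imported lazily so that the module can be loaded without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir))
```

Calling `download_corpus()` returns the local directory holding the downloaded files. The corpus may be large, so check the available disk space first. For how the files are meant to be consumed during pre-training, refer to the code repository linked above.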

Downloads last month: 282

Models trained or fine-tuned on MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5

0.2B • Updated Feb 24, 2025 • 213

0.5B • Updated Mar 13, 2025 • 195

Text Generation • 0.5B • Updated Mar 25, 2025 • 146 • 7

1B • Updated Mar 13, 2025 • 133

Text Generation • 0.2B • Updated Jan 27 • 131 • 1

0.1B • Updated Oct 28, 2024 • 105

Browse 14 models trained on this dataset

Paper for MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5

Paper • 2410.17215 • Published Oct 22, 2024 • 16