Paper • 2308.09583
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct (RLEIF)
🏠 Home Page
🤗 HF Repo • 🐱 Github Repo • 🐦 Twitter
📃 [WizardLM] • 📃 [WizardCoder] • 📃 [WizardMath]
👋 Join our Discord
News
[12/19/2023] 🔥 We released WizardMath-7B-V1.1, trained from Mistral-7B. It is the SOTA 7B math LLM, achieving 83.2 pass@1 on GSM8k and 33.0 pass@1 on MATH.
[12/19/2023] 🔥 WizardMath-7B-V1.1 outperforms ChatGPT 3.5, Gemini Pro, Mixtral MOE, and Claude Instant on GSM8k pass@1.
[12/19/2023] 🔥 WizardMath-7B-V1.1 is comparable with ChatGPT 3.5 and Gemini Pro, and surpasses Mixtral MOE, on MATH pass@1.
| Model | Checkpoint | Paper | GSM8k | MATH |
|---|---|---|---|---|
| WizardMath-7B-V1.1 | 🤗 HF Link | 📃 [WizardMath] | 83.2 | 33.0 |
| WizardMath-70B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 81.6 | 22.7 |
| WizardMath-13B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 63.9 | 14.0 |
| WizardMath-7B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 54.9 | 10.7 |
[12/19/2023] Comparing WizardMath-7B-V1.1 with other open-source 7B math LLMs.
| Model | GSM8k Pass@1 | MATH Pass@1 |
|---|---|---|
| MPT-7B | 6.8 | 3.0 |
| Llama 1-7B | 11.0 | 2.9 |
| Llama 2-7B | 12.3 | 2.8 |
| Yi-6b | 32.6 | 5.8 |
| Mistral-7B | 37.8 | 9.1 |
| Qwen-7b | 47.8 | 9.3 |
| RFT-7B | 50.3 | -- |
| MAmmoTH-7B (COT) | 50.5 | 10.4 |
| WizardMath-7B-V1.0 | 54.9 | 10.7 |
| Abel-7B-001 | 59.7 | 13.0 |
| MetaMath-7B | 66.5 | 19.8 |
| Arithmo-Mistral-7B | 74.7 | 25.3 |
| MetaMath-Mistral-7B | 77.7 | 28.2 |
| Abel-7B-002 | 80.4 | 29.5 |
| WizardMath-7B-V1.1 | 83.2 | 33.0 |
[12/19/2023] Comparing WizardMath-7B-V1.1 with larger open-source (30B to 70B) LLMs.
| Model | GSM8k Pass@1 | MATH Pass@1 |
|---|---|---|
| Llemma-34B | 51.5 | 25.0 |
| Minerva-62B | 52.4 | 27.6 |
| Llama 2-70B | 56.8 | 13.5 |
| DeepSeek 67B | 63.4 | -- |
| Grok 33B | 62.9 | 23.9 |
| MAmmoTH-70B | 72.4 | 21.1 |
| Yi-34B | 67.9 | 15.9 |
| Mixtral 8x7B | 74.4 | 28.4 |
| MetaMath-70B | 82.3 | 26.6 |
| WizardMath-7B-V1.1 | 83.2 | 33.0 |
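The tables above report pass@1, i.e. the percentage of test problems solved with a single generated answer per problem. As a rough illustration only, the sketch below computes that percentage from already-extracted final answers. The function names `normalize` and `pass_at_1` and the normalization rules are assumptions made for this example, not the evaluation script used for WizardMath; real benchmark scripts apply their own answer extraction and matching.

```python
from fractions import Fraction


def normalize(answer: str) -> str:
    """Canonicalize a final answer string (illustrative rules, not the official ones)."""
    # Drop surrounding whitespace, a trailing period, thousands separators and "$".
    text = answer.strip().rstrip(".").replace(",", "").replace("$", "")
    try:
        # Numeric answers such as "72.0", "0.5" or "1/2" map to one exact form.
        return str(Fraction(text))
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers are compared as plain text.
        return text


def pass_at_1(predictions: list[str], references: list[str]) -> float:
    """Percentage of problems whose single prediction matches the reference."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(predictions)


# Hypothetical toy data: three of the four predictions match their reference.
score = pass_at_1(["72", "1,000", "0.5", "7"], ["72.0", "1000", "1/2", "8"])
print(f"pass@1 = {score:.1f}")
```

With one sample per problem, pass@1 reduces to plain accuracy, which is why a single exact-match count is enough here.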
❗ Data Contamination Check:
Before model training, we carefully and rigorously checked all the training data, and used multiple deduplication methods to detect and prevent data leakage from the GSM8k and MATH test sets.
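The specific deduplication methods are not described here. One common way to screen for this kind of leakage is word n-gram overlap between training samples and test questions, and the sketch below shows that idea under stated assumptions: the names `ngrams` and `find_overlapping`, the tokenization, and the window size `n=8` are all hypothetical choices for illustration, not the procedure actually used for WizardMath.

```python
import re


def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (punctuation ignored)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_overlapping(train_samples: list[str], test_samples: list[str], n: int = 8) -> list[int]:
    """Indices of training samples that share at least one n-gram with any test sample."""
    test_grams: set[tuple[str, ...]] = set()
    for sample in test_samples:
        test_grams |= ngrams(sample, n)
    # A non-empty intersection marks the training sample as a leakage candidate.
    return [i for i, sample in enumerate(train_samples) if ngrams(sample, n) & test_grams]


# Hypothetical toy data: the first training sample repeats a test question's opening.
train = [
    "Natalia sold clips to 48 of her friends in April and then half as many in May",
    "A farmer has 12 cows and buys 5 more cows at the market on Monday",
]
test = ["Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May."]
print(find_overlapping(train, test))
```

Flagged samples would then be reviewed or dropped. A smaller `n` catches more paraphrased leakage at the cost of more false positives.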
