VOOZH about

URL: https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2

⇱ EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 · Hugging Face


EVA Qwen2.5-72B v0.2

A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data.
It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.

Dedicated to Nev.

NOTE: LLM-Compressor quants don't seem to work correctly, quality seems to be much worse than normal. It wasn't the case with previous versions. GGUF and GPTQ seem to be unaffected.


Version notes for 0.2: Optimized training hyperparameters and increased sequence length. Better instruction following deeper into context and less repetition.

Prompt format is ChatML.


Recommended sampler values:

  • Temperature: 0.8
  • Min-P: 0.05
  • Top-A: 0.3
  • Repetition Penalty: 1.03

Recommended SillyTavern preset (via CalamitousFelicitousness):


Training data:

  • Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's card for details.
  • Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
  • A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
  • A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
  • Synthstruct and SynthRP datasets by Epiculous
  • A subset from Dolphin-2.9.3, including filtered version of not_samantha and a small subset of systemchat.

Training time and hardware:

  • 17 hours on 8xH100 SXM

Model was created by Kearm, Auri and Cahvay.

Special thanks:

  • to Featherless for sponsoring this run
  • to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.
  • to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data
  • and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.

Statement about change in licensing for the future models.

For all future EVA-Unit-01 models, there will be a provision in the license stating that Infermatic and any of its employees or paid associates cannot utilize, distribute, download, or otherwise make use of EVA models. While this cannot retroactively apply to our licensing, we officially request Infermatic immediately cease use of our models for unwarranted profit, although we acknowledge at this point it will not likely be followed. EVA models will still be available in the future on Featherless, ArliAI (in the future), and other providers who want to host them, as well as for local and cloud usage.

👁 Built with Axolotl


Open LLM Leaderboard Evaluation Results

Metric Value
Avg. 43.54
IFEval (0-Shot) 68.79
BBH (3-Shot) 59.07
MATH Lvl 5 (4-Shot) 39.05
GPQA (0-shot) 21.14
MuSR (0-shot) 19.73
MMLU-PRO (5-shot) 53.48
Downloads last month
30
Safetensors
Model size
73B params
Tensor type
BF16
·

Model tree for EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2

Base model

Qwen/Qwen2.5-72B
Finetuned
(66)
this model
Finetunes
1 model
Merges
16 models
Quantizations
5 models

Datasets used to train EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2

Spaces using EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 17

Collection including EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2