Qwen2.5-Lumen-14B

Qwen2.5 fine-tuned with direct preference optimization (DPO) for ~3 epochs.

A Qwen2.5 preference finetune targeting prompt adherence, story writing, and roleplay.

Llama.cpp

(GGUF) Thanks QuantFactory

static - GGUF

(GGUF) Thanks mradermacher

static - GGUF
imatrix - GGUF

(GGUF) Thanks Triangle104

static - Q8_0 - Q6_K - Q5_K_M - Q5_K_S - Q5_0 - Q4_K_M - Q4_K_S - Q4_0

Other quant repositories also exist on Hugging Face and can be found by searching.

Training Notes

Trained Qwen2.5-14B-Instruct for 2 epochs on an NVIDIA A100 using the dataset jondurbin/gutenberg-dpo-v0.1, saving different checkpoints along the way (completely different runs at varying epochs and learning rates).

Tanliboy trained Qwen2.5-14B-Instruct for 1 epoch on HuggingFaceH4/ultrafeedback_binarized. (Credit to Tanliboy! Check out the model: tanliboy/lambda-qwen2.5-14b-dpo-test)

Mass checkpoint merge, based on Qwen2.5-14B-Instruct (base model).

Merge

Merged "Ultrafeedback-Binarized DPO" and "Gutenberg DPO" with sophosympatheia's SLERP gradient.
Merged "Qwen2.5-14B-Instruct" and "Gutenberg DPO" with sophosympatheia's SLERP gradient.
Merged all DPO checkpoints and SLERP variations with MODEL_STOCK to analyze geometric properties and get the most performant aspects of all runs and merges. Model Stock was chosen because the merged models are similar to one another.
This approach was also chosen because evaluation for ORPO is unclear, so it is hard to know which runs are the best.
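
For readers unfamiliar with a SLERP gradient, here is a minimal NumPy sketch of the idea: each layer's weights are spherically interpolated between two models, and the interpolation factor t varies with layer depth. The function names, the anchor values and the layer count are hypothetical and only illustrate the technique; this is not the exact configuration or code used for this model.

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two same-shaped weight tensors."""
    a_flat = a.astype(np.float64).ravel()
    b_flat = b.astype(np.float64).ravel()
    # Angle between the two weight vectors, measured on unit-normalized copies.
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    cos_omega = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(cos_omega)
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        mixed = (1.0 - t) * a_flat + t * b_flat
    else:
        mixed = (np.sin((1.0 - t) * omega) * a_flat
                 + np.sin(t * omega) * b_flat) / sin_omega
    return mixed.reshape(a.shape)

def gradient_t(layer_idx, num_layers, anchors):
    """Map a layer index to an interpolation factor by linearly
    interpolating between evenly spaced anchor values (hypothetical scheme)."""
    position = layer_idx / max(num_layers - 1, 1)
    anchor_positions = np.linspace(0.0, 1.0, len(anchors))
    return float(np.interp(position, anchor_positions, anchors))

# Hypothetical usage: 5 layers, blend shifting from model A to model B with depth.
anchors = [0.0, 0.5, 1.0]
layer_a = np.array([1.0, 0.0])
layer_b = np.array([0.0, 1.0])
blended = [slerp(gradient_t(i, 5, anchors), layer_a, layer_b) for i in range(5)]
```

With t = 0 the result equals the first model's tensor and with t = 1 the second's, so the anchors control which model dominates at each depth.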

One-Attempt generated example:

Temp 1.3 [1], Min_P 0.012 [4], TFS 0.97 [3], Smooth_Factor 0.3 [2], Smoothing_Curve 1.1, Rep 1.1, Rep Range 1000

Temp 1.3, Min_P 0.012, Rep 1.1

As you can see, the model has mostly adapted to the intended response style from the Gutenberg dataset.
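
To make the simpler sampler preset above (Temp 1.3, Min_P 0.012, Rep 1.1) more concrete, here is a rough NumPy sketch of how these three settings are commonly applied to next-token logits. The function names and the order of operations are assumptions for illustration; actual backends may order samplers differently and implement them with more options.

```python
import numpy as np

def apply_repetition_penalty(logits, recent_token_ids, penalty=1.1):
    """Penalize recently seen tokens: divide positive logits by the
    penalty and multiply negative logits by it (a common convention)."""
    out = np.asarray(logits, dtype=np.float64).copy()
    ids = np.unique(np.asarray(recent_token_ids, dtype=np.int64))
    if ids.size == 0:
        return out
    vals = out[ids]
    out[ids] = np.where(vals > 0, vals / penalty, vals * penalty)
    return out

def temperature_min_p_probs(logits, temperature=1.3, min_p=0.012):
    """Apply temperature, then drop tokens whose probability is below
    min_p times the probability of the most likely token."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max()          # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()
    probs = np.where(keep, probs, 0.0)
    return probs / probs.sum()              # renormalize the survivors

def sample_next_token(logits, recent_token_ids, rng):
    """Sample one token id using the preset values from this card."""
    penalized = apply_repetition_penalty(logits, recent_token_ids, penalty=1.1)
    probs = temperature_min_p_probs(penalized, temperature=1.3, min_p=0.012)
    return int(rng.choice(len(probs), p=probs))

# Hypothetical usage with a toy 3-token vocabulary.
rng = np.random.default_rng(0)
token_id = sample_next_token(np.array([5.0, 4.0, -10.0]), recent_token_ids=[0], rng=rng)
```

A higher temperature flattens the distribution, while Min_P trims the long tail relative to the top token, which is why the two are often paired for creative writing.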

Recipe

models:
 - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
 - model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
 - model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
 - model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
 - model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
 - model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
 - model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta
 - model: tanliboy/lambda-qwen2.5-14b-dpo-test
 - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
 - model: tanliboy/lambda-qwen2.5-14b-dpo-test
 - model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
 - model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
merge_method: model_stock
dtype: bfloat16
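
As a rough illustration of what merge_method: model_stock does for a single weight tensor, here is a minimal NumPy sketch based on the interpolation ratio from the Model Stock paper, t = k·cos θ / (1 + (k − 1)·cos θ), where k is the number of finetuned models and θ is the angle between their deltas from the base. The function name and the use of a mean pairwise cosine are assumptions for illustration; mergekit's actual implementation may differ in its details.

```python
import numpy as np

def model_stock_layer(base, finetuned):
    """Merge one weight tensor from k finetuned models toward their average,
    with a ratio t derived from the angle between their deltas from base."""
    k = len(finetuned)
    if k < 2:
        raise ValueError("model stock needs at least two finetuned models")
    base_f = np.asarray(base, dtype=np.float64)
    deltas = [np.asarray(w, dtype=np.float64).ravel() - base_f.ravel()
              for w in finetuned]
    # Mean pairwise cosine similarity between the task vectors (deltas).
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            denom = np.linalg.norm(deltas[i]) * np.linalg.norm(deltas[j])
            cosines.append(np.dot(deltas[i], deltas[j]) / denom if denom > 0 else 0.0)
    cos_theta = float(np.mean(cosines))
    denominator = 1.0 + (k - 1) * cos_theta
    if abs(denominator) < 1e-12:
        raise ValueError("degenerate geometry: interpolation ratio is undefined")
    t = k * cos_theta / denominator
    average = np.mean([np.asarray(w, dtype=np.float64) for w in finetuned], axis=0)
    # Similar models (cos near 1) give t near 1, so the result stays near the average.
    # Orthogonal deltas (cos near 0) give t near 0, so the result stays near base.
    return t * average + (1.0 - t) * base_f, t

# Hypothetical usage with toy tensors.
base = np.zeros(2)
merged, t = model_stock_layer(base, [np.array([1.0, 1.0]), np.array([1.0, 0.9])])
```

This behavior matches the reasoning given in the Merge section: when the checkpoints are similar, the merge leans on their average rather than pulling back toward the base.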