DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on the kanhatakeyama/ramdom-to-fixed-multiturn-Calm3, the Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered, the Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted, the Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered, the Aratako/Open-Platypus-Japanese-masked-formatted, the kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja, the Aratako/magpie-ultra-v0.1-formatted, the Aratako/orca-agentinstruct-1M-v1-selected and the Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k datasets. It achieves the following results on the evaluation set: - Loss: 0.6154

以下、Axolotlの実行コード

!apt-get update
!apt-get install -y libopenmpi-dev

!git clone https://github.com/axolotl-ai-cloud/axolotl

cd axolotl
!pip install -e .
!pip install packaging ninja
!pip install flash-attn
!pip install deepspeed
!pip install mpi4py

# write権限のあるtokenを利用してHFにログイン（学習後のモデルアップロードに必要）
!huggingface-cli login --token WRITE ME
# wandbにログイン（wandbに学習ログを残したい場合）
!wandb login WRITE ME

import axolotl

!python -m axolotl.cli.preprocess /workspace/deepseek-32b-ver001-simpo.yml --debug

! accelerate launch -m axolotl.cli.train /workspace/deepseek-32b-ver001-simpo.yml --deepspeed deepspeed_configs/zero3_bf16.json

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 1.0

Training results

Training Loss	Epoch	Step	Validation Loss
1.0196	0.0008	1	0.9386
0.732	0.0381	50	0.7104
0.7803	0.0763	100	0.6853
0.6013	0.1144	150	0.6712
0.6767	0.1526	200	0.6628
0.701	0.1907	250	0.6565
0.6976	0.2289	300	0.6520
0.7022	0.2670	350	0.6487
0.6889	0.3051	400	0.6449
0.6673	0.3433	450	0.6411
0.6067	0.3814	500	0.6382
0.644	0.4196	550	0.6357
0.9572	0.4577	600	0.6336
0.6466	0.4959	650	0.6310
0.6781	0.5340	700	0.6291
0.6473	0.5721	750	0.6274
0.6235	0.6103	800	0.6255
0.6564	0.6484	850	0.6238
0.6009	0.6866	900	0.6221
0.5759	0.7247	950	0.6208
0.5817	0.7628	1000	0.6197
0.6438	0.8010	1050	0.6190
0.6102	0.8391	1100	0.6180
0.5997	0.8773	1150	0.6170
0.5896	0.9154	1200	0.6164
0.5713	0.9536	1250	0.6158
0.6164	0.9917	1300	0.6154

Framework versions

PEFT 0.14.0
Transformers 4.49.0
Pytorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0

Downloads last month: 4

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

Base model

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Adapter

(220)

this model

URL: https://huggingface.co/kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

⇱ kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0 · Hugging Face

DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

以下、Axolotlの実行コード

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

Datasets used to train kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0