Qwen2.5-1.75B-A1.1B-Instruct-ja

Qwen2.5-0.5B系のモデルを組み合わせて作ったMoEです。

Details

https://zenn.dev/kendama/articles/68ae234e9371ac

Qwen2.5-4x0.5B-sft-v1

This model is a fine-tuned version of Kendamarron/Qwen2.5-4x0.5B-cpt on the Kendamarron/jimba-instruction-all, the Kendamarron/OpenMathInstruct-2-ja-CoT-only_thought, the Aratako/Synthetic-JP-EN-Coding-Dataset-801k and the llm-jp/magpie-sft-v1.0 datasets. It achieves the following results on the evaluation set:

Loss: 1.0085

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 32
total_eval_batch_size: 4
optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 60
num_epochs: 2

Training results

Training Loss	Epoch	Step	Validation Loss
1.3068	0.0033	1	1.3071
1.1087	0.3309	100	1.0806
1.1393	0.6617	200	1.0488
1.0569	0.9926	300	1.0286
0.9902	1.3209	400	1.0215
0.9933	1.6518	500	1.0133
0.9706	1.9826	600	1.0085

Framework versions

Transformers 4.47.1
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.21.0

Downloads last month: 6

Safetensors

Model size

2B params

Tensor type

BF16

Model tree for Kendamarron/Qwen2.5-1.75B-A1.1B-Instruct-ja

Base model

Kendamarron/Qwen2.5-4x0.5B-cpt

Finetuned

(1)

this model

URL: https://huggingface.co/Kendamarron/Qwen2.5-1.75B-A1.1B-Instruct-ja

⇱ Kendamarron/Qwen2.5-1.75B-A1.1B-Instruct-ja · Hugging Face

Qwen2.5-1.75B-A1.1B-Instruct-ja

Details

Qwen2.5-4x0.5B-sft-v1

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for Kendamarron/Qwen2.5-1.75B-A1.1B-Instruct-ja

Datasets used to train Kendamarron/Qwen2.5-1.75B-A1.1B-Instruct-ja