micro-glitter
This model is a fine-tuned version of unsloth/gemma-3-270m-it, trained on the following datasets:
- allura-org/EU01-S2
- allenai/tulu-3-sft-personas-instruction-following
- ToastyPigeon/mixed-medical-reasoning-formatted
- ToastyPigeon/steve-and-marvin
- ToastyPigeon/kimi-stories-instruct
- ToastyPigeon/new-story-dataset
- allura-org/fujin-instruct-v2
- ToastyPigeon/gutenberg-sft
- ToastyPigeon/SpringDragon
- ToastyPigeon/some-erotica

It achieves the following results on the evaluation set:
- Loss: 3.7387
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 69
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 8
- training_steps: 296
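
The totals in the list above follow from the per-device settings, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces that arithmetic in plain Python. The helper names `effective_batch_size` and `cosine_lr` are hypothetical, and the schedule is an assumption: it uses the common linear-warmup plus half-cosine-decay form, which may differ in detail from what the training run actually used. The sequence count also assumes one sequence per batch element, which the card does not confirm.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of examples contributing to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step: int, base_lr: float = 1e-5, warmup_steps: int = 8, total_steps: int = 296) -> float:
    """Assumed schedule: linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values taken from the hyperparameter list.
train_total = effective_batch_size(per_device=4, num_devices=2, grad_accum_steps=8)
eval_total = effective_batch_size(per_device=4, num_devices=2)

# Assumption: one sequence per batch element (no sample packing).
sequences_seen = train_total * 296

if __name__ == "__main__":
    print("total_train_batch_size:", train_total)
    print("total_eval_batch_size:", eval_total)
    print("sequences seen over 296 steps:", sequences_seen)
    for step in (0, 8, 152, 296):
        print(f"step {step:3d}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate reaches its 1e-05 peak after only 8 steps and spends the remaining 288 steps decaying.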
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 3.8582 |
| 3.4802 | 0.1008 | 15 | 3.5118 |
| 3.4608 | 0.2017 | 30 | 3.4890 |
| 3.5272 | 0.3025 | 45 | 3.5189 |
| 3.559 | 0.4034 | 60 | 3.5753 |
| 3.5817 | 0.5042 | 75 | 3.6121 |
| 3.6349 | 0.6050 | 90 | 3.6471 |
| 3.68 | 0.7059 | 105 | 3.6721 |
| 3.6597 | 0.8067 | 120 | 3.6970 |
| 3.6462 | 0.9076 | 135 | 3.7068 |
| 3.7009 | 1.0067 | 150 | 3.7213 |
| 3.6717 | 1.1076 | 165 | 3.7313 |
| 3.7631 | 1.2084 | 180 | 3.7338 |
| 3.7535 | 1.3092 | 195 | 3.7346 |
| 3.668 | 1.4101 | 210 | 3.7375 |
| 3.679 | 1.5109 | 225 | 3.7383 |
| 3.6539 | 1.6118 | 240 | 3.7386 |
| 3.6547 | 1.7126 | 255 | 3.7386 |
| 3.7533 | 1.8134 | 270 | 3.7400 |
| 3.6983 | 1.9143 | 285 | 3.7387 |
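
In the table above, validation loss is lowest at step 30 (3.4890) and then rises to 3.7387 by step 285, which is still below the starting value of 3.8582. The sketch below finds that minimum programmatically and converts losses to perplexity. The `eval_log` list and the `perplexity` helper are hypothetical names, and the conversion assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state.

```python
import math

# (step, validation loss) pairs copied from the training results table.
eval_log = [
    (0, 3.8582), (15, 3.5118), (30, 3.4890), (45, 3.5189), (60, 3.5753),
    (75, 3.6121), (90, 3.6471), (105, 3.6721), (120, 3.6970), (135, 3.7068),
    (150, 3.7213), (165, 3.7313), (180, 3.7338), (195, 3.7346), (210, 3.7375),
    (225, 3.7383), (240, 3.7386), (255, 3.7386), (270, 3.7400), (285, 3.7387),
]


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats (assumed)."""
    return math.exp(loss)


# Evaluation point with the lowest validation loss, and the last one logged.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
final_step, final_loss = eval_log[-1]

if __name__ == "__main__":
    print(f"best:  step {best_step}, loss {best_loss:.4f}, ppl {perplexity(best_loss):.2f}")
    print(f"final: step {final_step}, loss {final_loss:.4f}, ppl {perplexity(final_loss):.2f}")
```

Validation loss here is measured on the card's own evaluation set, so the gap between the best and final checkpoints says nothing definite about quality on other data.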
Framework versions
- Transformers 4.52.4
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
Safetensors
- Model size: 0.3B params
- Tensor type: BF16
Model tree for allura-forge/micro-glitter
- Base model: google/gemma-3-270m
- Finetuned: google/gemma-3-270m-it
- Finetuned: unsloth/gemma-3-270m-it
- Quantizations: 2 models