BeaverAI/MN-2407-DSK-QwQify-v0.1-12B
Test model to try to give an existing model QwQ's thoughts. For this first version it is ontop of PocketDoc/Dans-SakuraKaze-V1.0.0-12b (an rp/adventure/co-writing model), which was trained ontop of PocketDoc/Dans-PersonalityEngine-V1.1.0-12b (a jack of all trades instruct model), which was trained ontop of mistralai/Mistral-Nemo-Base-2407.
The prompt formatting and usage should be the same as with QwQ; Use ChatML, and remove the thinking from previous turns. If thoughts arent being generated automatically, add <think>\n to the start of the assistant turn.
It should follow previous model turns formatting. On first turns of the conversation you may need to regen a few times, and maybe edit the model responses for the first few turns to get it to your liking.
You may want to disable inserting {{char}}: prefix for the character, and instead add something like Only speak as "{{char}}" in conversation with "{{user}}". Output your final response with a "{{char}}:" prefix. to the end of you system prompt.
MN-2407-DSK-QwQify-v0.1-12B-LoRA-WS
This model is a fine-tuned version of PocketDoc/Dans-SakuraKaze-V1.0.0-12b on the PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled and the PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled datasets. It achieves the following results on the evaluation set:
- Loss: 1.2770
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Use OptimizerNames.ADAMW_HF with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 2.0
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.134 | 0.0038 | 1 | 2.0025 |
| 1.6185 | 0.0951 | 25 | 1.5748 |
| 1.5187 | 0.1901 | 50 | 1.4871 |
| 1.4757 | 0.2852 | 75 | 1.4410 |
| 1.4008 | 0.3802 | 100 | 1.4100 |
| 1.4116 | 0.4753 | 125 | 1.3857 |
| 1.357 | 0.5703 | 150 | 1.3630 |
| 1.3435 | 0.6654 | 175 | 1.3478 |
| 1.3332 | 0.7605 | 200 | 1.3353 |
| 1.3042 | 0.8555 | 225 | 1.3308 |
| 1.2993 | 0.9506 | 250 | 1.3228 |
| 1.3105 | 1.0456 | 275 | 1.3154 |
| 1.2782 | 1.1407 | 300 | 1.3094 |
| 1.3063 | 1.2357 | 325 | 1.3070 |
| 1.3003 | 1.3308 | 350 | 1.3005 |
| 1.2937 | 1.4259 | 375 | 1.2952 |
| 1.283 | 1.5209 | 400 | 1.2922 |
| 1.2692 | 1.6160 | 425 | 1.2887 |
| 1.2639 | 1.7110 | 450 | 1.2855 |
| 1.2546 | 1.8061 | 475 | 1.2822 |
| 1.2711 | 1.9011 | 500 | 1.2787 |
| 1.2492 | 1.9962 | 525 | 1.2770 |
Framework versions
- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.1
- Downloads last month
- 13
Model tree for BeaverAI/MN-2407-DSK-QwQify-v0.1-12B
Base model
mistralai/Mistral-Nemo-Base-2407