BeaverAI/MN-2407-DSK-QwQify-v0.1-12B

GGUF

Test model to try to give an existing model QwQ's thoughts. For this first version it is ontop of PocketDoc/Dans-SakuraKaze-V1.0.0-12b (an rp/adventure/co-writing model), which was trained ontop of PocketDoc/Dans-PersonalityEngine-V1.1.0-12b (a jack of all trades instruct model), which was trained ontop of mistralai/Mistral-Nemo-Base-2407.

The prompt formatting and usage should be the same as with QwQ; Use ChatML, and remove the thinking from previous turns. If thoughts arent being generated automatically, add <think>\n to the start of the assistant turn.

It should follow previous model turns formatting. On first turns of the conversation you may need to regen a few times, and maybe edit the model responses for the first few turns to get it to your liking.

You may want to disable inserting {{char}}: prefix for the character, and instead add something like Only speak as "{{char}}" in conversation with "{{user}}". Output your final response with a "{{char}}:" prefix. to the end of you system prompt.

👁 image/png

👁 Built with Axolotl

MN-2407-DSK-QwQify-v0.1-12B-LoRA-WS

This model is a fine-tuned version of PocketDoc/Dans-SakuraKaze-V1.0.0-12b on the PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled and the PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled datasets. It achieves the following results on the evaluation set:

Loss: 1.2770

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Use OptimizerNames.ADAMW_HF with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
num_epochs: 2.0

Training results

Training Loss	Epoch	Step	Validation Loss
2.134	0.0038	1	2.0025
1.6185	0.0951	25	1.5748
1.5187	0.1901	50	1.4871
1.4757	0.2852	75	1.4410
1.4008	0.3802	100	1.4100
1.4116	0.4753	125	1.3857
1.357	0.5703	150	1.3630
1.3435	0.6654	175	1.3478
1.3332	0.7605	200	1.3353
1.3042	0.8555	225	1.3308
1.2993	0.9506	250	1.3228
1.3105	1.0456	275	1.3154
1.2782	1.1407	300	1.3094
1.3063	1.2357	325	1.3070
1.3003	1.3308	350	1.3005
1.2937	1.4259	375	1.2952
1.283	1.5209	400	1.2922
1.2692	1.6160	425	1.2887
1.2639	1.7110	450	1.2855
1.2546	1.8061	475	1.2822
1.2711	1.9011	500	1.2787
1.2492	1.9962	525	1.2770

Framework versions

PEFT 0.14.0
Transformers 4.49.0
Pytorch 2.6.0+cu124
Datasets 3.2.0
Tokenizers 0.21.1

Downloads last month: 13

Safetensors

Model size

12B params

Tensor type

BF16

Model tree for BeaverAI/MN-2407-DSK-QwQify-v0.1-12B

Base model

mistralai/Mistral-Nemo-Base-2407

Finetuned

PocketDoc/Dans-PersonalityEngine-V1.1.0-12b

Finetuned

PocketDoc/Dans-SakuraKaze-V1.0.0-12b

Finetuned

(3)

this model

Merges

4 models

Quantizations

2 models

URL: https://huggingface.co/BeaverAI/MN-2407-DSK-QwQify-v0.1-12B