A model made to continue my previous work on Magnum 4B, a small model for creative writing and general assistant tasks. Fine-tuned on top of IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml, this model is meant to be more coherent and generally better than the 4B at both writing and assistant tasks.

EXL2 quants of Holland 4B. The original weights can be found here.

Prompting

The model has been instruct-tuned with the ChatML format. A typical input looks like this:

"""<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
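As a rough illustration of the format above, here is a minimal sketch that assembles a ChatML prompt string from a list of messages. The helper name `build_chatml_prompt` and the `add_generation_prompt` parameter are hypothetical and not part of this model's tooling; most inference frontends apply the chat template for you, so this is only useful when you build prompts by hand.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; it mirrors the example shown above.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

prompt = build_chatml_prompt(messages)
print(prompt)
```

The final open `<|im_start|>assistant` turn is what cues the model to generate the next reply.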

Support

No longer needed: llama.cpp (LCPP) has merged support, so just update to a recent build. The original instructions are kept below for reference.

To run inference on this model, you'll need to use Aphrodite, vLLM or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the required pull request to fix the Llama 3.1 rope_freqs issue with custom head dimensions.

However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until this PR is merged, the context will be limited to 8k tokens.

To create a working GGUF file, make the following adjustments:

Remove the "rope_scaling": {} entry from config.json
Change "max_position_embeddings" to 8192 in config.json

These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
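The two config.json adjustments listed above can also be scripted. The following is a minimal sketch, assuming a local copy of the original weights; the function name `patch_config_for_gguf` and the example path are hypothetical. It only edits config.json and does not perform the GGUF conversion or quantization itself, and it should only matter for llama.cpp builds that predate the fix.

```python
import json
from pathlib import Path


def patch_config_for_gguf(config_path, max_positions=8192):
    """Apply the two workaround edits to a model's config.json in place.

    Hypothetical helper: removes "rope_scaling" and caps
    "max_position_embeddings" at 8192, as described above.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Remove the rope_scaling entry if present (no error if it is missing).
    config.pop("rope_scaling", None)

    # Limit the context length to 8k tokens.
    config["max_position_embeddings"] = max_positions

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example usage (the path is an assumption; point it at your local download):
# patch_config_for_gguf("Holland-4B/config.json")
```

Keeping a backup of the original config.json is a good idea, since the script overwrites the file.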

Axolotl config

Credits

Training

Training was done for 2 epochs. We used 2 x RTX 6000 GPUs, graciously provided by Kubernetes_Bad, for the full-parameter fine-tuning of the model.

👁 Built with Axolotl

Safety

...


URL: https://huggingface.co/Delta-Vector/Holland-4B-EXL2
