Dolphin 2.9.4 Llama 3.1 8b 🐬

This is the GGUF conversion, for use with llama.cpp, Ollama, LM Studio, etc.

Curated and trained by Eric Hartford and Cognitive Computations

Discord: https://discord.gg/h3K4XGj2RH

Our appreciation for the sponsors of Dolphin 2.9.4:

Crusoe Cloud provided an excellent on-demand 8xL40S node.

This model is based on Meta Llama 3.1 8b, and is governed by the Llama 3.1 license.

The base model has a 128K context window, and our fine-tuning used a sequence length of 8192.

Dolphin 2.9.4 uses the ChatML prompt template format.

Example:

<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
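
The template above can be assembled programmatically. The following is a minimal sketch, not part of the original card: the helper name `build_chatml_prompt` and the sample user message are hypothetical, and the default system prompt is copied from the example. It assumes the assistant turn is left open so the model continues from there.

```python
def build_chatml_prompt(user_prompt, system_prompt="You are Dolphin, a helpful AI assistant."):
    """Return a ChatML prompt string that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Hypothetical user message, for illustration only.
prompt = build_chatml_prompt("Write a haiku about the sea.")
print(prompt)
```

Many runtimes apply a chat template for you; building the string by hand is mainly useful when sending raw completions.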

Dolphin-2.9.4 has a variety of instruction-following, conversational, and coding skills. It also has agentic abilities and supports function calling. It is especially trained to obey the system prompt and to follow instructions in many languages.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service, because it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models (https://erichartford.com/uncensored-models). You are responsible for any content you create using this model. Enjoy responsibly.

Built with Axolotl

workspace/axolotl/dolphin-2.9.4-llama3.1-8b

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B; the auto-generated card does not name the training dataset. It achieves the following results on the evaluation set:

Loss: 0.5655

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 16
total_train_batch_size: 256
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 3
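
The two total batch sizes listed above follow from the per-device values. A small sketch of the arithmetic, assuming the usual convention that the effective training batch is per-device batch size times device count times gradient accumulation steps, and that evaluation does not accumulate gradients:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2
eval_batch_size = 2
num_devices = 8
gradient_accumulation_steps = 16

# Effective batch per optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)
print(total_eval_batch_size)
```

Both results match the reported total_train_batch_size and total_eval_batch_size.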

Training results

Training Loss	Epoch	Step	Validation Loss
0.5837	1.0180	1161	0.5814
0.5525	2.0179	2322	0.5671
0.5514	2.9624	3420	0.5655
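
For a more intuitive reading of the final validation loss, it can be converted to perplexity. This is a sketch under an assumption the card does not state: that the reported loss is a mean per-token cross-entropy in nats, in which case perplexity is its exponential.

```python
import math

# Final validation loss from the table above.
final_validation_loss = 0.5655

# Assumption: the loss is mean per-token cross-entropy in nats.
perplexity = math.exp(final_validation_loss)

print(f"{perplexity:.2f}")
```

Under that assumption the final checkpoint's validation perplexity is about 1.76.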

Framework versions

Transformers 4.44.0.dev0
Pytorch 2.4.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1

GGUF

Model size: 8B params
Architecture: llama
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit

Model tree for dphn/dolphin-2.9.4-llama3.1-8b-gguf

Base model: meta-llama/Llama-3.1-8B
Quantized (326): this model

URL: https://huggingface.co/dphn/dolphin-2.9.4-llama3.1-8b-gguf
