Magistrate 3.2 3B

Continued pretraining of meta-llama/Llama-3.2-3B on ~250M tokens of legal data, none of it synthetic.

The model achieves the following results on the evaluation set:

Loss: 0.6802

An instruct version is also available.

Built with Axolotl

Model description

This is a base model trained on US Supreme Court proceedings, US federal code and regulations.

Intended uses & limitations

This model is intended for research purposes. You are liable for all model outputs.
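Because this is a base model rather than an instruct model, it is most naturally used for plain text completion. The following is a minimal sketch, assuming the standard Hugging Face Transformers loading path works for this checkpoint; the helper name `complete` and its defaults are hypothetical and not part of the model card.

```python
MODEL_ID = "macadeliccc/magistrate-3.2-3b-base"  # repository id from this model card


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the prompt followed by a greedy continuation (hypothetical helper)."""
    # Heavy imports are kept inside the function so that defining it stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # The published weights are BF16, so load them in that dtype.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    # Base model: pass raw text, no chat template.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The Fourth Amendment protects")` downloads the weights on first use and returns the prompt plus its continuation. Prompts that read like the start of a legal document are likely to suit a base model better than direct questions.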

Training and evaluation data

The training data consists of US Supreme Court verdicts, federal regulations, laws and treaties.

Some other resources have been included from institutions like CLL to fill in the gaps in knowledge for industry jargon.

Training procedure

Spectrum fine-tune targeting the top 35% of layers. Thanks to the Cognitive Computations team for their work on Spectrum.

The methodology is based on Cohere's paper "To Code, or Not To Code? Exploring Impact of Code in Pre-training".

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 16
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 690
num_epochs: 3
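The derived values in the list above follow from the per-device settings, and the warmup and cosine settings define the learning-rate curve. The sketch below reproduces both under stated assumptions: `TOTAL_STEPS` is taken from the last logged step in the training results (the true final step may be slightly higher), and the schedule is the usual linear-warmup-then-cosine-decay shape, which may differ in detail from what the training framework applied.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 0.0002
TRAIN_BATCH_SIZE = 2
EVAL_BATCH_SIZE = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 690

# Assumption: total optimizer steps approximated by the last logged step.
TOTAL_STEPS = 6924

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
print("warmup fraction:", round(WARMUP_STEPS / TOTAL_STEPS, 3))
for step in (0, 345, 690, 3462, 6924):
    print(step, lr_at(step))
```

Under these assumptions the warmup covers roughly the first tenth of training, and the learning rate peaks at step 690 before decaying.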

Training results

Training Loss	Epoch	Step	Validation Loss
1.3589	0.0004	1	1.5640
0.9936	0.4984	1154	0.9440
0.8384	0.9968	2308	0.8392
0.8226	1.4963	3462	0.7802
0.6568	1.9949	4616	0.7059
0.5163	2.4923	5770	0.6886
0.492	2.9922	6924	0.6802
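The validation losses can be easier to compare when expressed as perplexity. This is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training); the numbers are copied from the table above.

```python
import math

# (epoch, step, validation loss) rows copied from the training results table.
VALIDATION_LOG = [
    (0.0004, 1, 1.5640),
    (0.4984, 1154, 0.9440),
    (0.9968, 2308, 0.8392),
    (1.4963, 3462, 0.7802),
    (1.9949, 4616, 0.7059),
    (2.4923, 5770, 0.6886),
    (2.9922, 6924, 0.6802),
]


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss in nats to perplexity."""
    return math.exp(loss)


for epoch, step, loss in VALIDATION_LOG:
    print(f"step {step:>5}  epoch {epoch:.4f}  val_loss {loss:.4f}  ppl {perplexity(loss):.3f}")
```

The validation loss falls at every logged checkpoint, so perplexity does too; the improvement between the last two checkpoints is small compared with the earlier ones.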

Framework versions

Transformers 4.45.0
Pytorch 2.3.1+cu121
Datasets 2.21.0
Tokenizers 0.20.0

Model size: 3B params

Tensor type: BF16 (Safetensors)

Paper: arXiv:2408.10914 (published Aug 20, 2024)

URL: https://huggingface.co/macadeliccc/magistrate-3.2-3b-base