URL: https://huggingface.co/macadeliccc/magistrate-3.2-3b-base

Magistrate 3.2 3B

Continued pretraining of meta-llama/Llama-3.2-3B on roughly 250M tokens of legal text, none of it synthetic.

The model achieves the following results on the evaluation set:

  • Loss: 0.6802

An instruct version is also available.

Built with Axolotl


Model description

This is a base model trained on US Supreme Court proceedings, US federal code and regulations.
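
Because this is a base model rather than a chat model, the natural way to use it is plain text completion. The following is a minimal sketch, assuming the standard Transformers causal-LM API and the BF16 weights published in the repository. The function name `complete`, the default token budget, and the example prompt are illustrative choices, not part of the model card.

```python
MODEL_ID = "macadeliccc/magistrate-3.2-3b-base"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the base model (plain completion, no chat template)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed or the weights to be downloaded.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: load in bfloat16 to match the published tensor type.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded text includes the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The Fourth Amendment protects")` downloads the weights on first use and returns the prompt together with the model's continuation.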

Intended uses & limitations

This model is intended for research purposes. You are liable for all model outputs.

Training and evaluation data

The training data consists of US Supreme Court verdicts, federal regulations, laws and treaties.

Some other resources have been included from institutions like CLL to fill in the gaps in knowledge for industry jargon.

Training procedure

Spectrum fine-tune targeting the top 35% of layers. Thanks to the Cognitive Computations team for their work on Spectrum.
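
To illustrate what a "top 35%" selection means, here is a rough sketch of the selection step only. It assumes each trainable module already has a score (Spectrum ranks layers by signal-to-noise ratio), keeps the highest-scoring 35%, and leaves the rest frozen. The scores below are made up, the helper `select_top_fraction` is hypothetical, and the single global ranking is a simplification; this is not the Spectrum implementation or the configuration used for this model.

```python
def select_top_fraction(score_by_module: dict[str, float], fraction: float = 0.35) -> list[str]:
    """Return the names of the highest-scoring modules, best first."""
    keep = max(1, round(len(score_by_module) * fraction))
    ranked = sorted(score_by_module, key=score_by_module.get, reverse=True)
    return ranked[:keep]


# Hypothetical scores: 20 modules whose score simply equals the layer index.
scores = {f"model.layers.{i}.mlp.down_proj": float(i) for i in range(20)}

trainable = select_top_fraction(scores, fraction=0.35)
frozen = [name for name in scores if name not in trainable]

print(f"{len(trainable)} trainable, {len(frozen)} frozen")
```

In an actual training run, only the parameters of the selected modules would be left trainable, and all others would be excluded from gradient updates.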

The methodology is based on Cohere's paper "To Code, or Not To Code? Exploring Impact of Code in Pre-training".

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 690
  • num_epochs: 3
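
The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below reproduces the total batch sizes and approximates the learning-rate curve as linear warmup followed by half-cosine decay to zero. The default `total_steps=6924` is an assumption taken from the last step logged in the training results; the true final step may be slightly higher. Function names are illustrative.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Samples contributing to one optimizer step (or one eval pass when grad_accum=1)."""
    return per_device * num_devices * grad_accum


def lr_at(step: int, peak_lr: float = 2e-4, warmup_steps: int = 690,
          total_steps: int = 6924) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


print(effective_batch_size(2, 2, 4))  # total_train_batch_size
print(effective_batch_size(2, 2))     # total_eval_batch_size
for step in (0, 345, 690, 3462, 6924):
    print(step, lr_at(step))
```

With these settings the learning rate reaches its peak of 0.0002 at step 690, about a tenth of the way through training, and decays from there.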

Training results

Training Loss   Epoch    Step   Validation Loss
1.3589          0.0004   1      1.5640
0.9936          0.4984   1154   0.9440
0.8384          0.9968   2308   0.8392
0.8226          1.4963   3462   0.7802
0.6568          1.9949   4616   0.7059
0.5163          2.4923   5770   0.6886
0.492           2.9922   6924   0.6802
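
One way to read these losses is as perplexities. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal-LM training but is not stated in the card. The rows are copied from the training results table; the helper name is illustrative.

```python
import math

# (training_loss, epoch, step, validation_loss), copied from the training results.
ROWS = [
    (1.3589, 0.0004, 1, 1.5640),
    (0.9936, 0.4984, 1154, 0.9440),
    (0.8384, 0.9968, 2308, 0.8392),
    (0.8226, 1.4963, 3462, 0.7802),
    (0.6568, 1.9949, 4616, 0.7059),
    (0.5163, 2.4923, 5770, 0.6886),
    (0.492, 2.9922, 6924, 0.6802),
]


def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss)


for train_loss, epoch, step, val_loss in ROWS:
    print(f"step {step:>4}  epoch {epoch:.2f}  val ppl {perplexity(val_loss):.3f}")
```

Under that assumption, the final validation loss of 0.6802 corresponds to a perplexity of roughly 1.97. Validation loss keeps improving through the third epoch, although the gap to the training loss widens after epoch 2.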

Framework versions

  • Transformers 4.45.0
  • PyTorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.0

Model size: 3B params (Safetensors, tensor type BF16)