Olmo3-IVON-SFT-7B

📦 Code: insait-institute/c3po

Olmo-3 7B after supervised fine-tuning (SFT) with the variational optimizer IVON, from the paper "Parameter Exploration for RLVR via Variational Learning".

This is a warm-start checkpoint. Supervised fine-tuning with IVON yields not just point-estimate weights but an approximate Gaussian posterior over them, described by a mean and a diagonal Hessian (precision) estimate. That posterior is the learned prior that seeds the 3PO RLVR runs (B3PO / M3PO / C3PO), where weight perturbations sampled from it drive parameter-space exploration.
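As a rough illustration of what sampling a weight perturbation from such a posterior looks like, the sketch below draws one sample from a diagonal Gaussian. It assumes the usual IVON parameterization, in which the per-weight variance is 1 / (λ (h + δ)), with λ the effective sample size (ESS), h the diagonal Hessian estimate and δ the weight decay. The function name `sample_weights`, the toy values and the pure-Python lists are hypothetical and chosen for readability; the companion code may sample differently.

```python
import math
import random


def sample_weights(mean, hessian, ess, weight_decay=0.0, rng=None):
    """Draw one weight vector from a diagonal Gaussian posterior.

    Assumed parameterization (IVON convention):
        variance_i = 1 / (ess * (hessian_i + weight_decay))
    Requires hessian_i + weight_decay > 0 for every entry.
    """
    rng = rng or random.Random()
    sample = []
    for m, h in zip(mean, hessian):
        std = 1.0 / math.sqrt(ess * (h + weight_decay))
        # Perturb the posterior mean with Gaussian noise scaled by std.
        sample.append(m + std * rng.gauss(0.0, 1.0))
    return sample


# Toy example with three weights (values are illustrative only).
mean = [0.5, -1.2, 0.0]
hessian = [1e-3, 1e-2, 1e-1]
perturbed = sample_weights(mean, hessian, ess=1e10, rng=random.Random(0))
```

Under this parameterization, a large ESS or a large Hessian entry gives a small standard deviation, so weights the posterior is confident about are perturbed less than weights with a flat curvature estimate.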

Training


Foundation model	`allenai/Olmo-3-1025-7B`
Stage	Warm-start SFT
Data	Llama-Nemotron Post-Training Dataset (SFT subset)
Optimizer	IVON, lr `50.0`, ESS (λ) `1e10`
Hardware	8× NVIDIA H200 (144 GB)

Usage

Loads as a standard causal LM:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned weights and the matching tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("BayesRL/Olmo3-IVON-SFT-7B")
tok = AutoTokenizer.from_pretrained("BayesRL/Olmo3-IVON-SFT-7B")

To use it as the warm-start prior for 3PO RLVR, load the IVON optimizer state via `IVON_INIT_METHOD=trained` in the companion code's `run_rl.sh`.

Citation

@misc{venkatkrishna2026parameter,
 title={Parameter Exploration for RLVR via Variational Learning},
 author={Vatsal Venkatkrishna and Nico Daheim and Iryna Gurevych},
 year={2026},
}

Model size: 7B params (Safetensors, tensor type BF16)


Paper: arXiv:2402.17641 (published Feb 27, 2024)

URL: https://huggingface.co/BayesRL/Olmo3-IVON-SFT-7B