BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

Introduction

We introduce BFS-Prover-V2, the state-of-the-art open-source step-level theorem proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. BFS-Prover-V2 introduces novel solutions to overcome these limitations through:

Training-time scaling: A novel multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining to surmount the performance plateaus that typically curtail long-term post training
Inference-time scaling: A planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time

BFS-Prover-V2 achieves 95.08% and 41.4% on the miniF2F and ProofNet test sets respectively, setting a new state-of-the-art for step-level provers.

This repo contains the BFS-Prover-V2-32B model, with the following features:

Base Model: Qwen2.5-32B
Training Approach: Multi-stage expert iteration with best-first tree search
Training Data Sources:
- Mathlib (via LeanDojo)
- Lean-Github repositories
- Autoformalized NuminaMath datasets
- Goedel-Pset

Benchmark Performance

Model	miniF2F-test	miniF2F-valid	ProofNet-test
BFS-Prover-V2-7B	82.4%	-	-
👉 BFS-Prover-V2-32B	86.1%	85.5%	41.4%
👉 BFS-Prover-V2-32B w/ Planner	95.08%	95.5%	-

Usage

The model expects input in the format "{state}:::" where {state} is a Lean4 tactic state.
::: serves as a special indicator to signal the model to generate a tactic for the given state.
The model will echo back the input state followed by the generated tactic.

# Example code for loading and using the tactic generator model
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("ByteDance-Seed/BFS-Prover-V2-32B")
tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/BFS-Prover-V2-32B")

# imo_1964_p2 from miniF2F
state = """a b c : ℝ

 h₀ : 0 < a ∧ 0 < b ∧ 0 < c

 h₁ : c < a + b

 h₂ : b < a + c

 h₃ : a < b + c

 ⊢ a ^ 2 * (b + c - a) + b ^ 2 * (c + a - b) + c ^ 2 * (a + b - c) ≤ 3 * a * b * c"""

# Tactic generation
sep = ":::"
prompt = state + sep
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep)[1]
print(tactic)

# Generated tactic: "nlinarith [sq_nonneg (a - b), sq_nonneg (c - a), sq_nonneg (b - c)]"

Citation

@article{xin2025scaling,
 title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers},
 author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia},
 journal={arXiv preprint arXiv:2509.06493},
 year={2025}
}

License

This project is licensed under the Apache License 2.0.

Contact

For questions and feedback about the tactic generator model, please contact:

Ran Xin (ran.xin@bytedance.com)
Zeyu Zheng (zeyuzhen@andrew.cmu.edu)

Downloads last month: 82

Safetensors

Model size

33B params

Tensor type

F32

Model tree for ByteDance-Seed/BFS-Prover-V2-32B

Base model

Qwen/Qwen2.5-32B

Finetuned

(125)

this model

Quantizations

4 models

Collection including ByteDance-Seed/BFS-Prover-V2-32B

LLM Step-Provers in Lean4 • 5 items • Updated Oct 7, 2025 • 8

Paper for ByteDance-Seed/BFS-Prover-V2-32B

Paper • 2509.06493 • Published Sep 8, 2025 • 13

URL: https://huggingface.co/ByteDance-Seed/BFS-Prover-V2-32B

⇱ ByteDance-Seed/BFS-Prover-V2-32B · Hugging Face