This is a checkpoint for "From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning". It belongs to a collection of 3 items containing the checkpoints and dataset for that work.
- Format: Safetensors
- Model size: 4B params
- Tensor type: BF16
