Show-o-512x512-RecA

A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.

This repository hosts the model weights for Show-o-512x512-RecA, presented in the paper Reconstruction Alignment Improves Unified Multimodal Models. For installation, usage instructions, and further documentation, please visit the RecA GitHub repository and the Project Page. You can also refer to Show-o's original GitHub repository for the base model.

🧠 Method

👁 Paper
👁 ArXiv
👁 Github
👁 Hugging Face Collection
👁 HF Demo
👁 Project Page

📊 Benchmarks

Model	GenEval ↑	DPGBench ↑	WISE ↑
Show-o-512x512	0.67	82.21	0.40
Show-o-512x512-RecA	0.72	84.94	0.40
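
To make the size of the improvement explicit, the short sketch below computes absolute and relative gains of Show-o-512x512-RecA over the base model. The scores are copied directly from the table above; the names `scores` and `gains` are illustrative only and are not part of the RecA or Show-o codebases.

```python
# Scores copied from the benchmark table: (Show-o-512x512, Show-o-512x512-RecA).
scores = {
    "GenEval": (0.67, 0.72),
    "DPGBench": (82.21, 84.94),
    "WISE": (0.40, 0.40),
}


def gains(table):
    """Return {benchmark: (absolute_gain, relative_gain_percent)}.

    Both values are rounded to two decimals to match the table's precision.
    """
    result = {}
    for name, (base, reca) in table.items():
        absolute = round(reca - base, 2)
        relative = round(100.0 * (reca - base) / base, 2)
        result[name] = (absolute, relative)
    return result


if __name__ == "__main__":
    for name, (absolute, relative) in gains(scores).items():
        print(f"{name}: {absolute:+.2f} absolute, {relative:+.2f}% relative")
```

Note that GenEval and WISE are reported on a 0 to 1 scale while DPGBench is reported on a 0 to 100 scale, so absolute gains are not directly comparable across columns; the relative figure is the more uniform measure.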

License

Show-o-512x512-RecA is licensed under the Apache 2.0 license.

✍️ Citation

If you find our work inspiring or use our codebase in your research, please consider giving the repository a star ⭐ and citing the paper:

@article{xie2025reconstruction,
 title={Reconstruction Alignment Improves Unified Multimodal Models},
 author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
 journal={arXiv preprint arXiv:2509.07295},
 year={2025}
}