Show-o-512x512-RecA
A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.
This repository hosts the model weights for Show-o-512x512-RecA, presented in the paper Reconstruction Alignment Improves Unified Multimodal Models. For installation, usage instructions, and further documentation, please visit the RecA GitHub repository and the Project Page. You can also refer to Show-o's original GitHub repository for the base model.
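As a minimal sketch of fetching these weights locally, assuming the `huggingface_hub` package is installed and that the repository id is `sanaka87/Show-o-512x512-RecA` (as listed in this card's model tree). The helper name `download_reca_weights` and its default directory are illustrative and not part of RecA. Loading and inference still follow the RecA GitHub repository.

```python
from pathlib import Path

# Repository id as listed on this model card.
REPO_ID = "sanaka87/Show-o-512x512-RecA"


def download_reca_weights(local_dir: str = "checkpoints/Show-o-512x512-RecA") -> Path:
    """Download a snapshot of the model repository and return its local path.

    Hypothetical helper for illustration; the default directory is an assumption.
    """
    # Imported lazily so the module can be imported without the dependency installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repo and returns the folder path.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_reca_weights()` triggers the download. The returned directory can then be passed to whichever loading script the RecA repository documents.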
🧠 Method
Paper
ArXiv
GitHub
Hugging Face Collection
HF Demo
Project Page
Benchmarks
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Show-o-512x512 | 0.67 | 82.21 | 0.40 |
| Show-o-512x512-RecA | 0.72 | 84.94 | 0.40 |
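To make the size of these changes explicit, here is a small sketch that computes the absolute and relative gains from the scores in the table. The function name `gains` is illustrative; the numbers are copied from the table and nothing else is assumed.

```python
# Scores copied from the benchmark table above (higher is better).
baseline = {"GenEval": 0.67, "DPGBench": 82.21, "WISE": 0.40}
reca = {"GenEval": 0.72, "DPGBench": 84.94, "WISE": 0.40}


def gains(before: dict, after: dict) -> dict:
    """Return {benchmark: (absolute_gain, relative_gain_percent)}."""
    out = {}
    for name, old in before.items():
        delta = after[name] - old
        # Round to the precision reported in the table.
        out[name] = (round(delta, 2), round(100 * delta / old, 1))
    return out


for name, (abs_gain, rel_gain) in gains(baseline, reca).items():
    print(f"{name}: {abs_gain:+.2f} ({rel_gain:+.1f}%)")
```

With the table's values this reports +0.05 (+7.5%) on GenEval, +2.73 (+3.3%) on DPGBench, and no change on WISE.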
License
Show-o-512x512-RecA is licensed under the Apache 2.0 license.
✍️ Citation
If you find our work inspiring or use our codebase in your research, please consider giving the repository a star ⭐ and citing the paper.
@article{xie2025reconstruction,
title={Reconstruction Alignment Improves Unified Multimodal Models},
author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
journal={arXiv preprint arXiv:2509.07295},
year={2025}
}
Model tree for sanaka87/Show-o-512x512-RecA
Base model
showlab/show-o-w-clip-vit-512x512