Harmon-0.5B-RecA
A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.
This repository hosts the model weights for Harmon-0.5B-RecA. For installation, usage instructions, and further documentation, please visit Harmon's original GitHub repository.
Method
Paper
ArXiv
GitHub
Hugging Face Collection
HF Demo
Project Page
Benchmarks
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Harmon-0.5B | 0.68 | 80.12 | 0.33 |
| Harmon-0.5B-RecA | 0.79 | 84.67 | 0.40 |
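
To make the size of the improvement easier to read, the short sketch below computes the absolute and relative gain of Harmon-0.5B-RecA over Harmon-0.5B for each benchmark. The scores are copied from the table; the `gains` helper and the dictionary names are illustrative and not part of the Harmon or RecA codebase. Absolute gains are expressed in each benchmark's own units.

```python
# Scores copied from the benchmark table above.
BASELINE = {"GenEval": 0.68, "DPGBench": 80.12, "WISE": 0.33}  # Harmon-0.5B
RECA = {"GenEval": 0.79, "DPGBench": 84.67, "WISE": 0.40}      # Harmon-0.5B-RecA


def gains(baseline, improved):
    """Return {benchmark: (absolute_gain, relative_gain_percent)}.

    Illustrative helper (hypothetical name), not from the original repository.
    """
    out = {}
    for name, base in baseline.items():
        delta = improved[name] - base
        # Round to the precision reported in the table.
        out[name] = (round(delta, 2), round(100 * delta / base, 1))
    return out


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in gains(BASELINE, RECA).items():
        print(f"{name}: +{abs_gain:.2f} ({rel_gain:+.1f}%)")
```

Under these numbers, the relative gain is about 16% on GenEval, about 6% on DPGBench, and about 21% on WISE.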
License
Harmon-0.5B-RecA is licensed under the Apache 2.0 license.
Citation
If you find our work inspiring or use our codebase in your research, please consider giving it a star ⭐ and citing it:
@article{xie2025reconstruction,
title={Reconstruction Alignment Improves Unified Multimodal Models},
author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
journal={arXiv preprint arXiv:2509.07295},
year={2025}
}
Model tree for sanaka87/Harmon-0.5B-RecA
Base model
wusize/Harmon-0_5B