Scale RAE Data
Project Page | Paper | Code
This repository contains data associated with the paper "Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders".
The dataset is used for training and evaluating Scale-RAE, a framework that investigates scaling Representation Autoencoders (RAEs) for large-scale, freeform text-to-image (T2I) generation. It includes the web, synthetic, and text-rendering data used to scale RAE decoders beyond ImageNet, as well as high-quality instruction datasets for fine-tuning.
Citation
If you find this work useful, please cite:
@article{scale-rae-2026,
  title={Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders},
  author={Shengbang Tong and Boyang Zheng and Ziteng Wang and Bingda Tang and Nanye Ma and Ellis Brown and Jihan Yang and Rob Fergus and Yann LeCun and Saining Xie},
  journal={arXiv preprint arXiv:2601.16208},
  year={2026}
}
