Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen^†, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue^‡, Lu Jiang^‡

^† Project Lead ^‡ Corresponding Authors

Project Page
Tar Paper on arXiv
Huggingface Model
Huggingface Space

Citation

@article{han2025tar,
 title={Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations}, 
 author={Han, Jiaming and Chen, Hao and Zhao, Yang and Wang, Hanyu and Zhao, Qi and Yang, Ziyan and He, Hao and Yue, Xiangyu and Jiang, Lu},
 journal={arXiv preprint arXiv:2506.18898},
 year={2025},
}

License

This project is licensed under the Apache 2.0 License.

Model tree for ByteDance-Seed/Tar-TA-Tok

Base model: google/siglip2-so400m-patch14-384
Finetuned (27): this model

Spaces using ByteDance-Seed/Tar-TA-Tok: 3

Collection including ByteDance-Seed/Tar-TA-Tok

[NeurIPS 2025] Unifying Visual Understanding and Generation via Text-Aligned Representations • 5 items • Updated Sep 20, 2025

Paper for ByteDance-Seed/Tar-TA-Tok

arXiv:2506.18898 • Published Jun 23, 2025

URL: https://huggingface.co/ByteDance-Seed/Tar-TA-Tok
