UniVideo: Unified Understanding, Generation, and Editing for Videos
Cong Wei*,1,2 Quande Liu†,2 Zixuan Ye2 Qiulin Wang2 Xintao Wang2
Pengfei Wan2 Kun Gai2 Wenhu Chen†,1
1University of Waterloo
2Kling Team, Kuaishou Technology
*Work done during an internship at Kling Team, Kuaishou Technology
†Corresponding author
🔔News
- [2026-01-07]: Released the code and model.
- [2025-10-09]: Released the arXiv preprint and the project page.
How to use
- Please refer to 🔗 GitHub for usage.
Acknowledgement
- HunyuanVideo: the base video generation model used in this work. Thanks to the authors for their excellent contribution.
- Qwen2.5-VL: the base vision-language model (VLM) used in this work. Thanks to the authors for their excellent contribution.
- MetaQueries: we adopt their query implementation. Thanks to the authors for their excellent contribution.
🌟 Citation
If you find UniVideo useful for your research and applications, please cite using this BibTeX:
@article{wei2025univideo,
title={UniVideo: Unified understanding, generation, and editing for videos},
author={Wei, Cong and Liu, Quande and Ye, Zixuan and Wang, Qiulin and Wang, Xintao and Wan, Pengfei and Gai, Kun and Chen, Wenhu},
journal={arXiv preprint arXiv:2510.08377},
year={2025}
}
Model tree for KlingTeam/UniVideo
Base model
Qwen/Qwen2.5-VL-7B-Instruct