
URL: https://huggingface.co/ByteDance-Seed/VINCIE-3B

ByteDance-Seed/VINCIE-3B · Hugging Face



VINCIE: Unlocking In-context Image Editing from Video

Leigang Qu, Feng Cheng, Ziyan Yang, Qi Zhao, Shanchuan Lin, Yichun Shi, Yicong Li, Wenjie Wang, Tat-Seng Chua, Lu Jiang

VINCIE Website
VINCIE Paper on ArXiv
Github
VINCIE Models
VINCIE Space

In-context image editing aims to modify images based on a contextual sequence comprising text and previously generated images. Existing methods typically depend on task-specific pipelines and expert models (e.g., segmentation and inpainting) to curate training data. In this work, we explore whether an in-context image editing model can be learned directly from videos. We introduce a scalable approach to annotate videos as interleaved multimodal sequences. To effectively learn from this data, we design a block-causal diffusion transformer trained on three proxy tasks: next-image prediction, current segmentation prediction, and next-segmentation prediction. Additionally, we propose a novel multi-turn image editing benchmark to advance research in this area. Extensive experiments demonstrate that our model exhibits strong in-context image editing capabilities and achieves state-of-the-art results on two multi-turn image editing benchmarks. Despite being trained exclusively on videos, our model also shows promising abilities in multi-concept composition, story generation, and chain-of-editing applications.
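To make the term "block-causal" concrete, here is a minimal sketch of a block-causal attention mask in plain Python. It is an illustration under stated assumptions, not the VINCIE implementation: the function name `block_causal_mask` and the example block sizes are hypothetical, and the sketch assumes the common reading of block-causal attention in which tokens attend freely within their own block (for example, one text turn or one image) and to all earlier blocks, but never to later ones.

```python
def block_causal_mask(block_sizes):
    """Build a boolean attention mask for a sequence split into blocks.

    mask[q][k] is True when query token q may attend to key token k.
    Hypothetical helper for illustration; not taken from the VINCIE code.
    """
    # Label every token with the index of the block it belongs to.
    block_ids = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    # A query sees a key if the key's block is the same or an earlier one.
    return [[key_block <= query_block for key_block in block_ids]
            for query_block in block_ids]


# Example: three blocks of 2, 3, and 2 tokens (sizes chosen arbitrarily).
mask = block_causal_mask([2, 3, 2])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row marks the positions a token may attend to: tokens in the first block see only that block, while tokens in the last block see the whole sequence.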

✍️ Citation

@article{qu2025vincie,
 title={VINCIE: Unlocking In-context Image Editing from Video},
 author={Qu, Leigang and Cheng, Feng and Yang, Ziyan and Zhao, Qi and Lin, Shanchuan and Shi, Yichun and Li, Yicong and Wang, Wenjie and Chua, Tat-Seng and Jiang, Lu},
 journal={arXiv preprint arXiv:2506.10941},
 year={2025}
}

📜 License

VINCIE is licensed under the Apache 2.0 license.
