VINCIE: Unlocking In-context Image Editing from Video

Leigang Qu, Feng Cheng, Ziyan Yang, Qi Zhao, Shanchuan Lin, Yichun Shi, Yicong Li, Wenjie Wang, Tat-Seng Chua, Lu Jiang

👁 VINCIE Website
👁 VINCIE Paper on arXiv
👁 GitHub
👁 VINCIE Models
👁 VINCIE Space

In-context image editing aims to modify images based on a contextual sequence comprising text and previously generated images. Existing methods typically depend on task-specific pipelines and expert models (e.g., segmentation and inpainting) to curate training data. In this work, we explore whether an in-context image editing model can be learned directly from videos. We introduce a scalable approach to annotate videos as interleaved multimodal sequences. To effectively learn from this data, we design a block-causal diffusion transformer trained on three proxy tasks: next-image prediction, current segmentation prediction, and next-segmentation prediction. Additionally, we propose a novel multi-turn image editing benchmark to advance research in this area. Extensive experiments demonstrate that our model exhibits strong in-context image editing capabilities and achieves state-of-the-art results on two multi-turn image editing benchmarks. Despite being trained exclusively on videos, our model also shows promising abilities in multi-concept composition, story generation, and chain-of-editing applications.
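The term "block-causal" can be illustrated with a small attention mask. The sketch below is not taken from the VINCIE code base; it is a generic illustration that assumes each text or image segment of the interleaved sequence forms one block, with full attention inside a block and causal attention across blocks. The function name `block_causal_mask` and the token counts are hypothetical.

```python
from typing import List


def block_causal_mask(block_lengths: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumption (not from the source): every segment of the interleaved
    sequence is one block. Tokens see their own block and all earlier blocks.
    """
    # Map every token position to the index of the block it belongs to.
    block_of = [b for b, n in enumerate(block_lengths) for _ in range(n)]
    total = len(block_of)
    # Token i may attend to token j if j's block is not later than i's block.
    return [[block_of[j] <= block_of[i] for j in range(total)] for i in range(total)]


# Hypothetical sequence: 2 text tokens, 3 image tokens, 2 text tokens.
mask = block_causal_mask([2, 3, 2])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row marks with `x` the positions a token may attend to. Tokens of the second block see each other and the first block, but nothing from the third, which is what lets a later image condition on the whole preceding context.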

✍️ Citation

@article{qu2025vincie,
 title={VINCIE: Unlocking In-context Image Editing from Video},
 author={Qu, Leigang and Cheng, Feng and Yang, Ziyan and Zhao, Qi and Lin, Shanchuan and Shi, Yichun and Li, Yicong and Wang, Wenjie and Chua, Tat-Seng and Jiang, Lu},
 journal={arXiv preprint arXiv:2506.10941},
 year={2025}
}

📜 License

VINCIE is licensed under the Apache License 2.0.

Paper for ByteDance-Seed/VINCIE-3B

Paper • 2506.10941 • Published Jun 12, 2025 • 5

URL: https://huggingface.co/ByteDance-Seed/VINCIE-3B
