Paper: arXiv 2501.08303
Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers (CVPR 2025)
This model is described in the paper Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers.
Project Page: https://futurist-cvpr2025.github.io
FUTURIST employs a multimodal visual sequence transformer to directly predict multiple future semantic modalities. We focus on two key modalities: semantic segmentation and depth estimation.
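As a rough illustration of the input side of this task, the NumPy sketch below stacks past semantic segmentation maps (one-hot encoded) and depth maps into one per-frame tensor, then appends an empty slot for the future frame to be predicted. It is a minimal sketch under stated assumptions, not FUTURIST's data pipeline. The helper `build_multimodal_sequence`, the 80 m depth cap used for normalization, and the toy sizes are hypothetical. The 19 classes follow the standard Cityscapes evaluation label set.

```python
import numpy as np

def build_multimodal_sequence(seg_past, depth_past, num_classes, max_depth=80.0):
    """Hypothetical helper: stack past segmentation and depth into one sequence.

    seg_past:   (T, H, W) integer class IDs in [0, num_classes)
    depth_past: (T, H, W) depth values in metres
    Returns a (T + 1, H, W, num_classes + 1) array whose last time step is an
    all-zero placeholder for the future frame, plus a boolean "observed" vector.
    """
    T, H, W = seg_past.shape
    # One-hot encode segmentation: (T, H, W) -> (T, H, W, C)
    seg_onehot = np.eye(num_classes, dtype=np.float32)[seg_past]
    # Scale depth into [0, 1] using an assumed maximum range (max_depth)
    depth_norm = np.clip(depth_past / max_depth, 0.0, 1.0)[..., None].astype(np.float32)
    past = np.concatenate([seg_onehot, depth_norm], axis=-1)
    # Empty slot for the future frame that the model has to fill in
    future_slot = np.zeros((1, H, W, num_classes + 1), dtype=np.float32)
    sequence = np.concatenate([past, future_slot], axis=0)
    observed = np.array([True] * T + [False])
    return sequence, observed

rng = np.random.default_rng(0)
seg = rng.integers(0, 19, size=(4, 8, 16))        # 4 past frames, 8 x 16 pixels
depth = rng.uniform(1.0, 100.0, size=(4, 8, 16))  # toy depths in metres
seq, observed = build_multimodal_sequence(seg, depth, num_classes=19)
print(seq.shape)  # (5, 8, 16, 20)
```

Keeping both modalities in a single tensor makes the "multiple future modalities" framing concrete: the model predicts the channels of the last time step, segmentation and depth together.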
- Key innovation 1: We introduce a VAE-free hierarchical tokenization process integrated directly into our transformer. This simplifies training, reduces computational overhead, and enables true end-to-end optimization.
- Key innovation 2: Our model features an efficient cross-modality fusion mechanism that improves predictions by learning synergies between different modalities (segmentation and depth).
- Key innovation 3: We develop a novel multimodal masked visual modeling objective specifically designed for future prediction tasks.
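The three ideas above can be illustrated with a generic NumPy sketch. It is one plausible reading of the bullets, not the authors' implementation; the paper and repository define the actual design. Here each modality is tokenized without a VAE by a linear patch embedding followed by a 2×2 patch-merging step (our assumed interpretation of "hierarchical"). Segmentation and depth tokens at the same position are then fused by concatenation plus a linear projection. Finally, a random subset of tokens is replaced by a mask token, with the loss computed on masked positions only. All function names, the patch size of 4, the embedding width of 64, the random stand-in weights, and the L2 loss are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(x, p):
    """Split an (H, W, C) map into non-overlapping p x p patches -> (H/p, W/p, p*p*C)."""
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H // p, W // p, p * p * C)

def merge_2x2(tokens):
    """Concatenate each 2 x 2 block of neighbouring tokens -> (h/2, w/2, 4*D)."""
    h, w, d = tokens.shape
    t = tokens.reshape(h // 2, 2, w // 2, 2, d).transpose(0, 2, 1, 3, 4)
    return t.reshape(h // 2, w // 2, 4 * d)

def tokenize(frame, p, w_fine, w_coarse):
    """Hypothetical two-level, VAE-free tokenizer: patch embedding, then merge + project."""
    fine = patchify(frame, p) @ w_fine           # (H/p, W/p, D)
    coarse = merge_2x2(fine) @ w_coarse          # (H/2p, W/2p, D)
    return coarse.reshape(-1, coarse.shape[-1])  # flatten to an (N, D) sequence

def fuse(seg_tokens, depth_tokens, w_fuse):
    """Per-position fusion: concatenate modality features, project back to D."""
    return np.concatenate([seg_tokens, depth_tokens], axis=-1) @ w_fuse

def random_token_mask(num_tokens, ratio, rng):
    """Boolean mask selecting round(ratio * num_tokens) tokens uniformly at random."""
    mask = np.zeros(num_tokens, dtype=bool)
    k = int(round(ratio * num_tokens))
    mask[rng.choice(num_tokens, size=k, replace=False)] = True
    return mask

def masked_l2_loss(pred, target, mask):
    """Mean squared error averaged over masked tokens only."""
    per_token = ((pred - target) ** 2).mean(axis=-1)
    return float((per_token * mask).sum() / max(int(mask.sum()), 1))

def init(shape):
    """Small random weights standing in for learned parameters."""
    return rng.normal(0.0, 0.02, size=shape)

# Toy single-frame inputs: 19-class one-hot segmentation and a depth map in [0, 1]
H = W = 32
P, D, NUM_CLASSES = 4, 64, 19
seg = np.eye(NUM_CLASSES)[rng.integers(0, NUM_CLASSES, size=(H, W))]  # (32, 32, 19)
depth = rng.uniform(0.0, 1.0, size=(H, W, 1))                         # (32, 32, 1)

seg_tokens = tokenize(seg, P, init((P * P * NUM_CLASSES, D)), init((4 * D, D)))
depth_tokens = tokenize(depth, P, init((P * P * 1, D)), init((4 * D, D)))
fused = fuse(seg_tokens, depth_tokens, init((2 * D, D)))  # (16, 64)

mask = random_token_mask(len(fused), ratio=0.5, rng=rng)
mask_token = np.zeros(D)         # stands in for a learned [MASK] embedding
masked_input = fused.copy()
masked_input[mask] = mask_token  # a transformer would receive this sequence

print(fused.shape, int(mask.sum()))  # (16, 64) 8
```

Because fusion happens per position in this sketch, adding a modality widens each token's feature vector instead of lengthening the sequence, which keeps the attention cost unchanged. In a full model, the transformer's outputs at masked positions would be scored against the targets with a loss such as `masked_l2_loss`; whether FUTURIST fuses, tokenizes, and masks exactly this way should be checked against the paper.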
We achieve state-of-the-art performance in future semantic segmentation on Cityscapes, with strong improvements in both short-term (0.18 s) and mid-term (0.54 s) prediction.
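To relate these horizons to frames, the short sketch below converts seconds to frame counts. It assumes the 17 Hz frame rate of Cityscapes video sequences; the helper name `horizon_in_frames` is hypothetical. Under that assumption, 0.18 s and 0.54 s correspond to roughly 3 and 9 frames ahead of the last observed frame.

```python
CITYSCAPES_FPS = 17  # assumed capture rate of Cityscapes video sequences (Hz)

def horizon_in_frames(seconds, fps=CITYSCAPES_FPS):
    """Convert a prediction horizon in seconds to the nearest whole number of frames."""
    return round(seconds * fps)

for name, seconds in [("short-term", 0.18), ("mid-term", 0.54)]:
    print(f"{name}: {seconds} s -> {horizon_in_frames(seconds)} frames ahead")
# short-term: 0.18 s -> 3 frames ahead
# mid-term: 0.54 s -> 9 frames ahead
```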
Code:
https://github.com/Sta8is/FUTURIST
Demo:
We provide 2 quick demos.
Citation:
If you find FUTURIST useful in your research, please consider starring ⭐ the repository on GitHub and citing our paper!
@InProceedings{Karypidis_2025_CVPR,
author = {Karypidis, Efstathios and Kakogeorgiou, Ioannis and Gidaris, Spyros and Komodakis, Nikos},
title = {Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {3793--3803}
}
@article{karypidis2025advancingsemanticfutureprediction,
title={Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers},
author={Efstathios Karypidis and Ioannis Kakogeorgiou and Spyros Gidaris and Nikos Komodakis},
year={2025},
journal={arXiv:2501.08303},
url={https://arxiv.org/abs/2501.08303},
}
