VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks
arXiv 2025
*Indicates Equal Contribution
We welcome contributions to the leaderboard!
Contact: zhoubeitong.zbt@antgroup.com
Overview of the VenusBench-GD benchmark. VenusBench-GD integrates basic and advanced grounding tasks to comprehensively evaluate the capabilities of existing GUI models. Basic tasks assess the ability to recognize local UI elements, while advanced tasks require holistic reasoning over the entire interface and its underlying application functionality, demanding a more complex, global understanding.
Dataset Composition
Benchmark Statistics
The dataset statistics of VenusBench-GD reveal a diverse and challenging distribution across key dimensions. a) The image resolutions span a wide range, with a significant proportion concentrated in common screen sizes. b) UI element sizes vary substantially relative to the image area, covering a broad spectrum from very small to large elements. c) Instruction lengths exhibit a rich distribution, peaking in mid-length queries but extending to longer, more complex descriptions.
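As an illustration of statistic (b), the relative size of a UI element can be computed as the ratio of its bounding-box area to the image area. The sketch below is a minimal example under stated assumptions: the function name `relative_element_area`, the `(x_min, y_min, x_max, y_max)` pixel box format, and the sample values are hypothetical and are not taken from the VenusBench-GD release.

```python
def relative_element_area(bbox, image_size):
    """Return the fraction of the image area covered by a bounding box.

    bbox: (x_min, y_min, x_max, y_max) in pixels (assumed format).
    image_size: (width, height) in pixels.
    """
    x_min, y_min, x_max, y_max = bbox
    width, height = image_size
    # Clamp to zero so a degenerate or inverted box yields zero area.
    box_w = max(0, x_max - x_min)
    box_h = max(0, y_max - y_min)
    return (box_w * box_h) / (width * height)


# Hypothetical example: a 48x24 px icon on a 1920x1080 screenshot.
ratio = relative_element_area((100, 200, 148, 224), (1920, 1080))
print(f"{ratio:.6f}")
```

Small ratios such as this one correspond to the "very small" end of the element-size spectrum described above.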
Performance Comparison
The performance of representative models on advanced grounding tasks is significantly lower than on basic tasks, highlighting the increased difficulty and reasoning demands.
Human performance vs. state-of-the-art (SOTA) models on grounding tasks. A significant performance gap persists, particularly in advanced grounding scenarios.
Experimental Results
Performance comparison on the VenusBench-GD dataset, categorized by evaluation task.
Dataset Visualization
Examples of basic grounding tasks, illustrating both correct and incorrect matches between generated instructions and their corresponding annotated bounding boxes.
Examples of advanced grounding tasks. In the refusal grounding task, the red bounding box indicates the original UI element. After modification of the instruction, no matching element exists in the image.
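To make the refusal case concrete, the sketch below shows one way a single grounding prediction could be scored. It is an assumption, not the benchmark's documented metric: it treats a prediction as a click point that counts as correct when it falls inside the annotated bounding box, and treats a refusal sample (no matching element) as correct only when the model declines to predict a point. The names `is_correct`, `Point`, and `Box` are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]                # (x, y) click location in pixels
Box = Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max)


def is_correct(prediction: Optional[Point], target: Optional[Box]) -> bool:
    """Score one sample under the assumed point-in-box rule.

    target is None for refusal samples, where no element matches the
    instruction; prediction is None when the model refuses to answer.
    """
    if target is None:
        # Refusal sample: only an explicit refusal is counted as correct.
        return prediction is None
    if prediction is None:
        # The model refused although a matching element exists.
        return False
    x, y = prediction
    x_min, y_min, x_max, y_max = target
    return x_min <= x <= x_max and y_min <= y <= y_max


# Hypothetical samples: a hit, a miss, and a correctly refused instruction.
print(is_correct((120.0, 210.0), (100, 200, 148, 224)))
print(is_correct((10.0, 10.0), (100, 200, 148, 224)))
print(is_correct(None, None))
```

Under this rule, a model that always outputs a point can never score on refusal samples, which is one reason such a task probes more than local element recognition.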
BibTeX
@misc{zhou2025venusbenchgdcomprehensivemultiplatformgui,
title={VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks},
author={Beitong Zhou and Zhexiao Huang and Yuan Guo and Zhangxuan Gu and Tianyu Xia and Zichen Luo and Fei Tang and Dehan Kong and Yanyi Shang and Suling Ou and Zhenlin Guo and Changhua Meng and Shuheng Shen},
year={2025},
eprint={2512.16501},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.16501},
}
