VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks

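Beitong Zhou^*, Zhexiao Huang^*, Yuan Guo^*, Zhangxuan Gu^*, Tianyu Xia, Zichen Luo, Fei Tang, Dehan Kong, Yanyi Shang, Suling Ou, Zhenlin Guo, Changhua Meng, Shuheng Shen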

Venus Team, Ant Group
arXiv 2025
^*Indicates Equal Contribution

We welcome contributions to the leaderboard!

Contact: zhoubeitong.zbt@antgroup.com


Figure: Overview of the VenusBench-GD benchmark

Overview of the VenusBench-GD benchmark. VenusBench-GD combines basic and advanced grounding tasks to comprehensively evaluate the capabilities of existing GUI models. Basic tasks assess the ability to recognize local UI elements, while advanced tasks require holistic reasoning over the entire interface and the functionality of the underlying application, demanding a more complex, global understanding.

Dataset Composition

3 Platforms

10 Domains

97+ Applications

6100+ Sample Pairs

Figure: Domain distribution sunburst chart

Domain distribution across the VenusBench-GD dataset.

Benchmark Statistics

Figure: Benchmark statistics

The statistics of VenusBench-GD show a diverse and challenging distribution along three dimensions. a) Image resolutions span a wide range, with a large share concentrated at common screen sizes. b) UI element sizes, measured relative to the image area, vary substantially, from very small to large elements. c) Instruction lengths peak at mid-length queries but extend to longer, more complex descriptions.
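Panel b) measures UI element size relative to the image area. The page does not give the exact formula, so the sketch below assumes the simplest reading: the area of the annotated bounding box divided by the area of the screenshot. The `Box` class and `relative_element_size` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp each side to zero so a degenerate box never yields a negative area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def relative_element_size(box: Box, image_width: int, image_height: int) -> float:
    """Fraction of the screenshot area covered by the element's bounding box (assumed definition)."""
    image_area = image_width * image_height
    if image_area <= 0:
        raise ValueError("image dimensions must be positive")
    return box.area() / image_area


# A 48x48 icon on a 1920x1080 screenshot covers roughly 0.11% of the image.
ratio = relative_element_size(Box(100, 200, 148, 248), 1920, 1080)
print(f"{ratio:.4%}")
```

Ratios this small are what make tiny icons hard to ground: a few pixels of localization error can move a prediction outside the target entirely.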

Performance Comparison

Figure: Radar chart of model performance

Representative models perform significantly worse on advanced grounding tasks than on basic tasks, highlighting the increased difficulty and reasoning demands of the advanced tasks.

Figure: Human performance

Human performance vs. state-of-the-art (SOTA) models on grounding tasks. A significant performance gap persists, particularly in advanced grounding scenarios.

Experimental Results

Figure: Experimental results

Performance comparison on the VenusBench-GD dataset, grouped by evaluation task.

Dataset Visualization

Figure: Basic grounding visualization

Examples of basic grounding tasks, illustrating both correct and incorrect matches between generated instructions and their corresponding annotated bounding boxes.

Figure: Advanced grounding visualization

Examples of advanced grounding tasks. In the refusal grounding task, the red bounding box marks the original UI element; after the instruction is modified, no matching element exists in the image.
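Refusal grounding changes what counts as a correct answer: when no element matches the instruction, the model should decline rather than point somewhere. This page does not define the scoring protocol, so the sketch below assumes a point-in-box convention common in GUI grounding benchmarks. A prediction is a click point, or `None` for a refusal. A refusal sample has no target box (`None`). The names `point_in_box` and `score_sample` are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def point_in_box(point: Point, box: BBox) -> bool:
    """True if the predicted click point lies inside or on the edge of the box."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def score_sample(prediction: Optional[Point], target: Optional[BBox]) -> bool:
    """Score one sample under an assumed point-in-box protocol with refusal.

    prediction: predicted click point, or None if the model refuses.
    target: annotated box, or None for a refusal sample (no matching element).
    """
    if target is None:
        # Refusal sample: only an explicit refusal counts as correct.
        return prediction is None
    if prediction is None:
        # Refusing when a matching element exists is an error.
        return False
    return point_in_box(prediction, target)


samples = [
    ((120, 220), (100, 200, 148, 248)),  # click inside the target box
    (None, None),                        # correct refusal
    ((10, 10), None),                    # clicked although nothing matches
    (None, (0, 0, 5, 5)),                # refused although a target exists
]
results = [score_sample(p, t) for p, t in samples]
print(results)                       # [True, True, False, False]
print(sum(results) / len(results))   # 0.5
```

Under this assumed protocol, a model that never refuses scores zero on every refusal sample, which is one reason such tasks expose gaps that basic grounding accuracy hides.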

BibTeX

@misc{zhou2025venusbenchgdcomprehensivemultiplatformgui,
 title={VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks}, 
 author={Beitong Zhou and Zhexiao Huang and Yuan Guo and Zhangxuan Gu and Tianyu Xia and Zichen Luo and Fei Tang and Dehan Kong and Yanyi Shang and Suling Ou and Zhenlin Guo and Changhua Meng and Shuheng Shen},
 year={2025},
 eprint={2512.16501},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2512.16501}, 
}

URL: https://ui-venus.github.io/VenusBench-GD/