URL: https://huggingface.co/Salesforce/GPA-GUI-Detector

GPA-GUI-Detector

A YOLO-based model that detects interactive UI elements (icons, buttons, etc.) in screenshots for GUI Process Automation. The model is fine-tuned from the OmniParser ecosystem.

Model

The model weight file is model.pt. It is a YOLO model trained with the Ultralytics framework.
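
If the weights are not already on disk, one way to fetch them is through the `huggingface_hub` package. The following is a minimal sketch, not an official loading recipe. It assumes that `model.pt` sits at the root of the `Salesforce/GPA-GUI-Detector` repository (the repository ID is taken from the page URL) and that `huggingface_hub` is installed separately (`pip install huggingface_hub`). `fetch_weights` is a hypothetical helper name.

```python
REPO_ID = "Salesforce/GPA-GUI-Detector"  # assumption: taken from the model page URL
WEIGHTS_FILE = "model.pt"                # weight file name stated in this card


def fetch_weights(repo_id: str = REPO_ID, filename: str = WEIGHTS_FILE) -> str:
    """Download the weight file (or reuse the local cache) and return its path."""
    # Imported lazily so this module still loads without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)
```

The returned path can be passed to `YOLO(...)` in place of the literal `"model.pt"` used in the examples below.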

Installation

pip install ultralytics

Usage

Basic Inference

from ultralytics import YOLO

model = YOLO("model.pt")
results = model("screenshot.png")

Detection with Custom Parameters

from ultralytics import YOLO
from PIL import Image, ImageDraw

# Load the model
model = YOLO("model.pt")

# Run inference with custom confidence and image size
results = model.predict(
    source="screenshot.png",
    conf=0.05,  # confidence threshold
    imgsz=640,  # input image size
    iou=0.7,    # NMS IoU threshold
)

# Parse results
boxes = results[0].boxes.xyxy.cpu().numpy()   # bounding boxes in [x1, y1, x2, y2]
scores = results[0].boxes.conf.cpu().numpy()  # confidence scores

# Draw results on the image and print each detection
img = Image.open("screenshot.png").convert("RGB")
draw = ImageDraw.Draw(img)
for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = box.tolist()  # plain Python floats for PIL
    draw.rectangle([x1, y1, x2, y2], outline="red", width=2)
    print(f"Detected UI element at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}] (conf: {score:.2f})")
img.save("result_boxes.png")

# Or save the annotated image directly
results[0].save(filename="result.png")
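
For GUI automation, the detected boxes usually need to be turned into click targets. The sketch below shows one way to do that: it converts `[x1, y1, x2, y2]` boxes into integer center points and sorts them by confidence. `box_centers` and its `min_score` parameter are hypothetical names introduced for illustration. They are not part of Ultralytics or of this model's release.

```python
from typing import Iterable, List, Sequence, Tuple


def box_centers(
    boxes: Iterable[Sequence[float]],
    scores: Iterable[float],
    min_score: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Return (cx, cy, score) for each box, highest score first."""
    centers = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score < min_score:
            continue  # skip detections below the caller's threshold
        cx = int(round((float(x1) + float(x2)) / 2))
        cy = int(round((float(y1) + float(y2)) / 2))
        centers.append((cx, cy, float(score)))
    # Most confident detections first
    return sorted(centers, key=lambda c: c[2], reverse=True)
```

The `boxes` and `scores` arrays parsed above can be passed in directly, for example `box_centers(boxes, scores, min_score=0.3)`.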

Integration with OmniParser

import sys
sys.path.append("/path/to/OmniParser")

from util.utils import get_yolo_model, predict_yolo
from PIL import Image

model = get_yolo_model("model.pt")
image = Image.open("screenshot.png")

boxes, confidences, phrases = predict_yolo(
    model=model,
    image=image,
    box_threshold=0.05,
    imgsz=640,
    scale_img=False,
    iou_threshold=0.7,
)

for i, (box, conf) in enumerate(zip(boxes, confidences)):
    print(f"Element {i}: box={box.tolist()}, confidence={conf:.2f}")

Example

Detection results on a sample screenshot (1920x1080) from the ScreenSpot-Pro benchmark (conf=0.05, iou=0.1, imgsz=1280).
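
This example uses a lower NMS IoU threshold (0.1) than the usage snippets (0.7). As a rough illustration of what that threshold controls, the sketch below computes the intersection-over-union of two `[x1, y1, x2, y2]` boxes. It is a simplified stand-in for explanation only, not the NMS implementation Ultralytics uses, and `iou` is a hypothetical helper name.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 100x100 boxes that overlap by half their width
overlap = iou([0, 0, 100, 100], [50, 0, 150, 100])  # 5000 / 15000
```

Here the overlap is one third. Under a threshold of 0.7 both boxes would survive NMS, while under 0.1 the lower-scoring box would be suppressed, so a lower threshold yields fewer overlapping detections.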

Input Screenshot

[Figure: input screenshot]

OmniParser V2 vs. GPA-GUI-Detector

[Figures: detection results from OmniParser V2 and from GPA-GUI-Detector]

License

This model is released under the MIT License.
