Paper • 2603.18856 • Published • 2
Motion-O: Motion-Aware Trajectory Reasoning for Video
Motion-O is a family of Qwen2.5-VL models fine-tuned for motion-aware trajectory reasoning in videos. This work is introduced in the paper Motion-o: Trajectory-Grounded Video Reasoning.
The models learn to produce structured <think>...</think> chains with <obj>, <box>, <t>, and <motion> tags that describe object motion over time, and to answer a final question about the video.
Links:
Available variants
All variants live in this repository as subfolders:
(root)–grpo_dense_t07_4737145/checkpoint-800/merged- Name: Motion-O (no visual grounding)
- Description: GRPO on STGR with motion-aware rewards; no explicit open-o3 visual grounding.
open-o3-mcot–open-o3_grpo_v3_074917638/checkpoint-600/merged- Name: Open-o3 + MCoT (with visual grounding)
- Description: Open-o3-Video style model with multi-chain-of-thought and explicit visual grounding.
open-o3-mcot-no-vg–open-o3_grpo_v2_4896760/checkpoint-1000/merged- Name: Open-o3 + MCoT (no visual grounding)
- Description: Same training recipe as above but without the additional visual-grounding objective.
How to load
from transformers import AutoModelForCausalLM, AutoProcessor
# 1) Motion-O (no visual grounding) – repo root
model = AutoModelForCausalLM.from_pretrained(
"bishoygaloaa/motion-o",
torch_dtype="auto",
)
processor = AutoProcessor.from_pretrained("bishoygaloaa/motion-o")
# 2) Open-o3 + MCoT (with visual grounding)
model_vg = AutoModelForCausalLM.from_pretrained(
"bishoygaloaa/motion-o",
subfolder="open-o3-mcot",
torch_dtype="auto",
)
processor_vg = AutoProcessor.from_pretrained(
"bishoygaloaa/motion-o",
subfolder="open-o3-mcot",
)
# 3) Open-o3 + MCoT (no visual grounding)
model_no_vg = AutoModelForCausalLM.from_pretrained(
"bishoygaloaa/motion-o",
subfolder="open-o3-mcot-no-vg",
torch_dtype="auto",
)
processor_no_vg = AutoProcessor.from_pretrained(
"bishoygaloaa/motion-o",
subfolder="open-o3-mcot-no-vg",
)
Citation
If you use Motion-O in your work, please cite:
@article{galoaa2026motion,
title = {Motion-Aware Trajectory Reasoning for Video Understanding},
author = {Galoaa, Bishoy* and Moezzi, Shayda* and Bai, Xiangyu and Ostadabbas, Sarah},
journal = {arXiv preprint arXiv:2603.18856},
year = {2026},
url = {https://arxiv.org/abs/2603.18856}
}
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for bishoygaloaa/motion-o
Base model
Qwen/Qwen2.5-VL-7B-Instruct