Does a retrieved past step (screenshot + action) help a GUI agent pick the next action? Cold-start Qwen3.5-4B, 3-arm A/B. v1, single-seed.
memrag-basecur — Trajectory-Memory RAG (GUI agent)
Cold-start SFT from Qwen3.5-4B for GUI next-action prediction. This checkpoint is the baseline arm of a 3-arm A/B: the model sees only the current screenshot, with no retrieved past step.
Action accuracy (n=498 test, AgentNetBench score_pair): 0.366
Status: v1, single-seed (positive; 3-seed confirmation pending). See the collection for the other arms.
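To give a sense of how much a single-seed number on 498 items can move from sampling alone, here is a minimal sketch of a normal-approximation interval around the reported accuracy. It assumes each test item is scored as a binary hit or miss, which may not match how AgentNetBench score_pair actually aggregates; the helper name `binomial_interval` is hypothetical and not part of this repository. It covers test-set sampling noise only, not seed-to-seed training variance.

```python
import math

def binomial_interval(acc, n, z=1.96):
    """Normal-approximation interval for an accuracy measured on n items.

    Assumption: every item is a 0/1 outcome, so acc is a binomial proportion.
    """
    se = math.sqrt(acc * (1.0 - acc) / n)  # standard error of the proportion
    return se, acc - z * se, acc + z * se

# Values reported on this card: accuracy 0.366 on n=498 test items.
se, lo, hi = binomial_interval(0.366, 498)
print(f"se={se:.3f}  95% interval=({lo:.3f}, {hi:.3f})")
```

Under that assumption the standard error is about 0.02, so differences between arms of a few points or less would be hard to separate from noise without the pending 3-seed runs.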
Load
from transformers import AutoProcessor
from qwen_cua.modeling_qwen35_vl_latent import Qwen35VLLatentForConditionalGeneration as M
proc = AutoProcessor.from_pretrained("hyunseoki/memrag-basecur", max_pixels=1_000_000)
model = M.from_pretrained("hyunseoki/memrag-basecur", torch_dtype="bfloat16", attn_implementation="flash_attention_2")
The architecture is plain Qwen3.5-VL (wm.enabled=false), so the checkpoint is also loadable with the standard class.
Safetensors · Model size: 5B params · Tensor type: BF16
