Submitted by Pinzhi Huang 52 Benchmarking Visual State Tracking in Multimodal Video Understanding nyu-visionx VISIONx @ NYU 33 1
Submitted by Ellis Brown 3 PaintBench: Deterministic Evaluation of Precise Visual Editing nyu-visionx VISIONx @ NYU 4 3
Submitted by taesiri 31 Solaris: Building a Multiplayer Video World Model in Minecraft nyu-visionx VISIONx @ NYU 214 3
Submitted by BoYang Zheng 55 Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders nyu-visionx VISIONx @ NYU 251 2
Submitted by Ellis Brown 6 SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding nyu-visionx VISIONx @ NYU 11 2
Submitted by Jihan Yang 10 Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts nyu-visionx VISIONx @ NYU 2
Submitted by Peter Tong 171 Diffusion Transformers with Representation Autoencoders nyu-visionx VISIONx @ NYU 1.95k 6