Voozh

AI & ML interests

None defined yet.

Recent Activity

xcpan authored a paper 10 days ago

RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space

sihyun-yu authored a paper 12 days ago

Video Probabilistic Diffusion Models in Projected Latent Space

sihyun-yu authored a paper 12 days ago

Controllable Human Image Generation with Personalized Multi-Garments

Papers

Benchmarking Visual State Tracking in Multimodal Video Understanding

PaintBench: Deterministic Evaluation of Precise Visual Editing

Submitted by

Pinzhi Huang

Benchmarking Visual State Tracking in Multimodal Video Understanding

nyu-visionx
VISIONx @ NYU

33 1

Submitted by

Ellis Brown

PaintBench: Deterministic Evaluation of Precise Visual Editing

nyu-visionx
VISIONx @ NYU

4 3

Submitted by

taesiri

Solaris: Building a Multiplayer Video World Model in Minecraft

nyu-visionx
VISIONx @ NYU

214 3

Submitted by

BoYang Zheng

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

nyu-visionx
VISIONx @ NYU

251 2

Submitted by

Ellis Brown

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding

nyu-visionx
VISIONx @ NYU

11 2

Submitted by

Jihan Yang

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

nyu-visionx
VISIONx @ NYU

Submitted by

Peter Tong

171

Diffusion Transformers with Representation Autoencoders

nyu-visionx
VISIONx @ NYU

1.95k 6

URL: https://huggingface.co/nyu-visionx/papers

nyu-visionx (VISIONx @ NYU)
