URL: https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data


Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight) - Evaluation Dataset


Kuan Heng Lin1,3*, Zhizheng Liu1,4*, Pablo Salamanca1,2, Yash Kant1,2, Ryan Burgert1,2,5*, Yuancheng Xu1,2, Koichi Namekata1,2,6*, Yiwei Zhao2, Bolei Zhou4, Micah Goldblum3, Paul Debevec1,2, Ning Yu1,2
1Eyeline Labs, 2Netflix, 3Columbia University, 4UCLA, 5Stony Brook University, 6University of Oxford

*Work done during an internship at Eyeline Labs

Vista4D is a video reshooting framework that re-synthesizes the dynamic scene captured in an input source video from novel camera trajectories and viewpoints. We bridge the distribution shift between training and inference for point-cloud-grounded video reshooting: by training on noisy, reconstructed multiview videos, Vista4D becomes robust to the point cloud artifacts that arise from imprecise 4D reconstruction of real-world videos. Our 4D point cloud with temporally persistent static points also explicitly preserves scene content and improves camera control. Vista4D generalizes to real-world applications such as dynamic scene expansion (using a casually captured video of the scene as a background reference), 4D scene recomposition (point cloud editing), and long video inference with memory.

This Hugging Face repository contains our evaluation dataset. We provide 110 video-camera pairs to evaluate Vista4D, built from 51 source videos: 13 from DAVIS and 38 from Pexels. We use Pi3 for 4D reconstruction and Grounded SAM 2 for dynamic pixel segmentation. For each video, we then hand-design two to three target cameras using our camera UI.

To download the dataset, from the root directory of the project, run

huggingface-cli download Eyeline-Labs/Vista4D-Eval-Data --repo-type dataset --local-dir eval_data

to download the Vista4D evaluation dataset into ./eval_data/ and then run

tar -xvf eval_data/eval_data.tar -C eval_data/

to extract the contents. It should have the following structure:

eval_data/
  metadata.csv
  recon_and_seg/                  # 4D reconstruction and dynamic mask segmentation
    avocado-slice/                # There should be 51 total videos
      cameras.npz                 # Source intrinsics and extrinsics
      video.mp4
      depths/
        00000.exr
        ...
      dynamic_mask/
        00000.png
        ...
      sky_mask/                   # Sky segmentation (to set them to a large depth)
        00000.png
        ...
    [video_name]/
      ...
    ...
  cameras/
    avocado-slice/                # Two to three target cameras per video
      close-crane-above.npz
      left-front-zoom.npz
    [video_name]/
      [camera_name].npz
      ...
    ...
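
After extraction, the layout above can be sanity-checked with a short script. The following is a minimal sketch, not part of the Vista4D codebase: the function name check_eval_data and its report format are hypothetical, and it assumes that every video folder contains cameras.npz, video.mp4, depths/, and dynamic_mask/. It does not require sky_mask/, since the card does not say whether that folder exists for videos without sky.

```python
from pathlib import Path

# File and folder names come from the directory tree above.
# The function name and the report format are illustrative assumptions.
REQUIRED_FILES = ("cameras.npz", "video.mp4")
REQUIRED_DIRS = ("depths", "dynamic_mask")  # sky_mask/ is deliberately not required


def check_eval_data(root, expected_videos=51):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []

    if not (root / "metadata.csv").is_file():
        problems.append("missing metadata.csv")

    recon_root = root / "recon_and_seg"
    video_dirs = []
    if recon_root.is_dir():
        video_dirs = sorted(p for p in recon_root.iterdir() if p.is_dir())
    if len(video_dirs) != expected_videos:
        problems.append(
            f"expected {expected_videos} video folders, found {len(video_dirs)}"
        )

    for video_dir in video_dirs:
        name = video_dir.name
        for fname in REQUIRED_FILES:
            if not (video_dir / fname).is_file():
                problems.append(f"{name}: missing {fname}")
        for dname in REQUIRED_DIRS:
            if not (video_dir / dname).is_dir():
                problems.append(f"{name}: missing {dname}/")

        # Each video should have at least one target camera file.
        cam_dir = root / "cameras" / name
        cams = list(cam_dir.glob("*.npz")) if cam_dir.is_dir() else []
        if not cams:
            problems.append(f"{name}: no target cameras in cameras/{name}/")

    return problems
```

Calling check_eval_data("eval_data") from the project root should return an empty list if the archive was extracted completely; any returned strings name the video folder and the missing item.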

metadata.csv contains the following information:

  • name: Name of the video-camera pair, in the format [video]_[camera]
  • video: Name of the source video; its 4D reconstruction and segmentation are in eval_data/recon_and_seg/[video]/
  • camera: Name of the target camera for that video; the file is eval_data/cameras/[video]/[camera].npz
  • seed: Randomly generated, fixed seed for evaluation
  • prompt: Prompt for the video-camera pair, usually just the prompt of the source video
  • dynamic: Dynamic keywords used to obtain the segmentation map
  • do_sky_seg: Whether the video contains sky (and thus needs a separate sky segmentation)
  • source: Source of the video, davis or pexels
  • video_id: For videos from Pexels only, the original ID of the video on Pexels; the full link is https://www.pexels.com/video/[video_id]
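
The columns above are enough to resolve every file that belongs to a video-camera pair. The following is a minimal sketch using only the Python standard library. The function name load_pairs and the added keys recon_dir, camera_path, and pexels_url are hypothetical. It assumes that seed is stored as an integer literal and leaves do_sky_seg as the raw string from the CSV, because its exact encoding is not stated here.

```python
import csv
from pathlib import Path


def load_pairs(root):
    """Read metadata.csv and attach the paths described in the column list."""
    root = Path(root)
    pairs = []
    with open(root / "metadata.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            video, camera = row["video"], row["camera"]
            # Paths follow the patterns given for the `video` and `camera` columns.
            row["recon_dir"] = root / "recon_and_seg" / video
            row["camera_path"] = root / "cameras" / video / f"{camera}.npz"
            # Assumption: seed is written as a plain integer.
            row["seed"] = int(row["seed"])
            # Only Pexels rows carry a video_id, so only they get a link.
            if row.get("source") == "pexels" and row.get("video_id"):
                row["pexels_url"] = (
                    f"https://www.pexels.com/video/{row['video_id']}"
                )
            pairs.append(row)
    return pairs
```

With the dataset extracted, load_pairs("eval_data") should yield 110 entries, one per video-camera pair. Grouping them by the video key gives the two to three target cameras designed for each source video.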

Instructions on how to use this dataset, model weights, more results, and paper can be found on our project page and GitHub repository.
