
Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight) – Evaluation Dataset


Kuan Heng Lin¹,³*, Zhizheng Liu¹,⁴*, Pablo Salamanca¹,², Yash Kant¹,², Ryan Burgert¹,²,⁵*, Yuancheng Xu¹,², Koichi Namekata¹,²,⁶*, Yiwei Zhao², Bolei Zhou⁴, Micah Goldblum³, Paul Debevec¹,², Ning Yu¹,²
¹Eyeline Labs, ²Netflix, ³Columbia University, ⁴UCLA, ⁵Stony Brook University, ⁶University of Oxford

*Work done during an internship at Eyeline Labs

Vista4D is a video reshooting framework that synthesizes the dynamic scene captured in an input source video from novel camera trajectories and viewpoints. By training on noisy, reconstructed multiview videos, Vista4D bridges the distribution shift between training and inference for point-cloud-grounded video reshooting and is robust to point cloud artifacts caused by imprecise 4D reconstruction of real-world videos. Our 4D point cloud with temporally-persistent static points also explicitly preserves scene content and improves camera control. Vista4D generalizes to real-world applications such as dynamic scene expansion (using a casual video capture of the scene as a background reference), 4D scene recomposition (point cloud editing), and long video inference with memory.
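To make the idea of a 4D point cloud with temporally-persistent static points concrete, here is a minimal NumPy sketch of how per-frame depth maps, cameras, and dynamic masks (the kind of data stored under recon_and_seg/) could be combined. It is our own illustration, not the Vista4D code. It assumes z-depth, 3×3 pinhole intrinsics, and 4×4 camera-to-world poses in the OpenCV convention (invert the pose first if your extrinsics are world-to-camera); the function names `unproject_depth` and `build_4d_point_cloud` are hypothetical.

```python
import numpy as np


def unproject_depth(depth, K, c2w):
    """Lift an (H, W) z-depth map to world-space points of shape (H*W, 3).

    Assumes pinhole intrinsics K (3x3) and a camera-to-world pose c2w (4x4)
    in the OpenCV convention (+x right, +y down, +z forward).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # u = column, v = row
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # camera-space rays with z = 1
    cam_pts = rays * depth.reshape(-1, 1)    # scale each ray by its z-depth
    return cam_pts @ c2w[:3, :3].T + c2w[:3, 3]  # rotate, then translate to world


def build_4d_point_cloud(depths, Ks, c2ws, dynamic_masks):
    """Return points_at(t), the point cloud to render for frame t.

    Static pixels (mask == 0) from every frame are kept for all time steps;
    dynamic pixels (mask != 0) appear only in the frame they came from.
    """
    static_chunks, dynamic_per_frame = [], []
    for depth, K, c2w, mask in zip(depths, Ks, c2ws, dynamic_masks):
        pts = unproject_depth(depth, K, c2w)
        is_dynamic = np.asarray(mask).reshape(-1) != 0
        static_chunks.append(pts[~is_dynamic])
        dynamic_per_frame.append(pts[is_dynamic])
    static_pts = np.concatenate(static_chunks, axis=0)

    def points_at(t):
        return np.concatenate([static_pts, dynamic_per_frame[t]], axis=0)

    return points_at
```

In this sketch, `points_at(t)` returns every static point observed in any frame plus only frame t's dynamic points, so static content seen in other frames remains available when rendering frame t.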

This is the Hugging Face repository containing our evaluation dataset. We provide 110 video-camera pairs to evaluate Vista4D, built from 13 videos from DAVIS and 38 videos from Pexels (51 videos in total). We use Pi3 for 4D reconstruction and Grounded SAM 2 for dynamic pixel segmentation. Then, for each video, we hand-design two to three target cameras using our camera UI.
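The counts above are consistent with each other: 13 + 38 = 51 source videos, and because every video has either two or three target cameras, the 110 pairs fix how many videos received a third camera. A quick Python check of that arithmetic (derived only from the numbers stated here, not read from metadata.csv):

```python
# Counts stated in the dataset description.
num_davis, num_pexels = 13, 38
num_pairs = 110
min_cams, max_cams = 2, 3

num_videos = num_davis + num_pexels  # source videos
# With min_cams targets everywhere there would be num_videos * 2 pairs;
# each video that has a third camera adds exactly one more pair.
num_three_cam = num_pairs - num_videos * min_cams
num_two_cam = num_videos - num_three_cam

assert min_cams * num_videos <= num_pairs <= max_cams * num_videos
print(num_videos, num_two_cam, num_three_cam)  # 51 43 8
```

So 43 videos have two target cameras and 8 have three (43 × 2 + 8 × 3 = 110).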

To download the dataset, from the root directory of the project, run

huggingface-cli download Eyeline-Labs/Vista4D-Eval-Data --repo-type dataset --local-dir eval_data

to download the Vista4D evaluation dataset into ./eval_data/ and then run

tar -xvf eval_data/eval_data.tar -C eval_data/

to extract the contents. It should have the following structure:

eval_data/
    metadata.csv
    recon_and_seg/                 # 4D reconstruction and dynamic mask segmentation
        avocado-slice/             # 51 videos in total
            cameras.npz            # Source intrinsics and extrinsics
            video.mp4
            depths/
                00000.exr
                ...
            dynamic_mask/
                00000.png
                ...
            sky_mask/              # Sky segmentation (used to set sky pixels to a large depth)
                00000.png
                ...
        [video_name]/
            ...
        ...
    cameras/
        avocado-slice/             # Two to three target cameras per video
            close-crane-above.npz
            left-front-zoom.npz
        [video_name]/
            [camera_name].npz
            ...
        ...
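For scripted setups, the two commands above can be sketched in Python with `huggingface_hub.snapshot_download` and the standard `tarfile` module, followed by a check that each video in recon_and_seg/ has the files shown in the tree and at least one target camera. The helper names (`download_eval_data`, `extract_eval_data`, `missing_files`) are hypothetical, and the check deliberately skips sky_mask/ because this card does not say whether it exists for every video.

```python
import tarfile
from pathlib import Path

REPO_ID = "Eyeline-Labs/Vista4D-Eval-Data"


def download_eval_data(local_dir="eval_data"):
    """Python counterpart of the `huggingface-cli download` command."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


def extract_eval_data(local_dir="eval_data"):
    """Python counterpart of `tar -xvf eval_data/eval_data.tar -C eval_data/`."""
    root = Path(local_dir)
    with tarfile.open(root / "eval_data.tar") as tar:
        # Use the safer "data" extraction filter when this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    return root


def missing_files(local_dir="eval_data"):
    """Return expected paths from the directory tree that are absent."""
    root = Path(local_dir)
    missing = []
    if not (root / "metadata.csv").is_file():
        missing.append(root / "metadata.csv")
    recon = root / "recon_and_seg"
    if not recon.is_dir():
        return missing + [recon]
    for video_dir in sorted(p for p in recon.iterdir() if p.is_dir()):
        for name in ("cameras.npz", "video.mp4", "depths", "dynamic_mask"):
            if not (video_dir / name).exists():
                missing.append(video_dir / name)
        cam_dir = root / "cameras" / video_dir.name
        if not (cam_dir.is_dir() and any(cam_dir.glob("*.npz"))):
            missing.append(cam_dir)
    return missing
```

Typical use is `download_eval_data()`, then `extract_eval_data()`, then confirming that `missing_files()` returns an empty list.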

metadata.csv contains the following information:

name: Name of video-camera pair, in the format [video]_[camera]
video: Name of the source video; its 4D reconstruction and segmentation are in eval_data/recon_and_seg/[video]/
camera: Name of the target camera, which belongs to the given video; it is stored at eval_data/cameras/[video]/[camera].npz
seed: Randomly-generated fixed seed for evaluation
prompt: Prompt for the video-camera pair, usually just the prompt of the source video
dynamic: Keywords describing the dynamic content, used to obtain the dynamic segmentation masks
do_sky_seg: Whether the video contains sky (and thus the sky needs to be segmented separately)
source: Source of the video, davis or pexels
video_id: For Pexels videos only, the original ID of the video on Pexels; the full link is https://www.pexels.com/video/[video_id]
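A minimal sketch of reading metadata.csv with Python's standard `csv` module and resolving each row to the paths described above. The helper names (`load_pairs`, `parse_bool`) are hypothetical, and the cell formats (integer seeds, booleans written as True/False or 1/0, an empty video_id for DAVIS rows) are assumptions, since this card does not specify them.

```python
import csv
from pathlib import Path

PEXELS_URL = "https://www.pexels.com/video/{video_id}"


def parse_bool(value):
    # Assumption: booleans are serialized as True/False, yes/no, or 1/0.
    return str(value).strip().lower() in {"true", "1", "yes"}


def load_pairs(root="eval_data"):
    """Yield one dict per video-camera pair in metadata.csv, with resolved paths."""
    root = Path(root)
    with open(root / "metadata.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pair = dict(row)
            pair["seed"] = int(row["seed"])  # assumption: seeds are integers
            pair["do_sky_seg"] = parse_bool(row["do_sky_seg"])
            pair["recon_dir"] = root / "recon_and_seg" / row["video"]
            pair["camera_path"] = root / "cameras" / row["video"] / f"{row['camera']}.npz"
            source = row["source"].strip().lower()
            video_id = (row.get("video_id") or "").strip()
            pair["url"] = (
                PEXELS_URL.format(video_id=video_id)
                if source == "pexels" and video_id
                else None
            )
            yield pair
```

Each yielded dict keeps the original columns, so `pair["prompt"]` and `pair["seed"]` can be passed straight to an evaluation loop.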

Instructions for using this dataset, along with the model weights, more results, and the paper, can be found on our project page and GitHub repository.


Paper: arXiv 2604.21915 (published Apr 23)

URL: https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data
