CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation

CVPR 2026: Arxiv | Project Page

Scene-Decoupled Video Dataset

TL;DR: The Scene-Decoupled Video Dataset, introduced in CineScene, is a large-scale synthetic dataset for video generation with decoupled scenes. It covers diverse scenes, subjects, and camera movements, and contains camera trajectories, equirectangular panoramas (scene images), and videos with and without a dynamic subject. Videos and camera trajectories are organized into "With Human" (whuman) and "Without Human" (wohuman) categories, while panoramas are scene-decoupled and shared across both.

1. Directory Tree

.
├── camera/                              # Camera trajectories and metadata
│   ├── whuman/                          # Sequences containing human characters
│   │   └── <scene_id>/                  # e.g., scene1_3x3_loc1_scene_AncientTempleEnv/
│   │       └── <scene_id>_cam.json      # Camera parameters
│   └── wohuman/                         # Sequences with environment only
│       └── <scene_id>/
│           └── <scene_id>_cam.json
│
├── panorama/                            # Scene-decoupled environment maps
│   └── <scene_id>/                      # Shared between whuman and wohuman
│       └── <scene_id>_pano.jpeg         # 360° equirectangular panoramic image
│
└── video/                               # Rendered video sequences (MP4)
    ├── whuman/                          # Videos with human characters
    │   └── <scene_id>/
    │       ├── <scene_id>_01_24mm.mp4   # Sub-sequences (01, 02, etc.)
    │       ├── <scene_id>_02_24mm.mp4
    │       └── ...
    └── wohuman/                         # Videos without human characters
        └── <scene_id>/
            ├── <scene_id>_01_24mm.mp4
            └── ...
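
As a minimal sketch of how this layout might be navigated in Python, the helper below builds the expected paths for one scene. The function name `scene_files` and the returned dictionary keys are illustrative assumptions and are not part of the dataset's tooling; only the directory and file naming comes from the tree above.

```python
from pathlib import Path


def scene_files(root, scene_id, split="whuman"):
    """Collect the files for one scene, following the directory tree above.

    `scene_files` is a hypothetical helper, not something shipped with the dataset.
    """
    if split not in ("whuman", "wohuman"):
        raise ValueError(f"unknown split: {split!r}")
    root = Path(root)
    video_dir = root / "video" / split / scene_id
    # Only list videos if the scene directory is actually present.
    videos = sorted(video_dir.glob(f"{scene_id}_*.mp4")) if video_dir.is_dir() else []
    return {
        "camera": root / "camera" / split / scene_id / f"{scene_id}_cam.json",
        # Panoramas are not split into whuman/wohuman; one folder per scene.
        "panorama": root / "panorama" / scene_id / f"{scene_id}_pano.jpeg",
        "videos": videos,
    }
```

The camera and panorama entries are returned as paths without checking that they exist, so a caller can decide how to handle missing files.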

2. Dataset Statistics

Total Scale: 46,816 videos.
Scenes: 3,400 scenes (comprising both whuman and wohuman scenes) across 35 high-quality 3D environments.
Trajectories: 46,816 camera paths (7 distinct camera trajectories per scene).
Panorama: 360° Equirectangular images for every scene, providing a complete background reference for scene conditioning.

| Property | Value |
| --- | --- |
| Video Resolution | 672 x 384 |
| Frame Count | 81 frames per video |
| Frame Rate | 15 FPS |
| View Change Range | Up to 75° |
| Decoupled Scene | 360° Equirectangular (Panorama) |
| Panorama Resolution | 2048 x 1024 |
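
A few derived quantities follow directly from the values in this table. The short sketch below works them out; the variable names are illustrative, and the last figure only describes how wide a 75° horizontal change is on the panorama, not how the dataset measures view change.

```python
# Values copied from the properties table above.
FRAMES_PER_VIDEO = 81
FPS = 15
VIDEO_SIZE = (672, 384)    # width, height in pixels
PANO_SIZE = (2048, 1024)   # width, height in pixels

# Clip length in seconds: 81 frames at 15 FPS.
duration_s = FRAMES_PER_VIDEO / FPS

# Aspect ratios: 7:4 for the videos, 2:1 for the equirectangular panoramas.
video_aspect = VIDEO_SIZE[0] / VIDEO_SIZE[1]
pano_aspect = PANO_SIZE[0] / PANO_SIZE[1]

# An equirectangular image spans 360 degrees horizontally.
pano_pixels_per_degree = PANO_SIZE[0] / 360
# Panorama columns covered by a 75-degree horizontal change.
pano_pixels_for_75_deg = 75 * pano_pixels_per_degree

print(f"{duration_s:.1f} s per clip, video aspect {video_aspect:.2f}, "
      f"panorama aspect {pano_aspect:.1f}, "
      f"{pano_pixels_for_75_deg:.0f} panorama columns per 75 degrees")
```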

3. Dataset Construction

We follow the asset collection pipeline established by ReCamMaster, but introduce three enhancements to support more complex generative tasks:

Decoupled Scenes: We provide static 360° panoramic images (Equirectangular) for every scene. This allows for explicit background conditioning and facilitates novel view synthesis from any angle.
Extended Camera Range: Our dataset covers larger view changes (up to about 75°) than the 5–60° range provided in previous datasets.
Paired Subject/Background Data: Every scene includes both "with-subject" (whuman) and "background-only" (wohuman) video sequences. This paired data is ideal for training models on subject-background decoupling, motion transfer, and cinematic composition.
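
The paired data described above could be gathered with a sketch like the following. It assumes that a whuman video and its wohuman counterpart share the same scene folder name and file name, which the directory tree suggests but which should be verified against the downloaded files; `paired_videos` is a hypothetical helper name.

```python
from pathlib import Path


def paired_videos(root, scene_id):
    """Return (whuman, wohuman) video path pairs that share a file name.

    Assumption: matching clips use identical file names in both splits.
    """
    root = Path(root)
    with_dir = root / "video" / "whuman" / scene_id
    without_dir = root / "video" / "wohuman" / scene_id
    pairs = []
    if not with_dir.is_dir():
        return pairs
    for with_path in sorted(with_dir.glob("*.mp4")):
        without_path = without_dir / with_path.name
        # Keep only clips that exist in both splits.
        if without_path.is_file():
            pairs.append((with_path, without_path))
    return pairs
```

Clips that exist in only one split are skipped silently, so comparing the number of pairs against the number of whuman videos is a quick way to check the naming assumption.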

4. Useful Scripts

Download

sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/datasets/KlingTeam/Scene-Decoupled-Video-Dataset
cd Scene-Decoupled-Video-Dataset
cat Scene-Decoupled-Video-Dataset.part* > Scene-Decoupled-Video-Dataset.tar.gz
tar -xvf Scene-Decoupled-Video-Dataset.tar.gz
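
On systems without `cat` and `tar`, the join-and-extract step can be approximated with the Python standard library. This is a sketch under the assumption that the split parts sit directly in the cloned repository folder and sort correctly by name; `reassemble_and_extract` and its parameters are illustrative names, not part of the dataset.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble_and_extract(repo_dir, out_dir, prefix="Scene-Decoupled-Video-Dataset"):
    """Concatenate `<prefix>.part*` files into one archive and extract it."""
    repo_dir, out_dir = Path(repo_dir), Path(out_dir)
    # Lexicographic order, intended to mirror the shell glob in the commands above.
    parts = sorted(repo_dir.glob(f"{prefix}.part*"))
    if not parts:
        raise FileNotFoundError(f"no {prefix}.part* files in {repo_dir}")
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{prefix}.tar.gz"
    with open(archive, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large files are never fully held in memory.
                shutil.copyfileobj(src, dst)
    # "r:*" lets tarfile detect the compression automatically.
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(out_dir)
    return archive
```

For example, `reassemble_and_extract("Scene-Decoupled-Video-Dataset", "data")` would write the joined archive and its contents under `data/`.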

Camera Visualization

To visualize the camera, please refer to here.
Perspective Projection

To extract perspective frames from the panoramic images:
```
python extract_scene_from_panorama.py
```
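
For readers who want to see the geometry behind this kind of extraction, the sketch below maps one pixel of a virtual pinhole view to its location in an equirectangular panorama. It is not taken from `extract_scene_from_panorama.py`; the function name, the axis conventions (x right, y up, z forward, yaw 0 at the panorama center), and the field-of-view value in the usage line are all assumptions chosen for illustration.

```python
import math


def perspective_pixel_to_pano(u, v, out_w, out_h, fov_deg,
                              yaw_deg=0.0, pitch_deg=0.0,
                              pano_w=2048, pano_h=1024):
    """Map pixel (u, v) of a pinhole view to equirectangular coordinates.

    fov_deg is the horizontal field of view of the virtual camera.
    """
    focal = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)
    # Ray through the pixel in camera space: x right, y up, z forward.
    x = u - out_w / 2
    y = out_h / 2 - v
    z = focal
    # Pitch: rotate about the x-axis (positive looks up).
    p = math.radians(pitch_deg)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    # Yaw: rotate about the y-axis (positive turns right).
    t = math.radians(yaw_deg)
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)
    # Convert the ray direction to longitude/latitude.
    lon = math.atan2(x, z)
    lat = math.atan2(y, math.hypot(x, z))
    # Longitude spans the panorama width, latitude its height.
    px = (lon / (2 * math.pi) + 0.5) * pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py


# Center pixel of a 672 x 384 view with a hypothetical 90-degree field of view.
print(perspective_pixel_to_pano(336, 192, 672, 384, fov_deg=90))
```

A full extraction would evaluate this mapping for every output pixel and sample the panorama at the resulting coordinates, typically with bilinear interpolation.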

Paper for KlingTeam/Scene-Decoupled-Video-dataset

Paper • 2602.06959 • Published Feb 6

URL: https://huggingface.co/datasets/KlingTeam/Scene-Decoupled-Video-dataset