CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
CVPR 2026: arXiv | Project Page
Scene-Decoupled Video Dataset
TL;DR: The Scene-Decoupled Video Dataset, introduced in CineScene, is a large-scale synthetic dataset for video generation with decoupled scenes, covering diverse scenes, subjects, and camera movements. It contains camera trajectories, equirectangular panoramas (scene images), and videos with and without a dynamic subject. Videos and camera files are organized into "With Human" (whuman) and "Without Human" (wohuman) categories, while the panoramas are scene-decoupled and shared across both.
1. Directory Tree
.
├── camera/                              # Camera trajectories and metadata
│   ├── whuman/                          # Sequences containing human characters
│   │   └── <scene_id>/                  # e.g., scene1_3x3_loc1_scene_AncientTempleEnv/
│   │       └── <scene_id>_cam.json      # Camera parameters
│   └── wohuman/                         # Sequences with environment only
│       └── <scene_id>/
│           └── <scene_id>_cam.json
│
├── panorama/                            # Scene-decoupled environment maps
│   └── <scene_id>/                      # Shared between whuman and wohuman
│       └── <scene_id>_pano.jpeg         # 360° equirectangular panoramic image
│
└── video/                               # Rendered video sequences (MP4)
    ├── whuman/                          # Videos with human characters
    │   └── <scene_id>/
    │       ├── <scene_id>_01_24mm.mp4   # Sub-sequences (01, 02, etc.)
    │       ├── <scene_id>_02_24mm.mp4
    │       └── ...
    └── wohuman/                         # Videos without human characters
        └── <scene_id>/
            ├── <scene_id>_01_24mm.mp4
            └── ...
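The layout above can be navigated with a small path helper. The following is a minimal sketch, assuming the dataset has been extracted under a local root directory and that file names follow the patterns shown in the tree; the function name `scene_paths` and its arguments are hypothetical and not part of the released code.

```python
from pathlib import Path


def scene_paths(root, scene_id, split="whuman"):
    """Collect the files that the directory tree associates with one scene.

    `split` selects the "whuman" or "wohuman" branch for camera and video
    files; the panorama is looked up in the shared panorama/ directory.
    """
    if split not in ("whuman", "wohuman"):
        raise ValueError(f"unknown split: {split!r}")
    root = Path(root)
    video_dir = root / "video" / split / scene_id
    return {
        "camera": root / "camera" / split / scene_id / f"{scene_id}_cam.json",
        "panorama": root / "panorama" / scene_id / f"{scene_id}_pano.jpeg",
        # Sub-sequences are numbered 01, 02, ... so a lexical sort keeps order.
        "videos": sorted(video_dir.glob(f"{scene_id}_*_24mm.mp4")),
    }
```

Calling `scene_paths("/data/sdvd", "scene1_3x3_loc1_scene_AncientTempleEnv")` (with a hypothetical root path) would return the camera JSON path, the panorama path, and the sorted list of sub-sequence videos for that scene. The paths are only constructed, not checked for existence, so a missing scene yields an empty video list.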
2. Dataset Statistics
- Total Scale: 46,816 videos.
- Scenes: 3,400 scenes (comprising both whuman and wohuman scenes) across 35 high-quality 3D environments.
- Trajectories: 46,816 camera paths (7 distinct camera trajectories per scene).
- Panoramas: A 360° equirectangular image for every scene, providing a complete background reference for scene conditioning.
| Property | Value |
|---|---|
| Video Resolution | 672 x 384 |
| Frame Count | 81 frames per video |
| Frame Rate | 15 FPS |
| View Change Range | Up to 75° |
| Decoupled Scene | 360° Equirectangular (Panorama) |
| Panorama Resolution | 2048 x 1024 |
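A few derived quantities follow directly from the figures above. The short sketch below only performs arithmetic on the stated values (81 frames at 15 FPS, 46,816 videos, 672 x 384 resolution); the variable names are illustrative.

```python
FRAMES_PER_VIDEO = 81
FPS = 15
NUM_VIDEOS = 46_816
WIDTH, HEIGHT = 672, 384

clip_seconds = FRAMES_PER_VIDEO / FPS            # duration of one clip
total_frames = NUM_VIDEOS * FRAMES_PER_VIDEO     # frames across the dataset
total_hours = NUM_VIDEOS * clip_seconds / 3600   # total footage in hours
aspect_ratio = WIDTH / HEIGHT                    # 672 / 384 = 7:4

print(f"clip length:  {clip_seconds:.1f} s")     # 5.4 s
print(f"total frames: {total_frames:,}")         # 3,792,096
print(f"total length: {total_hours:.1f} h")      # 70.2 h
print(f"aspect ratio: {aspect_ratio:.2f}")       # 1.75
```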
3. Dataset Construction
We follow the asset collection pipeline established by ReCamMaster, but introduce three significant enhancements to support more complex generative tasks:
- Decoupled Scenes: We provide static 360° panoramic images (Equirectangular) for every scene. This allows for explicit background conditioning and facilitates novel view synthesis from any angle.
- Extended Camera Range: Our dataset covers significantly larger view changes (up to approximately 75°) than the 5–60° range provided in previous datasets.
- Paired Subject/Background Data: Every scene includes both "with-subject" (whuman) and "background-only" (wohuman) video sequences. This paired data is ideal for training models on subject-background decoupling, motion transfer, and cinematic composition.
4. Useful Scripts
- Download
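sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/datasets/KlingTeam/Scene-Decoupled-Video-Dataset
# Assumes the archive parts sit at the top level of the cloned directory
cd Scene-Decoupled-Video-Dataset
# Join the split archive parts, then extract
cat Scene-Decoupled-Video-Dataset.part* > Scene-Decoupled-Video-Dataset.tar.gz
tar -xzvf Scene-Decoupled-Video-Dataset.tar.gz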
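On systems without `cat` and `tar`, the reassembly step can be reproduced with the Python standard library. This is a minimal sketch, assuming the part files share the `Scene-Decoupled-Video-Dataset.part*` prefix used in the commands above and that their names sort lexically in the correct order; the function name and arguments are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble_and_extract(repo_dir, stem="Scene-Decoupled-Video-Dataset", out_dir=None):
    """Concatenate `<stem>.part*` files into `<stem>.tar.gz` and extract it."""
    repo_dir = Path(repo_dir)
    # Lexical sort mirrors the order in which the shell expands `part*`.
    parts = sorted(repo_dir.glob(f"{stem}.part*"))
    if not parts:
        raise FileNotFoundError(f"no {stem}.part* files in {repo_dir}")

    archive = repo_dir / f"{stem}.tar.gz"
    with open(archive, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)  # streams, so parts never load fully into memory

    out_dir = Path(out_dir) if out_dir else repo_dir
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        tar.extractall(out_dir)
    return archive
```

For example, `reassemble_and_extract("Scene-Decoupled-Video-Dataset")` run from the directory containing the clone would write the joined archive next to the parts and extract it in place.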
- Camera Visualization
To visualize the camera trajectories, please refer to here.
- Perspective Projection
To extract perspective frames from the panoramic images:
python extract_scene_from_panorama.py
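For readers who want to see what such a projection involves, here is a minimal NumPy sketch of equirectangular-to-perspective sampling. It is not the repository's `extract_scene_from_panorama.py`, whose options are not documented here. The function name, the pinhole model, the nearest-neighbor sampling, the axis conventions, and the default 90° field of view are all assumptions; the actual intrinsics should be taken from the camera JSON files.

```python
import numpy as np


def equirect_to_perspective(pano, out_w=672, out_h=384, hfov_deg=90.0,
                            yaw_deg=0.0, pitch_deg=0.0):
    """Sample a pinhole view from an equirectangular panorama (nearest neighbor).

    Assumed conventions: the panorama center is the forward direction,
    positive yaw turns right, positive pitch looks up.
    """
    pano_h, pano_w = pano.shape[:2]
    # Focal length in pixels from the horizontal field of view.
    f = 0.5 * out_w / np.tan(np.radians(hfov_deg) / 2)

    # Pixel grid centered on the principal point (x right, y down).
    xs = np.arange(out_w) - (out_w - 1) / 2
    ys = np.arange(out_h) - (out_h - 1) / 2
    x, y = np.meshgrid(xs, ys)
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate rays: pitch about the x axis, then yaw about the y axis.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(pitch), -np.sin(pitch)],
                      [0, np.sin(pitch), np.cos(pitch)]])
    rot_y = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                      [0, 1, 0],
                      [-np.sin(yaw), 0, np.cos(yaw)]])
    dirs = dirs @ (rot_y @ rot_x).T

    # Ray direction -> longitude/latitude -> panorama pixel indices.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))   # [-pi/2, pi/2], positive is down
    u = ((lon / (2 * np.pi) + 0.5) * pano_w).astype(int) % pano_w
    v = np.clip(((lat / np.pi + 0.5) * pano_h).astype(int), 0, pano_h - 1)
    return pano[v, u]
```

Given a panorama loaded as an `(H, W, 3)` array, `equirect_to_perspective(pano, yaw_deg=30)` would return a 384 x 672 view rotated 30° to the right under the conventions assumed above. Nearest-neighbor lookup keeps the sketch short; bilinear interpolation would give smoother results.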
