URL: https://huggingface.co/datasets/galilai-group/shapes3d

Dataset Card for 3dshapes

Dataset Description

The 3dshapes dataset is a synthetic 3D object image dataset designed for benchmarking algorithms in disentangled representation learning and unsupervised representation learning.

It was introduced in the FactorVAE paper [Kim & Mnih, ICML 2018] as one of the standard testbeds for learning interpretable and disentangled latent factors. The dataset consists of images of procedurally generated 3D scenes in which 6 independent ground-truth factors of variation are explicitly controlled:

  • Floor color (hue)
  • Wall color (hue)
  • Object color (hue)
  • Object size (scale)
  • Object shape (categorical)
  • Object orientation (rotation angle)

3dshapes is generated as the full Cartesian product of all factor values, which makes it well suited to systematic evaluation of disentanglement. The dataset contains 480,000 images (10 × 10 × 10 × 8 × 4 × 15) at a resolution of 64×64 pixels, covering every combination of the 6 factors exactly once. The images are stored in row-major order according to the factor sweep, so any factor combination can be located directly from its position in the dataset.

Dataset Source

Dataset Structure

Factor                Possible values
floor_color (hue)     10 values linearly spaced in [0, 1]
wall_color (hue)      10 values linearly spaced in [0, 1]
object_color (hue)    10 values linearly spaced in [0, 1]
scale                 8 values linearly spaced in [0.75, 1.25]
shape                 4 values: 0, 1, 2, 3
orientation           15 values linearly spaced in [-30, 30]

Each image corresponds to a unique combination of these 6 factors. The images are stored in row-major order over the factors as listed above: orientation changes fastest and floor_color changes slowest.
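The row-major layout means a row position and a tuple of factor indices can be converted into each other with plain integer arithmetic. The following is a minimal sketch under that assumption; the names `FACTOR_SIZES`, `flat_to_factor_indices`, and `factor_indices_to_flat` are illustrative and not part of the dataset or any library.

```python
# Factor sizes in storage order, slowest-changing first (taken from the table).
FACTOR_SIZES = {
    "floor_color": 10,
    "wall_color": 10,
    "object_color": 10,
    "scale": 8,
    "shape": 4,
    "orientation": 15,
}


def flat_to_factor_indices(flat_index):
    """Convert a row position into six per-factor indices (row-major order)."""
    total = 1
    for size in FACTOR_SIZES.values():
        total *= size
    if not 0 <= flat_index < total:
        raise IndexError(f"flat_index must be in [0, {total})")
    indices = []
    # Peel off the fastest-changing factor first, then restore storage order.
    for size in reversed(list(FACTOR_SIZES.values())):
        flat_index, idx = divmod(flat_index, size)
        indices.append(idx)
    return tuple(reversed(indices))


def factor_indices_to_flat(indices):
    """Convert six per-factor indices back into a row position."""
    flat = 0
    for idx, size in zip(indices, FACTOR_SIZES.values()):
        if not 0 <= idx < size:
            raise IndexError(f"factor index {idx} out of range for size {size}")
        flat = flat * size + idx
    return flat
```

For example, position 15 is the first position at which the shape index advances, because orientation has 15 values and wraps around first.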

Why no train/test split?

The 3dshapes dataset does not provide an official train/test split. It is designed for representation learning research, where the goal is to learn disentangled and interpretable latent factors. Since the dataset is a complete Cartesian product of all factor combinations, models typically require access to the full dataset to explore factor-wise variations.
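Exploring a factor-wise variation usually means fixing five factors and sweeping the sixth. The sketch below computes the row positions for such a sweep, assuming the row-major ordering described in this card; `SIZES`, `NAMES`, and `traversal_indices` are hypothetical names introduced for illustration.

```python
# Factor names and sizes in storage order, slowest-changing first.
NAMES = ["floor_color", "wall_color", "object_color", "scale", "shape", "orientation"]
SIZES = [10, 10, 10, 8, 4, 15]


def traversal_indices(base, factor):
    """Return row positions that sweep one factor while the others stay fixed.

    base:   six factor indices giving the fixed setting of every factor
    factor: name of the factor to sweep over all of its values
    """
    axis = NAMES.index(factor)
    positions = []
    for value in range(SIZES[axis]):
        combo = list(base)
        combo[axis] = value
        # Row-major flattening: earlier factors carry larger strides.
        flat = 0
        for idx, size in zip(combo, SIZES):
            flat = flat * size + idx
        positions.append(flat)
    return positions
```

With a dataset loaded through the Hugging Face Datasets library, the resulting list can be passed to `dataset.select(positions)` to retrieve the corresponding images, provided the loaded rows follow the same order.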

Example Usage

Below is a quick example of how to load this dataset via the Hugging Face Datasets library:

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("galilai-group/shapes3d", split="train", trust_remote_code=True)

# Access a sample from the dataset
example = dataset[0]
image = example["image"]
label = example["label"] # Value labels: [floor_hue, wall_hue, object_hue, scale, shape, orientation]
label_index = example["label_index"] # Index labels: [floor_idx, wall_idx, object_idx, scale_idx, shape_idx, orientation_idx]

# Individual factor values
floor_value = example["floor"]  # 0 to 1
wall_value = example["wall"]  # 0 to 1
object_value = example["object"]  # 0 to 1
scale_value = example["scale"]  # 0.75 to 1.25
shape_value = example["shape"]  # 0, 1, 2, 3
orientation_value = example["orientation"]  # -30 to 30

# Individual factor indices
floor_idx = example["floor_idx"]  # 0 to 9
wall_idx = example["wall_idx"]  # 0 to 9
object_idx = example["object_idx"]  # 0 to 9
scale_idx = example["scale_idx"]  # 0 to 7
shape_idx = example["shape_idx"]  # 0 to 3
orientation_idx = example["orientation_idx"]  # 0 to 14

image.show() # Display the image
print(f"Label (factor values): {label}")
print(f"Label (factor indices): {label_index}")

If you are using Colab, update the datasets library first to avoid errors:

pip install -U datasets
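Once an example is loaded, its per-factor index fields can be checked against its row position. The sketch below is one hedged way to do that: it assumes the field names shown in the usage example and the row-major ordering described in this card, and `FIELDS_AND_SIZES` and `expected_position` are hypothetical helpers rather than part of the dataset.

```python
# Index field names (as in the usage example) paired with factor sizes,
# listed slowest-changing first.
FIELDS_AND_SIZES = [
    ("floor_idx", 10),
    ("wall_idx", 10),
    ("object_idx", 10),
    ("scale_idx", 8),
    ("shape_idx", 4),
    ("orientation_idx", 15),
]


def expected_position(example):
    """Row position implied by an example's factor index fields."""
    flat = 0
    for field, size in FIELDS_AND_SIZES:
        flat = flat * size + int(example[field])
    return flat
```

If the ordering assumption holds for the loaded split, `expected_position(dataset[i]) == i` should be true for any valid `i`; a mismatch would suggest the rows were shuffled or reordered.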

Citation

@InProceedings{pmlr-v80-kim18b,
 title = 	 {Disentangling by Factorising},
 author = {Kim, Hyunjik and Mnih, Andriy},
 booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
 pages = 	 {2649--2658},
 year = 	 {2018},
 editor = 	 {Dy, Jennifer and Krause, Andreas},
 volume = 	 {80},
 series = 	 {Proceedings of Machine Learning Research},
 month = 	 {10--15 Jul},
 publisher = {PMLR},
 pdf = 	 {http://proceedings.mlr.press/v80/kim18b/kim18b.pdf},
 url = 	 {https://proceedings.mlr.press/v80/kim18b.html},
 abstract = 	 {We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon beta-VAE by providing a better trade-off between disentanglement and reconstruction quality and being more robust to the number of training iterations. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.}
}