URL: https://huggingface.co/datasets/taesiri/imagenet-hard

taesiri/imagenet-hard · Datasets at Hugging Face


Dataset Viewer (first 5GB, auto-converted to Parquet)

Column         Type    Range
image          image   width 74 to 6.59k px
label          list    length 1 to 4
origin         string  5 classes
english_label  list    length 1 to 4

Distinct rows in the preview (repeated rows omitted):

label   origin           english_label
[ 1 ]   imagenet_r       [ "goldfish" ]
[ 1 ]   imagenet_sketch  [ "goldfish" ]
[ 2 ]   imagenet         [ "great_white_shark" ]
[ 2 ]   imagenet_r       [ "great_white_shark" ]
[ 3 ]   imagenet_sketch  [ "tiger_shark" ]
[ 4 ]   imagenet_r       [ "hammerhead" ]
[ 5 ]   imagenet         [ "electric_ray" ]
[ 5 ]   imagenet_sketch  [ "electric_ray" ]
[ 6 ]   imagenet         [ "stingray" ]
[ 6 ]   imagenet_r       [ "stingray" ]
[ 6 ]   imagenet_a       [ "stingray" ]
[ 6 ]   imagenet_sketch  [ "stingray" ]
[ 9 ]   imagenet_r       [ "ostrich" ]
[ 10 ]  imagenet         [ "brambling" ]
[ 10 ]  imagenet_sketch  [ "brambling" ]
[ 11 ]  imagenet_a       [ "goldfinch" ]
[ 11 ]  imagenet_sketch  [ "goldfinch" ]
[ 14 ]  imagenet_sketch  [ "indigo_bunting" ]

End of preview.

Dataset Card for "ImageNet-Hard"

Project Page - ArXiv - Paper - Github - Image Browser

Dataset Summary

ImageNet-Hard is a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to classify images correctly. As a result, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving a mere 2.02% accuracy.

ImageNet-Hard-4K: For the 4K version, please refer to this dataset.

Dataset Distribution

[Figure: Dataset Distribution]

Classifiers Performance

Model                  Accuracy (%)
AlexNet                7.34
VGG-16                 12.00
ResNet-18              10.86
ResNet-50              14.74
ViT-B/32               18.52
EfficientNet-B0        16.57
EfficientNet-B7        23.20
EfficientNet-L2-Ns     39.00
CLIP-ViT-L/14@224px    1.86
CLIP-ViT-L/14@336px    2.02
OpenCLIP-ViT-bigG-14   15.93
OpenCLIP-ViT-L-14      15.60

Evaluation Code

Supported Tasks

  • image-classification: The objective of this task is to classify an image into one or more classes, selected from 1000 ImageNet categories (allowing for multiple ground-truth labels per image).
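Because an image may carry several ground-truth labels, one plausible way to score a classifier that emits a single prediction per image is to count the prediction as correct when it matches any of that image's labels. The sketch below assumes that convention; the card does not spell out the scoring rule, so the evaluation code linked above remains the reference. The function name `multilabel_top1_accuracy` and the toy inputs are hypothetical.

```python
from typing import Sequence


def multilabel_top1_accuracy(
    predictions: Sequence[int],
    labels: Sequence[Sequence[int]],
) -> float:
    """Return the percentage of predictions that match any ground-truth id.

    Assumption: a prediction is correct if it appears in that image's
    label list. This is an illustrative convention, not the official scorer.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        return 0.0
    hits = sum(pred in truth for pred, truth in zip(predictions, labels))
    return 100.0 * hits / len(predictions)


# Toy example: three images; the second has two acceptable labels.
toy_predictions = [1, 6, 10]
toy_labels = [[1], [5, 6], [14]]
toy_accuracy = multilabel_top1_accuracy(toy_predictions, toy_labels)
```

Here the first two predictions hit one of their labels and the third misses, so two of three images count as correct.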

Languages

The english_label field is in English.

Dataset Structure

Data Instances

An example looks like this:

{
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=575x409 at 0x7F09456B53A0>,
 'label': [0],
 'origin': 'imagenet_sketch',
 'english_label': ['tench']
}

Data Fields

The data instances have the following fields:

  • image: A PIL.Image.Image object containing the image. Note that when accessing the image column: dataset[0]["image"] the image file is automatically decoded. Decoding of a large number of image files might take a significant amount of time. Thus it is important to first query the sample index before the "image" column, i.e. dataset[0]["image"] should always be preferred over dataset["image"][0].
  • label: A List[int] collection containing the ground-truth ids.
  • origin: A string naming the source dataset.
  • english_label: A List[str] collection containing the English labels for the ground-truth classes.
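The access-order advice for the image column can be sketched as follows. This assumes the Hugging Face `datasets` library is installed and that the single split is named "validation" (the card only says the dataset is validation-only, so the exact split name is an assumption). The helper names `load_imagenet_hard`, `count_by_origin`, and `first_image` are hypothetical, and the sample rows are stand-ins shaped like the example instance above.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def load_imagenet_hard():
    """Load the dataset from the Hub (needs `datasets` and network access).

    Assumption: the single split is named "validation".
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset("taesiri/imagenet-hard", split="validation")


def count_by_origin(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count examples per source dataset using only the `origin` field."""
    return Counter(row["origin"] for row in rows)


def first_image(dataset):
    """Index the row first, then the column, so only one image is decoded."""
    return dataset[0]["image"]  # preferred over dataset["image"][0]


# Stand-in rows shaped like the card's example (image omitted).
sample_rows = [
    {"label": [0], "origin": "imagenet_sketch", "english_label": ["tench"]},
    {"label": [1], "origin": "imagenet_r", "english_label": ["goldfish"]},
    {"label": [1], "origin": "imagenet_sketch", "english_label": ["goldfish"]},
]
origin_counts = count_by_origin(sample_rows)
```

With the real dataset, passing `dataset["origin"]` directly to `Counter` gives the same tally without touching the image column at all.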

Data Splits

This dataset is a validation-only set.

Dataset Creation

Source Data

This dataset is sourced from ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-A, ImageNet-C, ImageNet-R, ImageNet-Sketch, and ObjectNet.

Citation Information

@article{taesiri2023zoom,
 title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
 author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
 journal={arXiv preprint arXiv:2304.05538},
 year={2023}
}