Dataset Card for "ImageNet-Hard"

Project Page - ArXiv - Paper - Github - Image Browser

Dataset Summary

ImageNet-Hard is a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to classify images correctly. As a result, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving a mere 2.02% accuracy.

ImageNet-Hard-4K: For the 4K version, please refer to this dataset.

Dataset Distribution

[Figure: Dataset Distribution]

Classifiers Performance

Model	Accuracy (%)
AlexNet	7.34
VGG-16	12.00
ResNet-18	10.86
ResNet-50	14.74
ViT-B/32	18.52
EfficientNet-B0	16.57
EfficientNet-B7	23.20
EfficientNet-L2-Ns	39.00
CLIP-ViT-L/14@224px	1.86
CLIP-ViT-L/14@336px	2.02
OpenCLIP-ViT-bigG-14	15.93
OpenCLIP-ViT-L-14	15.60

Evaluation Code

Supported Tasks

image-classification: The objective of this task is to classify an image into one or more classes, selected from 1000 ImageNet categories (allowing for multiple ground-truth labels per image).
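
Because each image may carry several ground-truth labels, scoring needs a rule for what counts as a hit. The sketch below assumes a top-1 prediction is correct when it matches any one of the image's ground-truth ids; the card does not spell this rule out, and the function names are hypothetical, so treat it as an illustration rather than the official evaluation code.

```python
from typing import Sequence


def is_correct(predicted: int, ground_truth: Sequence[int]) -> bool:
    """Assumed rule: a prediction is a hit if it equals any ground-truth id."""
    return predicted in ground_truth


def multilabel_top1_accuracy(
    predictions: Sequence[int], labels: Sequence[Sequence[int]]
) -> float:
    """Return top-1 accuracy in percent over images with 1+ ground-truth ids."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        return 0.0
    hits = sum(is_correct(p, gt) for p, gt in zip(predictions, labels))
    return 100.0 * hits / len(predictions)


# Four images; the second one has two acceptable class ids.
example_predictions = [1, 5, 7, 2]
example_labels = [[1], [4, 5], [6], [3]]
accuracy = multilabel_top1_accuracy(example_predictions, example_labels)
```

Here the first two predictions match one of their ground-truth ids and the last two do not, so `accuracy` is 50.0.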

Languages

The english_label field in the dataset is in English.

Dataset Structure

Data Instances

An example looks like this:

{
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=575x409 at 0x7F09456B53A0>,
 'label': [0],
 'origin': 'imagenet_sketch',
 'english_label': ['tench']
}

Data Fields

The data instances have the following fields:

image: A PIL.Image.Image object containing the image. When the image column is accessed, e.g. dataset[0]["image"], the image file is decoded automatically. Decoding a large number of image files can take a significant amount of time, so query the sample index before the "image" column: dataset[0]["image"] should always be preferred over dataset["image"][0].
label: A List[int] collection containing the ground-truth class ids.
origin: A string naming the source dataset.
english_label: A List[str] collection containing the English labels for the ground-truth classes.
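
The row-first access pattern described for the image field can be sketched as follows. This is a minimal illustration, not the authors' code: `load_imagenet_hard` and `describe_sample` are hypothetical helper names, the repository id is taken from this page's URL, and the split name "validation" is an assumption based on the card describing a validation-only set.

```python
from typing import Any, Dict


def load_imagenet_hard():
    """Load the dataset from the Hugging Face Hub.

    Assumptions: the `datasets` package is installed and the single split
    is named "validation" (the card only says it is validation-only).
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset("taesiri/imagenet-hard", split="validation")


def describe_sample(dataset: Any, index: int) -> Dict[str, Any]:
    """Summarize one sample, indexing the row before touching any column."""
    sample = dataset[index]  # row first: only this sample's image is decoded
    image = sample["image"]
    return {
        "label": list(sample["label"]),
        "origin": sample["origin"],
        "english_label": list(sample["english_label"]),
        # PIL images expose .size as (width, height); None if not an image.
        "image_size": getattr(image, "size", None),
    }


# Stand-in row shaped like the example instance, so the helper can be tried
# without downloading anything.
fake_rows = [
    {"image": None, "label": [0], "origin": "imagenet_sketch",
     "english_label": ["tench"]},
]
info = describe_sample(fake_rows, 0)
```

With the real dataset, the same helper would be called as `describe_sample(load_imagenet_hard(), 0)`. Writing `dataset["image"][0]` instead would decode the whole image column before selecting one entry.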

Data Splits

This dataset is a validation-only set.

Dataset Creation

Source Data

This dataset is sourced from ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-A, ImageNet-C, ImageNet-R, ImageNet-Sketch, and ObjectNet.

Citation Information

@article{taesiri2023zoom,
 title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
 author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
 journal={arXiv preprint arXiv:2304.05538},
 year={2023}
}

Paper: arXiv:2304.05538, published Apr 11, 2023

URL: https://huggingface.co/datasets/taesiri/imagenet-hard