Dataset Card for "Imagenet-Hard-4K"

Project Page - Paper - GitHub

ImageNet-Hard-4K is a 4K version of the original ImageNet-Hard dataset, a benchmark of 10,980 images collected from existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). The dataset poses a significant challenge to state-of-the-art vision models because merely zooming in often fails to improve their ability to classify images correctly. Even the most advanced models, such as CLIP-ViT-L/14@336px, struggle on it, achieving a mere 2.02% accuracy.

Upscaling Procedure

We employed GigaGAN to upscale each image from the original ImageNet-Hard dataset to a resolution of 4K.

Dataset Distribution

[Figure: Dataset Distribution]

Classifiers Performance

Model	Accuracy (%)
AlexNet	7.08
VGG-16	11.32
ResNet-18	10.42
ResNet-50	13.93
ViT-B/32	18.12
EfficientNet-B0	12.94
EfficientNet-B7	18.67
EfficientNet-L2-Ns	28.42
CLIP-ViT-L/14@224px	1.81
CLIP-ViT-L/14@336px	1.88
OpenCLIP-ViT-bigG-14	14.33
OpenCLIP-ViT-L-14	13.04

Evaluation Code

Colab notebooks are provided for CLIP and for the other models.

Supported Tasks

image-classification: The objective of this task is to classify an image into one or more classes, selected from 1000 ImageNet categories (allowing for multiple ground-truth labels per image).
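
Because an image may carry several ground-truth labels, scoring needs a rule for when a single prediction counts as correct. The sketch below assumes a top-1 prediction is correct if it matches any of the image's ground-truth ids; this rule is an assumption for illustration, not taken from the evaluation notebooks, and the function names and toy values are hypothetical.

```python
from typing import Sequence


def is_correct(predicted_id: int, ground_truth_ids: Sequence[int]) -> bool:
    """Assumed rule: a prediction is correct if it matches any ground-truth id."""
    return predicted_id in ground_truth_ids


def multilabel_accuracy(
    predictions: Sequence[int], labels: Sequence[Sequence[int]]
) -> float:
    """Percentage of images whose top-1 prediction is among their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        return 0.0
    hits = sum(is_correct(p, gt) for p, gt in zip(predictions, labels))
    return 100.0 * hits / len(predictions)


# Toy values, not taken from the dataset.
toy_labels = [[1], [2, 3], [5], [6, 9, 10]]
toy_predictions = [1, 3, 4, 7]
print(multilabel_accuracy(toy_predictions, toy_labels))  # 50.0
```

The first two toy predictions hit one of their labels and the last two miss, which gives 50.0.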

Languages

The values of the english_label field are in English.

Dataset Structure

Data Instances

An example looks like this:

{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=575x409 at 0x7F09456B53A0>,
  'label': [0],
  'origin': 'imagenet_sketch',
  'english_label': ['tench']
}

Data Fields

The data instances have the following fields:

image: A PIL.Image.Image object containing the image. When the image column is accessed, e.g. dataset[0]["image"], the image file is decoded automatically. Decoding a large number of image files can take a significant amount of time, so query the sample index before the "image" column: prefer dataset[0]["image"] over dataset["image"][0].
label: A List[int] containing the ground-truth class ids.
origin: A string naming the source dataset.
english_label: A List[str] containing the English labels for the ground-truth classes.
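
The access-order advice for the image field can be illustrated with a toy model. The class below is a hypothetical stand-in that only mimics decode-on-access as described above; it is not the datasets library, and real behavior may differ between library versions. It counts how many images are decoded under each access pattern.

```python
class ToyImageDataset:
    """Hypothetical stand-in for a dataset whose images are decoded on access."""

    def __init__(self, encoded_images):
        self._encoded = list(encoded_images)
        self.decode_calls = 0  # how many images have been decoded so far

    def _decode(self, raw):
        self.decode_calls += 1
        return f"decoded({raw})"  # placeholder for a real image decode

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: decode only the requested image.
            return {"image": self._decode(self._encoded[key])}
        if key == "image":
            # Column access: decode every image in the column.
            return [self._decode(raw) for raw in self._encoded]
        raise KeyError(key)


row_first = ToyImageDataset(["a.jpg", "b.jpg", "c.jpg"])
_ = row_first[0]["image"]
print(row_first.decode_calls)  # 1

column_first = ToyImageDataset(["a.jpg", "b.jpg", "c.jpg"])
_ = column_first["image"][0]
print(column_first.decode_calls)  # 3
```

In this toy model, indexing the row first decodes one image, while indexing the column first decodes all three before one is selected.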

Data Splits

This dataset is a validation-only set.
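
A minimal loading sketch, assuming the split is named "validation" (the card only states that the set is validation-only) and that the Hugging Face datasets package is installed. The helper names are hypothetical. Streaming is used here so that records can be read without first downloading the full 4K image set.

```python
REPO_ID = "taesiri/imagenet-hard-4K"  # repository id on the Hugging Face Hub
SPLIT = "validation"  # assumed split name; the card only says "validation-only"


def load_validation_split(streaming: bool = True):
    """Load the assumed validation split; requires `pip install datasets`."""
    # Imported lazily so this module can be loaded without the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=SPLIT, streaming=streaming)


def first_example(dataset):
    """Return the first record from a streaming or in-memory dataset."""
    return next(iter(dataset))


# Example usage (needs network access, so it is left commented out):
# example = first_example(load_validation_split())
# print(example["origin"], example["label"], example["english_label"])
```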

Dataset Creation

Source Data

This dataset is sourced from ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-A, ImageNet-C, ImageNet-R, ImageNet-Sketch, and ObjectNet.

Citation Information

@article{taesiri2023zoom,
 title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
 author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
 journal={arXiv preprint arXiv:2304.05538},
 year={2023}
}

URL: https://huggingface.co/datasets/taesiri/imagenet-hard-4K