| id (int64, 0 to 805) | image (width 2.01k to 4.1k px) | label (list, length 1 to 3) | origin (string, 4 classes) | english_label (list, length 1 to 3) |
|---|---|---|---|---|
| 0 | | [1] | imagenet_r | ["goldfish"] |
| 1 | | [1] | imagenet_sketch | ["goldfish"] |
| 2 | | [2] | imagenet | ["great_white_shark"] |
| 3 | | [2] | imagenet_r | ["great_white_shark"] |
| 4 | | [2] | imagenet_r | ["great_white_shark"] |
| 5 | | [2] | imagenet_r | ["great_white_shark"] |
| 6 | | [3] | imagenet_sketch | ["tiger_shark"] |
| 7 | | [4] | imagenet_r | ["hammerhead"] |
| 8 | | [4] | imagenet_r | ["hammerhead"] |
| 9 | | [4] | imagenet_r | ["hammerhead"] |
| 10 | | [5] | imagenet | ["electric_ray"] |
| 11 | | [5] | imagenet | ["electric_ray"] |
| 12 | | [5] | imagenet | ["electric_ray"] |
| 13 | | [5] | imagenet | ["electric_ray"] |
| 14 | | [5] | imagenet | ["electric_ray"] |
| 15 | | [5] | imagenet | ["electric_ray"] |
| 16 | | [5] | imagenet | ["electric_ray"] |
| 17 | | [5] | imagenet | ["electric_ray"] |
| 18 | | [5] | imagenet | ["electric_ray"] |
| 19 | | [5] | imagenet | ["electric_ray"] |
| 20 | | [5] | imagenet | ["electric_ray"] |
| 21 | | [5] | imagenet | ["electric_ray"] |
| 22 | | [5] | imagenet_sketch | ["electric_ray"] |
| 23 | | [6] | imagenet | ["stingray"] |
| 24 | | [6] | imagenet | ["stingray"] |
| 25 | | [6] | imagenet | ["stingray"] |
| 26 | | [6] | imagenet | ["stingray"] |
| 27 | | [6] | imagenet | ["stingray"] |
| 28 | | [6] | imagenet | ["stingray"] |
| 29 | | [6] | imagenet | ["stingray"] |
| 30 | | [6] | imagenet | ["stingray"] |
| 31 | | [6] | imagenet | ["stingray"] |
| 32 | | [6] | imagenet | ["stingray"] |
| 33 | | [6] | imagenet | ["stingray"] |
| 34 | | [6] | imagenet | ["stingray"] |
| 35 | | [6] | imagenet | ["stingray"] |
| 36 | | [6] | imagenet | ["stingray"] |
| 37 | | [6] | imagenet_r | ["stingray"] |
| 38 | | [6] | imagenet_r | ["stingray"] |
| 39 | | [6] | imagenet_r | ["stingray"] |
| 40 | | [6] | imagenet_r | ["stingray"] |
| 41 | | [6] | imagenet_r | ["stingray"] |
| 42 | | [6] | imagenet_r | ["stingray"] |
| 43 | | [6] | imagenet_r | ["stingray"] |
| 44 | | [6] | imagenet_r | ["stingray"] |
| 45 | | [6] | imagenet_r | ["stingray"] |
| 46 | | [6] | imagenet_r | ["stingray"] |
| 47 | | [6] | imagenet_a | ["stingray"] |
| 48 | | [6] | imagenet_a | ["stingray"] |
| 49 | | [6] | imagenet_a | ["stingray"] |
| 50 | | [6] | imagenet_a | ["stingray"] |
| 51 | | [6] | imagenet_a | ["stingray"] |
| 52 | | [6] | imagenet_sketch | ["stingray"] |
| 53 | | [6] | imagenet_sketch | ["stingray"] |
| 54 | | [9] | imagenet_r | ["ostrich"] |
| 55 | | [9] | imagenet_r | ["ostrich"] |
| 56 | | [10] | imagenet | ["brambling"] |
| 57 | | [10] | imagenet_sketch | ["brambling"] |
| 58 | | [10] | imagenet_sketch | ["brambling"] |
| 59 | | [10] | imagenet_sketch | ["brambling"] |
| 60 | | [10] | imagenet_sketch | ["brambling"] |
| 61 | | [10] | imagenet_sketch | ["brambling"] |
| 62 | | [10] | imagenet_sketch | ["brambling"] |
| 63 | | [10] | imagenet_sketch | ["brambling"] |
| 64 | | [10] | imagenet_sketch | ["brambling"] |
| 65 | | [10] | imagenet_sketch | ["brambling"] |
| 66 | | [10] | imagenet_sketch | ["brambling"] |
| 67 | | [10] | imagenet_sketch | ["brambling"] |
| 68 | | [10] | imagenet_sketch | ["brambling"] |
| 69 | | [11] | imagenet_a | ["goldfinch"] |
| 70 | | [11] | imagenet_sketch | ["goldfinch"] |
| 71 | | [11] | imagenet_sketch | ["goldfinch"] |
| 72 | | [11] | imagenet_sketch | ["goldfinch"] |
| 73 | | [11] | imagenet_sketch | ["goldfinch"] |
| 74 | | [11] | imagenet_sketch | ["goldfinch"] |
| 75 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 76 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 77 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 78 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 79 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 80 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 81 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 82 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 83 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 84 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 85 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 86 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 87 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 88 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 89 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 90 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 91 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 92 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 93 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 94 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 95 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 96 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 97 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 98 | | [14] | imagenet_sketch | ["indigo_bunting"] |
| 99 | | [14] | imagenet_sketch | ["indigo_bunting"] |
Dataset Card for "Imagenet-Hard-4K"
Project Page - Paper - Github
ImageNet-Hard-4K is a 4K version of the original ImageNet-Hard dataset, a benchmark of 10,980 images collected from existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). The dataset poses a significant challenge to state-of-the-art vision models, because merely zooming in often fails to improve their ability to classify the images correctly. As a result, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving a mere 2.02% accuracy.
Upscaling Procedure
We employed GigaGAN to upscale each image from the original ImageNet-Hard dataset to a resolution of 4K.
Dataset Distribution
Classifiers Performance
| Model | Accuracy (%) |
|---|---|
| AlexNet | 7.08 |
| VGG-16 | 11.32 |
| ResNet-18 | 10.42 |
| ResNet-50 | 13.93 |
| ViT-B/32 | 18.12 |
| EfficientNet-B0 | 12.94 |
| EfficientNet-B7 | 18.67 |
| EfficientNet-L2-Ns | 28.42 |
| CLIP-ViT-L/14@224px | 1.81 |
| CLIP-ViT-L/14@336px | 1.88 |
| OpenCLIP-ViT-bigG-14 | 14.33 |
| OpenCLIP-ViT-L-14 | 13.04 |
Evaluation Code
- CLIP: Open In Colab
- Other models: Open In Colab
Supported Tasks
image-classification: The objective of this task is to classify an image into one or more classes, selected from 1000 ImageNet categories (allowing for multiple ground-truth labels per image).
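Because an image may carry several ground-truth labels, a scoring rule has to say what counts as a hit. The sketch below assumes a top-1 prediction is correct when it matches any of the image's ground-truth ids; that rule, the function name `top1_accuracy`, and the sample predictions are illustrative assumptions, not taken from the evaluation notebooks.

```python
def top1_accuracy(predictions, labels):
    """Return the percentage of images whose predicted class id
    appears in that image's list of ground-truth ids."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        return 0.0
    hits = sum(1 for pred, truth in zip(predictions, labels) if pred in truth)
    return 100.0 * hits / len(predictions)


# Hypothetical predictions for four images; each label entry is a list of ids,
# matching the List[int] format of the `label` field.
preds = [1, 2, 7, 6]
truth = [[1], [2, 3], [4], [5, 6]]
accuracy = top1_accuracy(preds, truth)
print(f"{accuracy:.2f}")  # 75.00
```

Under this rule, the second and fourth images count as correct even though each has two acceptable classes.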
Languages
The values of the english_label field are in English.
Dataset Structure
Data Instances
An example looks like this:
{
'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=575x409 at 0x7F09456B53A0>,
'label': [0],
'origin': 'imagenet_sketch',
'english_label': ['tench']
}
Data Fields
The data instances have the following fields:
- image: A PIL.Image.Image object containing the image. Note that when accessing the image column: dataset[0]["image"] the image file is automatically decoded. Decoding of a large number of image files might take a significant amount of time. Thus it is important to first query the sample index before the "image" column, i.e. dataset[0]["image"] should always be preferred over dataset["image"][0].
- label: A List[int] collection containing the ground-truth ids.
- origin: A string naming the source dataset.
- english_label: A List[str] collection containing the English labels for the ground-truth classes.
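The access pattern recommended for the image field can be sketched as follows. The repository id `taesiri/imagenet-hard-4K` and the split name `validation` are assumptions (neither is spelled out in this card), and `describe` and `load_first_example` are hypothetical helper names used only for illustration.

```python
REPO_ID = "taesiri/imagenet-hard-4K"  # assumed Hub id; not stated in this card
SPLIT = "validation"                  # assumed split name for the validation-only set


def describe(example):
    """Summarize the non-image fields of one record as a short string."""
    pairs = ", ".join(
        f"{class_id}={name}"
        for class_id, name in zip(example["label"], example["english_label"])
    )
    return f"{example['origin']}: {pairs}"


def load_first_example():
    """Download the dataset and return its first record (needs network access)."""
    # Imported here so the helper above works without the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, split=SPLIT)
    example = dataset[0]   # index the row first ...
    example["image"]       # ... so only this one image has been decoded
    return example


# Works on any dict with the card's fields, for instance the example shown above.
record = {"label": [0], "origin": "imagenet_sketch", "english_label": ["tench"]}
print(describe(record))  # imagenet_sketch: 0=tench
```

Calling `load_first_example()` triggers the actual download, so it is left uncalled here.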
Data Splits
This dataset is a validation-only set.
Dataset Creation
Source Data
This dataset is sourced from ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-A, ImageNet-C, ImageNet-R, ImageNet-Sketch, and ObjectNet.
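Since every record names its source in the origin field, the per-source composition can be tallied directly from the records. A minimal sketch, assuming records are plain dicts with the fields described in this card; `origin_counts` is a hypothetical helper and the four sample records are illustrative only.

```python
from collections import Counter


def origin_counts(records):
    """Count how many records come from each source dataset."""
    return Counter(record["origin"] for record in records)


# Illustrative records in the card's field layout (image field omitted).
sample = [
    {"label": [1], "origin": "imagenet_r", "english_label": ["goldfish"]},
    {"label": [1], "origin": "imagenet_sketch", "english_label": ["goldfish"]},
    {"label": [2], "origin": "imagenet", "english_label": ["great_white_shark"]},
    {"label": [2], "origin": "imagenet_r", "english_label": ["great_white_shark"]},
]
counts = origin_counts(sample)
for origin, count in counts.most_common():
    print(origin, count)
```

The same function accepts any iterable of records, so it can be applied to the full split without decoding images as long as only the origin column is read.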
Citation Information
@article{taesiri2023zoom,
title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
journal={arXiv preprint arXiv:2304.05538},
year={2023}
}
