URL: https://huggingface.co/datasets/taesiri/imagenet-hard-4K

taesiri/imagenet-hard-4K · Datasets at Hugging Face


Dataset Viewer (First 5GB), preview of rows 0 to 99

Column         Type         Range shown by the viewer
id             int64        0 to 805
image          image        width 2.01k to 4.1k px
label          list         length 1 to 3
origin         string       4 classes
english_label  list         length 1 to 3

id       label    origin           english_label
0        [ 1 ]    imagenet_r       [ "goldfish" ]
1        [ 1 ]    imagenet_sketch  [ "goldfish" ]
2        [ 2 ]    imagenet         [ "great_white_shark" ]
3-5      [ 2 ]    imagenet_r       [ "great_white_shark" ]
6        [ 3 ]    imagenet_sketch  [ "tiger_shark" ]
7-9      [ 4 ]    imagenet_r       [ "hammerhead" ]
10-21    [ 5 ]    imagenet         [ "electric_ray" ]
22       [ 5 ]    imagenet_sketch  [ "electric_ray" ]
23-36    [ 6 ]    imagenet         [ "stingray" ]
37-46    [ 6 ]    imagenet_r       [ "stingray" ]
47-51    [ 6 ]    imagenet_a       [ "stingray" ]
52-53    [ 6 ]    imagenet_sketch  [ "stingray" ]
54-55    [ 9 ]    imagenet_r       [ "ostrich" ]
56       [ 10 ]   imagenet         [ "brambling" ]
57-68    [ 10 ]   imagenet_sketch  [ "brambling" ]
69       [ 11 ]   imagenet_a       [ "goldfinch" ]
70-74    [ 11 ]   imagenet_sketch  [ "goldfinch" ]
75-99    [ 14 ]   imagenet_sketch  [ "indigo_bunting" ]

Dataset Card for "Imagenet-Hard-4K"

Project Page - Paper - Github

ImageNet-Hard-4K is a 4K version of the original ImageNet-Hard dataset, a benchmark of 10,980 images collected from existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). The dataset poses a significant challenge to state-of-the-art vision models because merely zooming in often fails to improve their ability to classify these images correctly. Even advanced models such as CLIP-ViT-L/14@336px struggle: it reaches only 2.02% accuracy on ImageNet-Hard and 1.88% on this 4K version (see Classifiers Performance).

Upscaling Procedure

We employed GigaGAN to upscale each image from the original ImageNet-Hard dataset to a resolution of 4K.

Dataset Distribution

[Figure: Dataset Distribution]
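The distribution figure itself is not reproduced here. As a rough illustration of how a per-source breakdown can be computed from the origin field, the sketch below counts origins over the 100 rows shown in the viewer preview. The run-length list and the function name origin_distribution are illustrative assumptions, and the resulting counts describe the preview only, not the full 10,980-image dataset.

```python
from collections import Counter

# (number of consecutive rows, origin) for preview rows with ids 0 to 99,
# transcribed from the viewer preview. This is NOT the full dataset.
PREVIEW_RUNS = [
    (1, "imagenet_r"), (1, "imagenet_sketch"), (1, "imagenet"),
    (3, "imagenet_r"), (1, "imagenet_sketch"), (3, "imagenet_r"),
    (12, "imagenet"), (1, "imagenet_sketch"), (14, "imagenet"),
    (10, "imagenet_r"), (5, "imagenet_a"), (2, "imagenet_sketch"),
    (2, "imagenet_r"), (1, "imagenet"), (12, "imagenet_sketch"),
    (1, "imagenet_a"), (5, "imagenet_sketch"), (25, "imagenet_sketch"),
]


def origin_distribution(origins):
    """Return {origin: (count, share)} ordered by descending count."""
    counts = Counter(origins)
    total = sum(counts.values())
    return {name: (n, n / total) for name, n in counts.most_common()}


# Expand the runs into one origin string per row, then count.
preview_origins = [origin for n, origin in PREVIEW_RUNS for _ in range(n)]
distribution = origin_distribution(preview_origins)

for name, (count, share) in distribution.items():
    print(f"{name:16s} {count:3d} {share:6.1%}")
```

With the real dataset, the same function could be applied to the values of the origin column instead of the transcribed preview rows.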

Classifiers Performance

Model Accuracy (%)
AlexNet 7.08
VGG-16 11.32
ResNet-18 10.42
ResNet-50 13.93
ViT-B/32 18.12
EfficientNet-B0 12.94
EfficientNet-B7 18.67
EfficientNet-L2-Ns 28.42
CLIP-ViT-L/14@224px 1.81
CLIP-ViT-L/14@336px 1.88
OpenCLIP-ViT-bigG-14 14.33
OpenCLIP-ViT-L-14 13.04

Evaluation Code

Supported Tasks

  • image-classification: The objective of this task is to classify an image into one or more classes, selected from 1000 ImageNet categories (allowing for multiple ground-truth labels per image).
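Because an image may carry several ground-truth labels, a natural scoring rule is to count a top-1 prediction as correct when it matches any of them. The sketch below assumes that rule; it is not taken from the official evaluation code, and the function name multilabel_accuracy and the sample predictions are hypothetical.

```python
def multilabel_accuracy(predictions, ground_truths):
    """Top-1 accuracy in percent when each sample may have several valid labels.

    predictions:   one predicted class id per sample
    ground_truths: one list of acceptable class ids per sample
    Assumed rule: a prediction counts as correct if it appears in the list.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths differ in length")
    if not predictions:
        raise ValueError("cannot compute accuracy over zero samples")
    hits = sum(
        1 for pred, labels in zip(predictions, ground_truths) if pred in labels
    )
    return 100.0 * hits / len(predictions)


# Hypothetical predictions and labels; the last sample has two valid labels.
predictions = [1, 2, 7, 5]
ground_truths = [[1], [2], [4], [5, 6]]
accuracy = multilabel_accuracy(predictions, ground_truths)
print(f"accuracy: {accuracy:.2f}%")
```

Three of the four hypothetical predictions match one of their labels, so the script reports 75.00%.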

Languages

The values of the english_label field are in English.

Dataset Structure

Data Instances

An example looks like this:

{
'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=575x409 at 0x7F09456B53A0>,
 'label': [0],
 'origin': 'imagenet_sketch',
 'english_label': ['tench']
}

Data Fields

The data instances have the following fields:

  • image: A PIL.Image.Image object containing the image. Accessing the image column, e.g. dataset[0]["image"], decodes the image file automatically. Decoding a large number of image files can take a significant amount of time, so index the sample before the "image" column: dataset[0]["image"] should always be preferred over dataset["image"][0].
  • label: A List[int] collection containing the ground-truth ids.
  • origin: A string containing the name of the source dataset.
  • english_label: A List[str] collection containing the English labels for the ground-truth classes.
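The access-order advice for the image field can be illustrated with a small stand-in. The class below is a hypothetical toy, not the Hugging Face datasets implementation: it only mimics the behavior described above, where reading a row decodes one image and reading the whole "image" column decodes every image, and it counts the decode calls in each case.

```python
class DecodingDataset:
    """Toy stand-in for a dataset that decodes images on access."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.decode_calls = 0

    def _decode(self, path):
        # A real dataset would open and decode the image file here.
        self.decode_calls += 1
        return f"<decoded {path}>"

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: decode only this row's image.
            row = dict(self._rows[key])
            row["image"] = self._decode(row["image"])
            return row
        if key == "image":
            # Column access: decode the image of every row.
            return [self._decode(row["image"]) for row in self._rows]
        return [row[key] for row in self._rows]


# Hypothetical rows; file names and labels are placeholders.
rows = [{"image": f"img_{i}.jpg", "label": [i]} for i in range(100)]

row_first = DecodingDataset(rows)
_ = row_first[0]["image"]
row_first_calls = row_first.decode_calls

column_first = DecodingDataset(rows)
_ = column_first["image"][0]
column_first_calls = column_first.decode_calls

print(f"row first: {row_first_calls} decode(s)")
print(f"column first: {column_first_calls} decode(s)")
```

In this toy, the row-first access decodes a single image while the column-first access decodes all 100, which is the cost difference the recommendation is meant to avoid.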

Data Splits

This dataset is a validation-only set.

Dataset Creation

Source Data

This dataset is sourced from ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-A, ImageNet-C, ImageNet-R, ImageNet-Sketch, and ObjectNet.

Citation Information

@article{taesiri2023zoom,
 title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
 author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
 journal={arXiv preprint arXiv:2304.05538},
 year={2023}
}