Dataset Card for "ImageNet-Hard"
Project Page - ArXiv - Paper - Github - Image Browser
Dataset Summary
ImageNet-Hard is a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to classify images correctly. As a result, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving a mere 2.02% accuracy.
ImageNet-Hard-4K: For the 4K version, please refer to this dataset.
Dataset Distribution
Classifiers Performance
| Model | Accuracy (%) |
|---|---|
| AlexNet | 7.34 |
| VGG-16 | 12.00 |
| ResNet-18 | 10.86 |
| ResNet-50 | 14.74 |
| ViT-B/32 | 18.52 |
| EfficientNet-B0 | 16.57 |
| EfficientNet-B7 | 23.20 |
| EfficientNet-L2-Ns | 39.00 |
| CLIP-ViT-L/14@224px | 1.86 |
| CLIP-ViT-L/14@336px | 2.02 |
| OpenCLIP-ViT-bigG-14 | 15.93 |
| OpenCLIP-ViT-L-14 | 15.60 |
Evaluation Code
- CLIP (Open in Colab)
- OpenCLIP
- Other models (Open in Colab)
Supported Tasks
image-classification: The objective of this task is to classify an image into one or more classes, selected from 1000 ImageNet categories (allowing for multiple ground-truth labels per image).
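Because an image may carry several ground-truth ids, scoring needs a rule for what counts as a hit. The sketch below assumes one plausible rule: a top-1 prediction is correct if it appears anywhere in the image's label list. The function name `multilabel_top1_accuracy` and the toy data are illustrative and are not taken from the official evaluation code.

```python
from typing import Sequence


def multilabel_top1_accuracy(
    predictions: Sequence[int], labels: Sequence[Sequence[int]]
) -> float:
    """Return the percentage of images whose predicted id is among its ground-truth ids.

    Assumption: a single (top-1) prediction per image, counted as correct
    when it matches any of that image's labels.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        return 0.0
    hits = sum(1 for pred, truth in zip(predictions, labels) if pred in truth)
    return 100.0 * hits / len(predictions)


# Toy example: four images, one of which has two ground-truth ids.
preds = [1, 2, 6, 10]
truth = [[1], [3], [5, 6], [10]]
accuracy = multilabel_top1_accuracy(preds, truth)
print(f"{accuracy:.2f}")  # 75.00
```

The same function applies to model outputs once they are reduced to one class id per image, for example by taking the argmax over the 1000 class scores.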
Languages
The values in the english_label field are in English.
Dataset Structure
Data Instances
An example looks like this:
{
'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=575x409 at 0x7F09456B53A0>,
'label': [0],
'origin': 'imagenet_sketch',
'english_label': ['tench']
}
Data Fields
The data instances have the following fields:
- image: A PIL.Image.Image object containing the image. Note that when accessing the image column: dataset[0]["image"] the image file is automatically decoded. Decoding of a large number of image files might take a significant amount of time. Thus it is important to first query the sample index before the "image" column, i.e. dataset[0]["image"] should always be preferred over dataset["image"][0].
- label: A List[int] collection containing the ground-truth ids.
- origin: A string containing the name of the source dataset.
- english_label: A List[str] collection containing the English labels for the ground-truth classes.
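The advice on access order for the image field can be illustrated with a toy model. The class below, `LazyImageTable`, is a hypothetical stand-in and not part of any library; it only assumes that images are decoded when accessed and counts how many decodes each access pattern triggers.

```python
class LazyImageTable:
    """Hypothetical table whose "image" values are decoded only on access."""

    def __init__(self, rows):
        self._rows = rows
        self.decode_count = 0  # number of simulated image decodes so far

    def _decode(self, row):
        # Simulate an expensive image decode for a single row.
        self.decode_count += 1
        decoded = dict(row)
        decoded["image"] = f"<decoded {row['image']}>"
        return decoded

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row-first access: decode only the requested row.
            return self._decode(self._rows[key])
        if key == "image":
            # Column-first access: every image is decoded.
            return [self._decode(row)["image"] for row in self._rows]
        # Non-image columns need no decoding.
        return [row[key] for row in self._rows]


# Toy rows shaped like the data instance above (file names are made up).
rows = [
    {"image": f"img_{i}.jpg", "label": [i], "origin": "imagenet"}
    for i in range(1000)
]
table = LazyImageTable(rows)

_ = table[0]["image"]
row_first = table.decode_count

table.decode_count = 0
_ = table["image"][0]
column_first = table.decode_count

print(row_first, column_first)  # 1 1000
```

In this toy model, indexing by row first decodes one image, while indexing the column first decodes all of them before the first is returned.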
Data Splits
This dataset is a validation-only set.
Dataset Creation
Source Data
This dataset is sourced from ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-A, ImageNet-C, ImageNet-R, ImageNet-Sketch, and ObjectNet.
Citation Information
@article{taesiri2023zoom,
title={ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
author={Taesiri, Mohammad Reza and Nguyen, Giang and Habchi, Sarra and Bezemer, Cor-Paul and Nguyen, Anh},
journal={arXiv preprint arXiv:2304.05538},
year={2023}
}
