Task: Image Classification
Columns: image (widths in the preview range from 164 px to 3.24k px) and is_document, a class label with 2 classes: 0 (no) and 1 (yes).
The DocOrNot dataset is evenly split: 50% of its images are pictures and 50% are documents.
It was built using 8k images from each of the following sources:
- RVL CDIP (Small) - https://www.kaggle.com/datasets/uditamin/rvl-cdip-small - license: https://www.industrydocuments.ucsf.edu/help/copyright/
- Flickr8k - https://www.kaggle.com/datasets/adityajn105/flickr8k - license: https://creativecommons.org/publicdomain/zero/1.0/
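The balanced composition described above (the same number of images drawn from a document source and from a picture source) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the mozilla/docornot repository: the function name build_balanced_split, the per_source default of 8000 (reading "8k" as exactly 8,000) and the example file paths are all hypothetical. Images from RVL-CDIP would be passed as document_paths and images from Flickr8k as picture_paths, labeled with the is_document convention of 1 = document ("yes") and 0 = picture ("no").

```python
import random
from typing import List, Tuple

# is_document convention: 1 = "yes" (document), 0 = "no" (picture).
DOCUMENT, PICTURE = 1, 0


def build_balanced_split(
    document_paths: List[str],
    picture_paths: List[str],
    per_source: int = 8000,  # assumption: "8k" read as exactly 8,000 per source
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Hypothetical helper (not from mozilla/docornot).

    Samples `per_source` images from each source and returns shuffled
    (path, is_document) pairs, so each class makes up exactly 50%.
    """
    if len(document_paths) < per_source or len(picture_paths) < per_source:
        raise ValueError("each source must provide at least per_source images")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    docs = rng.sample(document_paths, per_source)
    pics = rng.sample(picture_paths, per_source)
    examples = [(p, DOCUMENT) for p in docs] + [(p, PICTURE) for p in pics]
    rng.shuffle(examples)  # interleave the two classes
    return examples


if __name__ == "__main__":
    # Hypothetical file names standing in for RVL-CDIP and Flickr8k images.
    docs = [f"rvl_cdip/{i}.png" for i in range(10)]
    pics = [f"flickr8k/{i}.jpg" for i in range(10)]
    split = build_balanced_split(docs, pics, per_source=4)
    # 8 examples in total, 4 of them labeled as documents.
    print(len(split), sum(label for _, label in split))
```

Sampling the same count from each source is what yields the 50/50 class balance; with 8,000 images per source the result would hold 16,000 examples.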
The dataset can be used to train a model that classifies an image as either a picture or a document.
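The card does not specify a model or an evaluation protocol. As a hedged sketch, assuming a classifier that outputs the probability that an image is a document, its predictions could be mapped to is_document labels and scored as shown here. The 0.5 threshold and the helper names to_label and evaluate are illustrative assumptions, not part of the dataset or its source code.

```python
from typing import Dict, Sequence

# Class names in index order for the is_document column (0 = "no", 1 = "yes").
LABEL_NAMES = ("no", "yes")


def to_label(prob_document: float, threshold: float = 0.5) -> int:
    """Hypothetical helper: map a model's document probability to a 0/1 label."""
    return 1 if prob_document >= threshold else 0


def evaluate(
    y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, plus precision and recall for the "document" (1) class."""
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    y_pred = [to_label(p, threshold) for p in probs]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # documents found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # pictures called documents
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # documents missed
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, evaluate([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.7]) returns an accuracy, precision and recall of 0.5 each, and LABEL_NAMES[to_label(0.9)] is "yes". Because the dataset is split evenly between the two classes, plain accuracy is not skewed by class imbalance, while precision and recall still show which kind of mistake the model makes.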
Source code used to generate this dataset: https://github.com/mozilla/docornot
