IMAGENETTE

IMAGENETTE is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It uses the SiglipForImageClassification architecture and is trained to classify images into the 10 categories of the Imagenette dataset, a 10-class subset of ImageNet.

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786

ImageNet Large Scale Visual Recognition Challenge https://arxiv.org/pdf/1409.0575

Classification Report:

                  precision    recall  f1-score   support

           tench     0.9885    0.9834    0.9859       963
english springer     0.9843    0.9822    0.9832       955
 cassette player     0.9544    0.9486    0.9515       993
       chain saw     0.9257    0.8998    0.9125       858
          church     0.9654    0.9798    0.9726       941
     French horn     0.9757    0.9665    0.9711       956
   garbage truck     0.8883    0.9761    0.9301       961
        gas pump     0.9366    0.9044    0.9202       931
       golf ball     0.9925    0.9716    0.9819       951
       parachute     0.9821    0.9708    0.9764       960

        accuracy                         0.9590      9469
       macro avg     0.9593    0.9583    0.9586      9469
    weighted avg     0.9597    0.9590    0.9591      9469
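
The summary rows of the report can be re-derived from the per-class rows. The following is a minimal sketch, assuming the standard definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over classes, accuracy as the support-weighted mean of recall). The names `REPORT`, `f1`, `macro_avg`, and `weighted_recall` are illustrative and not part of the model repository.

```python
# (precision, recall, support) per class, copied from the report above.
REPORT = {
    "tench": (0.9885, 0.9834, 963),
    "english springer": (0.9843, 0.9822, 955),
    "cassette player": (0.9544, 0.9486, 993),
    "chain saw": (0.9257, 0.8998, 858),
    "church": (0.9654, 0.9798, 941),
    "French horn": (0.9757, 0.9665, 956),
    "garbage truck": (0.8883, 0.9761, 961),
    "gas pump": (0.9366, 0.9044, 931),
    "golf ball": (0.9925, 0.9716, 951),
    "parachute": (0.9821, 0.9708, 960),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(values):
    """Unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)


def weighted_recall(report):
    """Support-weighted mean of recall, which equals overall accuracy."""
    total = sum(support for _, _, support in report.values())
    return sum(recall * support for _, recall, support in report.values()) / total


for name, (precision, recall, _) in REPORT.items():
    print(f"{name:>16}  f1={f1(precision, recall):.4f}")
print(f"macro recall    = {macro_avg(r for _, r, _ in REPORT.values()):.4f}")
print(f"weighted recall = {weighted_recall(REPORT):.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed figures may differ from the reported ones in the last digit.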

Label Space: 10 Classes

The model predicts one of the following image classes:

0: tench
1: english springer
2: cassette player
3: chain saw
4: church
5: French horn
6: garbage truck
7: gas pump
8: golf ball
9: parachute
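
As a small sketch of how this label space can be handled in code, the index-to-name mapping and its inverse can be built from the list above. The names `LABELS`, `id2label`, `label2id`, and `label_for` are illustrative. The integer keys are a choice made for this sketch; the inference code in this card keys the same mapping by strings.

```python
# Class names in index order, copied from the list above.
LABELS = [
    "tench",
    "english springer",
    "cassette player",
    "chain saw",
    "church",
    "French horn",
    "garbage truck",
    "gas pump",
    "golf ball",
    "parachute",
]

# Forward and reverse lookups between class index and class name.
id2label = dict(enumerate(LABELS))
label2id = {name: index for index, name in id2label.items()}


def label_for(index):
    """Return the class name for a predicted index, failing loudly if out of range."""
    if index not in id2label:
        raise ValueError(f"index {index} is outside the {len(LABELS)}-class label space")
    return id2label[index]


print(label_for(4))
print(label2id["parachute"])
```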

Install Dependencies

pip install -q transformers torch pillow gradio hf_xet
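
After installing, it can be useful to confirm that the packages are importable before running the demo. The sketch below uses only the standard library. The import names in `REQUIRED_MODULES` are assumptions about how each package is imported (for example, the pillow distribution is imported as `PIL`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages in the pip command above.
REQUIRED_MODULES = ["transformers", "torch", "PIL", "gradio", "hf_xet"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```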

Inference Code

import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/IMAGENETTE"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping (class index as a string -> class name)
id2label = {
    "0": "tench",
    "1": "english springer",
    "2": "cassette player",
    "3": "chain saw",
    "4": "church",
    "5": "French horn",
    "6": "garbage truck",
    "7": "gas pump",
    "8": "golf ball",
    "9": "parachute"
}

def classify_image(image):
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class name to its rounded probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=3, label="Image Classification"),
    title="IMAGENETTE - SigLIP2 Classifier",
    description="Upload an image to classify it into one of 10 categories from the Imagenette dataset."
)

if __name__ == "__main__":
    iface.launch()
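
The post-processing in `classify_image` (softmax over the logits, then a ranked view of the top classes) can be illustrated without loading the model. This is a pure-Python sketch of that step only. The logits in the usage line are made-up numbers, not model output, and `softmax` and `top_k` are illustrative helpers rather than part of transformers or Gradio.

```python
import math

# Class names in index order, matching the label mapping above.
LABELS = [
    "tench", "english springer", "cassette player", "chain saw", "church",
    "French horn", "garbage truck", "gas pump", "golf ball", "parachute",
]


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, round(prob, 3)) for label, prob in ranked[:k]]


# Hypothetical logits for one image (index 4, "church", has the largest score).
example_logits = [0.1, -1.2, 0.3, 0.0, 4.5, 0.2, 1.1, 0.4, -0.5, 2.0]
print(top_k(example_logits))
```

The Gradio demo keeps all 10 probabilities in the returned dictionary and leaves the top-3 selection to `gr.Label(num_top_classes=3)`; the helper here only mirrors that ranking for use outside a UI.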

Intended Use

IMAGENETTE is designed for:

Educational use and model benchmarking.
Demonstrating the performance of SigLIP2 on a small but diverse classification task.
Illustrating fine-tuning workflows for vision-language models.

Model size: 92.9M params (Safetensors, tensor type F32)

URL: https://huggingface.co/prithivMLmods/IMAGENETTE
