Fashion-Mnist-SigLIP2

Fashion-Mnist-SigLIP2 is an image classification model fine-tuned from the vision-language encoder google/siglip2-base-patch16-224 for single-label classification. It assigns each input image to one of the 10 Fashion-MNIST categories using the SiglipForImageClassification architecture.


Paper: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (https://arxiv.org/pdf/2502.14786)

Classification Report:

              precision    recall  f1-score   support

T-shirt / top    0.8142    0.9147    0.8615      6000
      Trouser    0.9935    0.9870    0.9902      6000
     Pullover    0.8901    0.8610    0.8753      6000
        Dress    0.9098    0.9300    0.9198      6000
         Coat    0.8636    0.8865    0.8749      6000
       Sandal    0.9857    0.9847    0.9852      6000
        Shirt    0.8076    0.6962    0.7478      6000
      Sneaker    0.9663    0.9695    0.9679      6000
          Bag    0.9779    0.9805    0.9792      6000
   Ankle boot    0.9698    0.9700    0.9699      6000

     accuracy                        0.9180     60000
    macro avg    0.9179    0.9180    0.9172     60000
 weighted avg    0.9179    0.9180    0.9172     60000
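The summary rows of the report can be cross-checked against the per-class rows. The sketch below is an illustration only, not the evaluation script used for this model: it copies the per-class values from the report, recomputes the macro averages as plain means, and recomputes the F1 score for the Shirt class as the harmonic mean of its precision and recall. The helper names `f1_score` and `macro_average` are made up for this example.

```python
# Per-class (precision, recall, f1) values copied from the classification report.
report = {
    "T-shirt / top": (0.8142, 0.9147, 0.8615),
    "Trouser": (0.9935, 0.9870, 0.9902),
    "Pullover": (0.8901, 0.8610, 0.8753),
    "Dress": (0.9098, 0.9300, 0.9198),
    "Coat": (0.8636, 0.8865, 0.8749),
    "Sandal": (0.9857, 0.9847, 0.9852),
    "Shirt": (0.8076, 0.6962, 0.7478),
    "Sneaker": (0.9663, 0.9695, 0.9679),
    "Bag": (0.9779, 0.9805, 0.9792),
    "Ankle boot": (0.9698, 0.9700, 0.9699),
}


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Macro average: unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)


macro_precision = macro_average(p for p, _, _ in report.values())
macro_recall = macro_average(r for _, r, _ in report.values())
macro_f1 = macro_average(f for _, _, f in report.values())

# Recompute one per-class F1 from its precision and recall.
shirt_precision, shirt_recall, _ = report["Shirt"]
shirt_f1 = f1_score(shirt_precision, shirt_recall)

print(f"macro precision: {macro_precision:.4f}")
print(f"macro recall:    {macro_recall:.4f}")
print(f"macro f1:        {macro_f1:.4f}")
print(f"Shirt f1:        {shirt_f1:.4f}")
```

Because every class has the same support (6000), the weighted averages coincide with the macro averages, and overall accuracy equals macro recall. Small differences in the last digit are expected, since the inputs here are already rounded to four decimals.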


The model categorizes images into the following 10 classes:

Class 0: "T-shirt / top"
Class 1: "Trouser"
Class 2: "Pullover"
Class 3: "Dress"
Class 4: "Coat"
Class 5: "Sandal"
Class 6: "Shirt"
Class 7: "Sneaker"
Class 8: "Bag"
Class 9: "Ankle boot"
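To illustrate how a class index maps to one of the names above, here is a minimal pure-Python sketch. It assumes the model returns one logit per class in the index order listed; the logit values are invented for the example, and `ID2LABEL`, `softmax`, and `predict_label` are hypothetical names, not part of the model's API.

```python
import math

# Index-to-name mapping, following the class list above.
ID2LABEL = {
    0: "T-shirt / top", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat",
    5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle boot",
}


def softmax(logits):
    """Convert raw scores to probabilities (max subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration; index 6 ("Shirt") has the largest value.
example_logits = [0.1, -1.2, 0.3, 0.0, 0.5, -2.0, 1.4, -0.7, -0.3, -1.5]
label, prob = predict_label(example_logits)
print(label, round(prob, 3))
```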

Run with Transformers🤗

!pip install -q transformers torch pillow gradio

import gradio as gr
from transformers import AutoImageProcessor
from transformers import SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Fashion-Mnist-SigLIP2"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def fashion_mnist_classification(image):
    """Predicts fashion category for an image."""
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map class indices to Fashion-MNIST label names
    labels = {
        "0": "T-shirt / top", "1": "Trouser", "2": "Pullover", "3": "Dress", "4": "Coat",
        "5": "Sandal", "6": "Shirt", "7": "Sneaker", "8": "Bag", "9": "Ankle boot"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}

    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=fashion_mnist_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Fashion MNIST Classification Labels",
    description="Upload an image to classify it into one of the 10 Fashion-MNIST categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
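Outside the Gradio interface, the dictionary returned by fashion_mnist_classification (label name mapped to a rounded probability) can be ranked directly. The following is a small sketch under that assumption; `top_k` is a hypothetical helper and the scores are invented for the example, not real model output.

```python
def top_k(predictions, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Invented scores in the same shape as the function's return value.
example_predictions = {
    "T-shirt / top": 0.201, "Trouser": 0.001, "Pullover": 0.074, "Dress": 0.012,
    "Coat": 0.095, "Sandal": 0.001, "Shirt": 0.612, "Sneaker": 0.001,
    "Bag": 0.002, "Ankle boot": 0.001,
}

for label, score in top_k(example_predictions, k=3):
    print(f"{label}: {score:.3f}")
```

Looking at the top few classes rather than only the first can be useful for pairs the report shows to be harder to separate, such as Shirt and T-shirt / top.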

Intended Use:

The Fashion-Mnist-SigLIP2 model is designed for fashion image classification. It helps categorize clothing and footwear items into predefined Fashion-MNIST classes. Potential use cases include:

Fashion Recognition: Classifying fashion images into common categories like shirts, sneakers, and dresses.
E-commerce Applications: Assisting online retailers in organizing and tagging clothing items for better search and recommendations.
Automated Fashion Sorting: Helping automated inventory management systems classify fashion items.
Educational Purposes: Supporting AI and ML research in vision-based fashion classification models.


Model size: 92.9M params (Safetensors, tensor type F32)


URL: https://huggingface.co/prithivMLmods/Fashion-Mnist-SigLIP2