URL: https://huggingface.co/prithivMLmods/RESISC45-SigLIP2

RESISC45-SigLIP2

RESISC45-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is specifically trained to recognize and tag multiple land use and land cover scene categories from the RESISC45 dataset using the SiglipForImageClassification architecture.

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786

Classification Report:
                        precision    recall  f1-score   support

              airplane     0.9830    0.9900    0.9865       700
               airport     0.9461    0.9529    0.9495       700
      baseball diamond     0.9802    0.9886    0.9844       700
      basketball court     0.9516    0.9271    0.9392       700
                 beach     0.9914    0.9900    0.9907       700
                bridge     0.9730    0.9771    0.9751       700
             chaparral     0.9957    0.9986    0.9971       700
                church     0.7949    0.8971    0.8430       700
     circular farmland     0.9914    0.9914    0.9914       700
                 cloud     0.9957    0.9871    0.9914       700
       commercial area     0.9231    0.8229    0.8701       700
     dense residential     0.9355    0.8914    0.9129       700
                desert     0.9821    0.9414    0.9613       700
                forest     0.9652    0.9514    0.9583       700
               freeway     0.9344    0.9571    0.9457       700
           golf course     0.9759    0.9843    0.9801       700
    ground track field     0.9623    0.9857    0.9739       700
                harbor     0.9885    0.9843    0.9864       700
       industrial area     0.9505    0.9043    0.9268       700
          intersection     0.9855    0.9686    0.9769       700
                island     0.9871    0.9829    0.9850       700
                  lake     0.9440    0.9629    0.9533       700
                meadow     0.9564    0.9400    0.9481       700
    medium residential     0.8602    0.9314    0.8944       700
      mobile home park     0.9610    0.9500    0.9555       700
              mountain     0.9388    0.9429    0.9408       700
              overpass     0.9614    0.9614    0.9614       700
                palace     0.8455    0.8286    0.8369       700
           parking lot     0.9899    0.9757    0.9827       700
               railway     0.9407    0.9071    0.9236       700
       railway station     0.9104    0.9143    0.9123       700
  rectangular farmland     0.9572    0.9271    0.9419       700
                 river     0.9281    0.9586    0.9431       700
            roundabout     0.9914    0.9871    0.9893       700
                runway     0.9669    0.9586    0.9627       700
               sea ice     0.9957    0.9943    0.9950       700
                  ship     0.9558    0.9886    0.9719       700
              snowberg     0.9886    0.9900    0.9893       700
    sparse residential     0.9238    0.9700    0.9463       700
               stadium     0.9716    0.9757    0.9736       700
          storage tank     0.9787    0.9829    0.9808       700
          tennis court     0.9326    0.9486    0.9405       700
               terrace     0.9372    0.9586    0.9477       700
 thermal power station     0.9482    0.9671    0.9576       700
               wetland     0.9444    0.8986    0.9209       700

              accuracy                         0.9532     31500
             macro avg     0.9538    0.9532    0.9532     31500
          weighted avg     0.9538    0.9532    0.9532     31500
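
As a quick sanity check on the report above, each F1 score should be the harmonic mean of the precision and recall in the same row. The sketch below recomputes it for two rows. The helper name `f1_from_pr` is illustrative and not part of the model card. The last digit can differ slightly because the published precision and recall are already rounded to four decimals.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows copied from the classification report: (precision, recall, reported F1)
rows = {
    "church": (0.7949, 0.8971, 0.8430),
    "airplane": (0.9830, 0.9900, 0.9865),
}

for name, (p, r, reported) in rows.items():
    recomputed = f1_from_pr(p, r)
    print(f"{name}: recomputed={recomputed:.4f} reported={reported:.4f}")
```

Classes with a wide gap between precision and recall, such as church and commercial area, are the ones where the F1 score drops furthest below the overall accuracy.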

Label Space: 45 Scene Categories

The model predicts the presence of one or more of the following 45 scene categories:

Class 0: "airplane"
Class 1: "airport"
Class 2: "baseball diamond"
Class 3: "basketball court"
Class 4: "beach"
Class 5: "bridge"
Class 6: "chaparral"
Class 7: "church"
Class 8: "circular farmland"
Class 9: "cloud"
Class 10: "commercial area"
Class 11: "dense residential"
Class 12: "desert"
Class 13: "forest"
Class 14: "freeway"
Class 15: "golf course"
Class 16: "ground track field"
Class 17: "harbor"
Class 18: "industrial area"
Class 19: "intersection"
Class 20: "island"
Class 21: "lake"
Class 22: "meadow"
Class 23: "medium residential"
Class 24: "mobile home park"
Class 25: "mountain"
Class 26: "overpass"
Class 27: "palace"
Class 28: "parking lot"
Class 29: "railway"
Class 30: "railway station"
Class 31: "rectangular farmland"
Class 32: "river"
Class 33: "roundabout"
Class 34: "runway"
Class 35: "sea ice"
Class 36: "ship"
Class 37: "snowberg"
Class 38: "sparse residential"
Class 39: "stadium"
Class 40: "storage tank"
Class 41: "tennis court"
Class 42: "terrace"
Class 43: "thermal power station"
Class 44: "wetland"
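
The class indices above map one-to-one to label strings. A minimal sketch of building both lookup directions from that ordered list follows. `LABELS`, `id2label`, and `label2id` are local, illustrative names. This sketch uses integer keys, whereas the inference code in this card uses string keys, and it has not been checked against the mapping stored in the published checkpoint's config.

```python
# The 45 RESISC45 scene categories, in class-index order as listed above.
LABELS = [
    "airplane", "airport", "baseball diamond", "basketball court", "beach",
    "bridge", "chaparral", "church", "circular farmland", "cloud",
    "commercial area", "dense residential", "desert", "forest", "freeway",
    "golf course", "ground track field", "harbor", "industrial area", "intersection",
    "island", "lake", "meadow", "medium residential", "mobile home park",
    "mountain", "overpass", "palace", "parking lot", "railway",
    "railway station", "rectangular farmland", "river", "roundabout", "runway",
    "sea ice", "ship", "snowberg", "sparse residential", "stadium",
    "storage tank", "tennis court", "terrace", "thermal power station", "wetland",
]

# Index -> label and label -> index lookups.
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}

print(len(LABELS), id2label[7], label2id["wetland"])
```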

Install dependencies

pip install -q transformers torch pillow gradio

Inference Code

import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/RESISC45-SigLIP2"  # Hugging Face Hub model ID
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label map
id2label = {
    "0": "airplane", "1": "airport", "2": "baseball diamond", "3": "basketball court", "4": "beach",
    "5": "bridge", "6": "chaparral", "7": "church", "8": "circular farmland", "9": "cloud",
    "10": "commercial area", "11": "dense residential", "12": "desert", "13": "forest", "14": "freeway",
    "15": "golf course", "16": "ground track field", "17": "harbor", "18": "industrial area", "19": "intersection",
    "20": "island", "21": "lake", "22": "meadow", "23": "medium residential", "24": "mobile home park",
    "25": "mountain", "26": "overpass", "27": "palace", "28": "parking lot", "29": "railway",
    "30": "railway station", "31": "rectangular farmland", "32": "river", "33": "roundabout", "34": "runway",
    "35": "sea ice", "36": "ship", "37": "snowberg", "38": "sparse residential", "39": "stadium",
    "40": "storage tank", "41": "tennis court", "42": "terrace", "43": "thermal power station", "44": "wetland"
}

def classify_resisc_image(image):
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Independent per-class probabilities (sigmoid), one per category
        probs = torch.sigmoid(logits).squeeze().tolist()

    # Keep every category whose probability reaches the threshold
    threshold = 0.5
    predictions = {
        id2label[str(i)]: round(probs[i], 3)
        for i in range(len(probs)) if probs[i] >= threshold
    }

    return predictions or {"None Detected": 0.0}

# Gradio Interface
iface = gr.Interface(
    fn=classify_resisc_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Predicted Scene Categories"),
    title="RESISC45-SigLIP2",
    description="Upload a satellite image to detect multiple land use and land cover categories (e.g., airport, forest, mountain)."
)

if __name__ == "__main__":
    iface.launch()
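
The thresholding step inside `classify_resisc_image` can be isolated and tested without loading the model. The sketch below reimplements it with the standard library only. `decode_multilabel` and its `fallback_top1` flag are hypothetical additions, not part of the original code. The fallback returns the single highest-scoring label when nothing reaches the threshold, as an alternative to the `"None Detected"` placeholder.

```python
import math
from typing import Dict, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,
    fallback_top1: bool = False,  # hypothetical option, not in the original code
) -> Dict[str, float]:
    """Map raw logits to {label: probability} for every class at or above threshold."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")

    probs = [sigmoid(v) for v in logits]
    selected = {
        labels[i]: round(p, 3) for i, p in enumerate(probs) if p >= threshold
    }

    # Optionally return the best single guess instead of an empty result.
    if not selected and fallback_top1:
        best = max(range(len(probs)), key=probs.__getitem__)
        selected = {labels[best]: round(probs[best], 3)}
    return selected


# Toy logits for three categories (made-up values, not model output).
demo_labels = ["airport", "forest", "runway"]
print(decode_multilabel([2.0, -3.0, 0.1], demo_labels))
print(decode_multilabel([-1.0, -3.0, -2.0], demo_labels, fallback_top1=True))
```

Because each class is scored independently, raising `threshold` trades recall for precision per category. A single global value of 0.5 is only a starting point.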

Intended Use

The RESISC45-SigLIP2 model is ideal for multi-label classification tasks involving remote sensing imagery. Use cases include:

  • Remote Sensing Analysis – Label elements in aerial/satellite images.
  • Urban Planning – Identify urban structures and landscape features.
  • Geospatial Intelligence – Aid in automated image interpretation pipelines.
  • Environmental Monitoring – Track natural landforms and changes.

Model size: 92.9M params (Safetensors, tensor type F32)