
URL: https://huggingface.co/Pranilllllll/segformer-satellite-segementation



Model Card — SegFormer-B0 Kathmandu Valley Satellite Segmentation

Model Description

This model is a fine-tuned SegFormer-B0 for semantic segmentation of satellite imagery over Kathmandu Valley, Nepal. It classifies each pixel into one of six annotated land-use categories: Background, Residential Area, Road, River, Forest, and Unused Land. The model head has 7 output channels; the seventh (class ID 6) is marked as reserved in the inference script and has no annotated category. The model is intended for urban planning, GIS analysis, and geospatial research applications.

  • Developed by: praniil
  • Model type: Semantic Segmentation (SegFormer-B0)
  • Language(s) (NLP): N/A (Computer Vision)
  • License: MIT
  • Finetuned from model: nvidia/mit-b0 (SegFormer-B0 pretrained on ImageNet)

Model Sources

  • Repository: https://github.com/praniil/satellite-image-segmentation


Uses

Direct Use

This model can be used out-of-the-box for satellite image segmentation over Kathmandu Valley or similar urban/semi-urban landscapes. It accepts a 512×512 RGB satellite image and outputs a per-pixel land-use classification mask.

import numpy as np
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

HF_REPO = "Pranilllllll/segformer-satellite-segementation"
model = SegformerForSemanticSegmentation.from_pretrained(HF_REPO).to(device)
# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor.
processor = SegformerImageProcessor.from_pretrained(HF_REPO)
model.eval()

image = Image.open("path_to_your_satellite_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs["pixel_values"].to(device)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)
    logits = outputs.logits  # [1, 7, H/4, W/4]

# SegFormer predicts at 1/4 of the input resolution, so upsample the logits
# to the original image size (PIL reports size as (width, height)).
logits = F.interpolate(logits, size=image.size[::-1], mode="bilinear", align_corners=False)
pred_mask = torch.argmax(logits, dim=1).squeeze(0).cpu().numpy()

# Colour palette indexed by class ID.
colors = np.array([
    [0, 0, 0],      # Background
    [128, 0, 0],    # Residential Area
    [0, 128, 0],    # Road
    [0, 0, 128],    # River
    [0, 128, 128],  # Forest
    [128, 128, 0],  # Unused Land
    [128, 0, 128],  # (reserved)
], dtype=np.uint8)

seg_image = colors[pred_mask]
plt.imsave("prediction.png", seg_image)
print("Inference complete. Prediction saved as prediction.png")

Downstream Use

This model can be plugged into larger GIS pipelines for:

  • Automated land-use/land-cover (LULC) mapping
  • Urban sprawl analysis
  • River and forest change detection
  • Input feature generation for spatial planning models
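
As one illustration of the LULC mapping use case, a predicted mask can be reduced to per-class area shares. The sketch below assumes `pred_mask` is a 2-D integer array of class IDs such as the one produced by the inference script; `CLASS_NAMES`, `class_area_share`, and the synthetic `demo_mask` are illustrative names, not part of the repository. The card does not state the ground size of a pixel, so only fractions are reported, not areas in square metres.

```python
import numpy as np

# Class names in class-ID order, as listed in this card.
CLASS_NAMES = ["Background", "Residential Area", "Road", "River", "Forest", "Unused Land"]

def class_area_share(pred_mask, num_classes=7):
    """Return the fraction of pixels assigned to each class ID."""
    counts = np.bincount(pred_mask.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Tiny synthetic mask standing in for a real prediction.
demo_mask = np.array([
    [1, 1, 4, 4],
    [1, 2, 4, 4],
    [3, 2, 5, 0],
    [3, 2, 5, 0],
])

shares = class_area_share(demo_mask)
for class_id, name in enumerate(CLASS_NAMES):
    print(f"{name}: {shares[class_id]:.1%}")
```

Comparing these shares between two acquisition dates of the same tile is one simple way to approach the change-detection use case, provided both masks come from comparable imagery.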

Out-of-Scope Use

  • Not suitable for segmenting non-satellite imagery (street photos, drone footage with different resolution/angle).
  • Performance may degrade on satellite imagery from regions with significantly different land cover patterns than Kathmandu Valley.
  • Not suitable for fine-grained object detection within classes (e.g., identifying individual buildings).

Bias, Risks, and Limitations

  • Geographic bias: Trained exclusively on Kathmandu Valley tiles; may not generalize to other geographies.
  • Class imbalance: Despite weighted loss, rare classes (Road, River) may have lower per-class IoU.
  • Resolution dependency: Expects 512×512 input tiles; other resolutions require resizing and may affect accuracy.
  • Annotation noise: Manual annotations via CVAT may have some boundary ambiguity between classes.
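
For scenes larger than 512×512, one alternative to resizing is to cut the scene into 512×512 tiles, predict each tile, and stitch the masks back together. This is a minimal sketch of that idea, not something the card describes the author doing; `tile_image` and `stitch_masks` are hypothetical helpers, and zero-padding the right and bottom edges is an assumption.

```python
import numpy as np

def tile_image(image, tile=512):
    """Split an H x W x C array into tile x tile patches, zero-padding the edges.

    Returns a list of (row_offset, col_offset, patch) tuples.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    tiles = []
    for top in range(0, padded.shape[0], tile):
        for left in range(0, padded.shape[1], tile):
            tiles.append((top, left, padded[top:top + tile, left:left + tile]))
    return tiles

def stitch_masks(mask_tiles, height, width, tile=512):
    """Reassemble per-tile class masks and crop away the padded border."""
    full_h = height + (-height) % tile
    full_w = width + (-width) % tile
    full = np.zeros((full_h, full_w), dtype=np.int64)
    for top, left, mask in mask_tiles:
        full[top:top + tile, left:left + tile] = mask
    return full[:height, :width]

scene = np.zeros((1000, 1300, 3), dtype=np.uint8)  # stand-in for a large scene
tiles = tile_image(scene)
print(len(tiles))  # 2 rows x 3 columns of tiles
```

Predictions near tile borders tend to be less reliable because the model sees less context there; overlapping tiles would reduce that effect at the cost of extra inference.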

Recommendations

Validate predictions on your specific region before using results for critical planning decisions. Cross-checking against GIS datasets (e.g., OpenStreetMap) is recommended.


How to Get Started with the Model

Install dependencies:

pip install torch transformers Pillow matplotlib

Then use the inference script in the Direct Use section above.


Training Details

Training Data

A custom dataset was built from satellite imagery of Kathmandu Valley, Nepal, divided into a grid of tiles.

  • Total images: ~400 tiles
  • Resolution: 512 × 512 pixels
  • Annotation tool: CVAT
  • Task: Multi-class semantic segmentation

Annotation Classes

| Class ID | Class Name | RGB Color |
|---|---|---|
| 0 | Background | (0, 0, 0) |
| 1 | Residential Area | (128, 0, 0) |
| 2 | Road | (0, 128, 0) |
| 3 | River | (0, 0, 128) |
| 4 | Forest | (0, 128, 128) |
| 5 | Unused Land | (128, 128, 0) |
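
If the annotations are stored as RGB colour masks (an assumption; the card does not state the CVAT export format), the colours listed above must be converted to integer class IDs before training. A minimal sketch, with `rgb_to_class_ids` and the `ignore_index` value of 255 chosen for illustration:

```python
import numpy as np

# Palette copied from the class list above; row index = class ID.
PALETTE = np.array([
    [0, 0, 0],      # 0 Background
    [128, 0, 0],    # 1 Residential Area
    [0, 128, 0],    # 2 Road
    [0, 0, 128],    # 3 River
    [0, 128, 128],  # 4 Forest
    [128, 128, 0],  # 5 Unused Land
], dtype=np.uint8)

def rgb_to_class_ids(color_mask, palette=PALETTE, ignore_index=255):
    """Map an H x W x 3 colour mask to an H x W array of class IDs.

    Pixels whose colour is not in the palette are set to `ignore_index`.
    """
    class_ids = np.full(color_mask.shape[:2], ignore_index, dtype=np.uint8)
    for class_id, color in enumerate(palette):
        class_ids[np.all(color_mask == color, axis=-1)] = class_id
    return class_ids

# Round trip on a 2 x 2 toy mask.
demo_ids = np.array([[0, 1], [2, 5]])
demo_rgb = PALETTE[demo_ids]
recovered = rgb_to_class_ids(demo_rgb)
```

Marking unknown colours with an ignore value lets them be excluded from the loss instead of being silently counted as Background.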

Training Procedure

Preprocessing

  • Images resized to 512 × 512
  • Standard ImageNet normalization via SegformerFeatureExtractor

Data Augmentation

Applied using albumentations:

  • Horizontal and vertical flips
  • Random 90-degree rotations
  • Resize to 512 × 512
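
The key property of these augmentations is that the image and its mask receive exactly the same geometric transform. The sketch below shows that property in plain NumPy rather than albumentations, so it is a stand-in for the author's pipeline, not a copy of it. The 0.5 flip probabilities and the function name `augment_pair` are assumptions, and the resize step is omitted because the tiles are already 512 × 512.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to an image and its mask."""
    if rng.random() < 0.5:                    # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                    # vertical flip
        image, mask = image[::-1], mask[::-1]
    k = int(rng.integers(0, 4))               # number of 90-degree turns
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)

rng = np.random.default_rng(0)
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
mask = image[..., 0] // 3                     # toy mask that encodes pixel position
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Because the toy mask is derived from the image, image and mask stay aligned after augmentation exactly when `aug_image[..., 0] // 3` still equals `aug_mask`.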

Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Input size | 512 × 512 |
| Batch size | 16 |
| Optimizer | AdamW |
| Learning rate | 3e-5 |
| Loss function | Weighted Cross-Entropy |
| Epochs | 300 (early stopping, patience=25) |
| Cross-validation | 3-fold |
| Training regime | bf16 mixed precision |
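
The epoch budget and patience listed above interact as follows: training runs for at most 300 epochs but stops once the monitored score has failed to improve for 25 consecutive epochs. This is a minimal sketch of that rule, assuming the monitored quantity is a validation score where higher is better (the card does not say which metric was monitored); `EarlyStopping` and `run` are illustrative names.

```python
class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=25, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.epochs_without_improvement = 0

    def step(self, score):
        """Record one epoch's score; return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

def run(val_scores, max_epochs=300, patience=25):
    """Replay a list of per-epoch scores; return (last_epoch, best_score)."""
    stopper = EarlyStopping(patience=patience)
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if stopper.step(score):
            return epoch, stopper.best
    return min(len(val_scores), max_epochs), stopper.best

# Synthetic scores: improvement stops after epoch 2.
last_epoch, best = run([0.1, 0.2] + [0.15] * 40)
```

In a real training loop the checkpoint from the best epoch would normally be restored after stopping.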

Class Imbalance Handling

Inverse-frequency class weights were computed from the training set and applied to the cross-entropy loss, so that errors on rare classes (Road, River) are weighted more heavily and are not swamped by the frequent classes during training.
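
A minimal sketch of how inverse-frequency weights can be derived from label masks. The rescaling so that present classes average a weight of 1, the zero weight for classes that never occur, and the name `inverse_frequency_weights` are assumptions; the card does not specify the exact formula used.

```python
import numpy as np

def inverse_frequency_weights(masks, num_classes):
    """Compute per-class weights proportional to 1 / pixel frequency."""
    counts = np.zeros(num_classes, dtype=np.float64)
    for mask in masks:
        counts += np.bincount(mask.ravel(), minlength=num_classes)[:num_classes]
    freq = counts / counts.sum()
    present = counts > 0
    weights = np.zeros(num_classes, dtype=np.float64)
    weights[present] = 1.0 / freq[present]
    # Rescale so that the classes that actually occur have a mean weight of 1.
    weights *= present.sum() / weights.sum()
    return weights

# Toy label mask: class 0 covers 12 pixels, class 1 covers 3, class 2 covers 1.
toy_mask = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 2],
])
weights = inverse_frequency_weights([toy_mask], num_classes=3)
```

The resulting vector can be passed as the `weight` argument of `torch.nn.CrossEntropyLoss` after conversion to a float tensor.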


Evaluation

Metrics

  • Mean IoU (mIoU): Primary metric — overlap between predicted and ground truth masks averaged across all classes.
  • Per-class IoU: Segmentation accuracy per land-use category.
  • Qualitative inspection: Visual comparison of predicted vs. ground truth masks.
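
For reference, per-class IoU is TP / (TP + FP + FN) for each class, and mIoU is the mean of those values. The sketch below computes both from a confusion matrix. Skipping classes that appear in neither prediction nor ground truth is an assumption, as the card does not say how absent classes or the reserved class are handled; the function names are illustrative.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    valid = (target >= 0) & (target < num_classes)
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_scores(cm):
    """Return (per-class IoU array, mean IoU over classes that occur)."""
    intersection = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    per_class = np.full(cm.shape[0], np.nan)
    seen = union > 0
    per_class[seen] = intersection[seen] / union[seen]
    return per_class, float(np.nanmean(per_class))

target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
per_class_iou, miou = iou_scores(confusion_matrix(pred, target, num_classes=2))
```

In this toy case class 0 has one correct pixel out of a union of two, and class 1 has two correct pixels out of a union of three.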

Results

Cross-validation results are reported as mean ± standard deviation of mIoU across 3 folds. Training curves (loss, mIoU, gradient norm) are available in the eval_plots/ directory.
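
A minimal sketch of how a 3-fold split and the mean ± standard deviation summary could be produced. The shuffling seed, the use of the sample standard deviation (`ddof=1`), and the names `kfold_indices` and `summarize` are assumptions; the tile count of 400 is the approximate figure given in this card, and no actual scores are shown here.

```python
import numpy as np

def kfold_indices(num_samples, k=3, seed=0):
    """Yield (train_idx, val_idx) index arrays for k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(num_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def summarize(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    scores = np.asarray(fold_scores, dtype=np.float64)
    return float(scores.mean()), float(scores.std(ddof=1))

splits = list(kfold_indices(400, k=3))
for fold, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: {len(train_idx)} train tiles, {len(val_idx)} validation tiles")
```

Each fold's validation mIoU would be collected into a list and passed to `summarize` to obtain the reported figure.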

The gradient norm stayed stable throughout training, which is consistent with the MiT encoder converging without vanishing-gradient issues.


Model Architecture

  • Backbone: SegFormer-B0 (nvidia/mit-b0)
  • Encoder: MiT (Mix Transformer) — hierarchical global context without positional encoding
  • Decoder: Lightweight MLP head — per-pixel class probability predictions
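  • Output: 7-channel logits at 1/4 of the input resolution (128×128 for a 512×512 input), upsampled to 512×512 and reduced with argmax to give the segmentation mask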
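
To make the hierarchical encoder concrete, the sketch below lists the feature-map shape after each MiT-B0 stage for a given input size. The strides (4, 8, 16, 32) and channel widths (32, 64, 160, 256) are the standard MiT-B0 values and were not read from this repository's config; the helper names are illustrative, and the integer division assumes input sides that are multiples of 32.

```python
# Standard MiT-B0 settings (assumed, not read from this checkpoint's config).
STAGE_STRIDES = (4, 8, 16, 32)
STAGE_CHANNELS = (32, 64, 160, 256)

def stage_shapes(height=512, width=512):
    """Return (channels, height, width) of the feature map after each encoder stage."""
    return [(c, height // s, width // s) for c, s in zip(STAGE_CHANNELS, STAGE_STRIDES)]

def logits_shape(num_classes=7, height=512, width=512):
    """The MLP decoder fuses all stages at the stride-4 resolution."""
    return (num_classes, height // 4, width // 4)

for stage, shape in enumerate(stage_shapes(), start=1):
    print(f"stage {stage}: {shape}")
print("logits:", logits_shape())
```

The logits therefore need to be upsampled by a factor of 4 before they match the input tile pixel for pixel.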

Environmental Impact

  • Hardware Type: CUDA-enabled GPU
  • Cloud Provider: Not applicable (local training)
  • Compute Region: Nepal
  • Carbon Emitted: Not measured

Citation

@misc{praniil2024kathmandu-segmentation,
 author = {praniil},
 title = {Kathmandu Valley Satellite Image Segmentation with SegFormer-B0},
 year = {2024},
 publisher = {GitHub},
 howpublished = {\url{https://github.com/praniil/satellite-image-segmentation}},
}

Model Card Authors

praniil

Model Card Contact

Open an issue at https://github.com/praniil/satellite-image-segmentation/issues

Model size: 3.72M params (Safetensors, tensor type F32)