VOOZH about

URL: https://dev.to/john-rocky/a-swift-library-to-run-segment-anything-natively-on-ios-samkit-5eho

⇱ A Swift library to run Segment Anything natively on iOS (SamKit) - DEV Community


πŸ‘ SAMKit Demo

For a while I'd wanted to build a Swift Package that runs Meta's Segment Anything Model (SAM) on-device on iOS.

  • Cut out the object you tap
  • Cut out the object you box in
  • Cut out the object you specify by text

Any of these segments instantly, with all inference completing on-device. It even comes with ready-to-use UI components.

So I built it.

GitHub: https://github.com/john-rocky/SamKit

What it can do

Feature Description
Point & Box Tap for a point, drag for a box, then segment
Text Prompt Type text like "dog" or "red cup" to detect and segment
Subject Lift Long-press to lift an object out, Apple Photos–style; copy/save/share
Two backbones MobileSAM (fast, 23MB) and SAM2 Tiny (accurate, 76MB)
Drop-in UI Just embed the SwiftUI views as-is

Architecture

SAMKit/
β”œβ”€β”€ SAMKit # core inference engine (point/box)
β”œβ”€β”€ SAMKitGrounding # text detection (YOLO-World + CLIP)
└── SAMKitUI # SwiftUI views (SamView / TextPromptView)

Split into three Swift Package products. Import only what you need.

Setup

1. Add the Swift Package

dependencies: [
 .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
]

2. Download the models

Get the .mlpackage files from Releases and add them to your Xcode project.

Model Size Use
MobileSAM 23 MB point/box segmentation (required)
SAM2 Tiny 76 MB higher-accuracy segmentation (optional)
Grounding (YOLO-World + CLIP) 148 MB text detection (optional)

Usage

Point/box segmentation

The most basic use. Set an image and specify a point.

import SAMKit

// create a session (the model auto-loads from a bundled resource)
let session = try SamSession(
 model: .bundled(.mobileSam),
 config: .bestAvailable // priority: Neural Engine > GPU > CPU
)

// encode the image (once; later predicts use the cache)
try session.setImage(cgImage)

// segment by point
let result = try session.predict(
 points: [SamPoint(x: 100, y: 200, label: .positive)]
)

// results
let mask = result.masks.first!
mask.cgImage // segmentation mask image
mask.score // IoU confidence score
mask.alpha // alpha-channel data

You can also specify negative points (regions to exclude) and a bounding box:

let result = try session.predict(
 points: [
 SamPoint(x: 100, y: 200, label: .positive), // point to include
 SamPoint(x: 300, y: 400, label: .negative) // point to exclude
 ],
 box: SamBox(x0: 50, y0: 50, x1: 400, y1: 400) // bounding box
)

Segment by text prompt

Combine SAM with text detection by YOLO-World + CLIP.

import SAMKit
import SAMKitGrounding

let session = try TextSegmentationSession(
 groundingModel: .bundled(),
 samModel: .bundled(.mobileSam)
)

try session.setImage(cgImage)

// search by text and segment
let result = try session.segment(query: "dog, cat")

result.detections // detections (bounding box + label)
result.masks // segmentation mask for each detection
result.scores // confidence scores

Cutting out the object

You can generate a transparent PNG from the segmentation result.

// cut out from a single mask
let extracted = result.masks[0].extractObject(from: cgImage)
// β†’ a CGImage with a transparent background

// composite cut-out from multiple masks
let combined = SamMask.extractObject(from: cgImage, masks: result.masks)

Embedding the SwiftUI views

You don't need to build the UI yourself. SAMKitUI includes ready-to-use views.

import SAMKitUI

// interactive segmentation by point/box
SamView(image: uiImage, model: try .bundled(.mobileSam))

// segmentation by text search
TextPromptView(image: uiImage, session: textSession)

These views include:

  • subject highlight after segmentation (dim background + subject at full brightness)
  • an animated glowing outline
  • long-press to lift the object β†’ drag β†’ Copy/Save/Share menu

How Subject Lift is implemented

A technical walkthrough of recreating Apple Photos' "lift the subject" feature.

1. Binarizing the mask

SAM's mask output is continuous sigmoid values, so convert it to a clean binary mask for display.

func binarizeMask(_ maskImage: CGImage) -> CGImage? {
 // get pixel data via CGContext
 let ctx = CGContext(data: nil, width: width, height: height, ...)
 ctx.draw(maskImage, in: rect)

 let pixels = ctx.data!.bindMemory(to: UInt8.self, capacity: width * height * 4)
 let threshold: UInt8 = 128 // 50% β€” SAM's standard threshold

 for i in 0..<(width * height) {
 let o = i * 4
 if pixels[o + 3] >= threshold {
 // fully opaque white
 pixels[o] = 255; pixels[o+1] = 255; pixels[o+2] = 255; pixels[o+3] = 255
 } else {
 // fully transparent
 pixels[o] = 0; pixels[o+1] = 0; pixels[o+2] = 0; pixels[o+3] = 0
 }
 }
 return ctx.makeImage()
}

At threshold 0 it picks up mask noise and cuts out most of the image. 128 (50%) is stable.

2. Generating the glowing outline

Extract the mask's contour with CGContext's shadow feature. Far faster than per-pixel dilation.

func generateOutline(from maskImage: CGImage) -> CGImage? {
 // Step 1: turn the mask into a solid-white silhouette
 ctx.draw(maskImage, in: rect)
 ctx.setBlendMode(.sourceIn)
 ctx.setFillColor(UIColor.white.cgColor)
 ctx.fill(rect) // β†’ white silhouette

 // Step 2: draw with a shadow, then erase the interior β†’ only the contour remains
 outCtx.setShadow(offset: .zero, blur: glowRadius, color: UIColor.white.cgColor)
 outCtx.draw(whiteSilhouette, in: rect) // shadow = the contour's glow

 outCtx.setBlendMode(.destinationOut)
 outCtx.draw(whiteSilhouette, in: rect) // erase the interior β†’ contour only
}

Key points:

  • setShadow makes the white glow (only two draws)
  • .destinationOut erases the interior, leaving only the outer glow
  • Far faster than a dilation loop (O(thicknessΒ² Γ— pixels))

3. Shimmer animation

Use TimelineView and AngularGradient to make light travel around the contour.

TimelineView(.animation(minimumInterval: 1.0 / 30)) { timeline in
 let phase = timeline.date.timeIntervalSinceReferenceDate
 .truncatingRemainder(dividingBy: 2.5) / 2.5 // one lap in 2.5s

 ZStack {
 // soft glow (blurred cyan)
 outlineImage.colorMultiply(Color(red: 0.5, green: 0.85, blue: 1.0))
 .blur(radius: 5).opacity(0.8)

 // sharp outline
 outlineImage.colorMultiply(.white)

 // moving highlight
 outlineImage.colorMultiply(.white)
 .mask(
 AngularGradient(
 colors: [.white, .white.opacity(0.5), .clear, .clear, ...],
 center: .center,
 startAngle: .degrees(phase * 360),
 endAngle: .degrees(phase * 360 + 360)
 )
 )
 }
}

4. Unified gesture handler

Manage tap (add point), box drawing, and long-press lift all with a single DragGesture(minimumDistance: 0).

SwiftUI's onTapGesture + onLongPressGesture block each other, so I receive all touches in one gesture and classify them by time and movement.

DragGesture(minimumDistance: 0)
 .onChanged { value in
 // schedule a timer on the first touch
 if gestureStartTime == nil {
 gestureStartTime = Date()
 // decide long-press after 0.3s
 DispatchQueue.main.asyncAfter(deadline: .now() + 0.3) {
 guard gestureStartTime != nil, !isLifted, hasVisibleMasks else { return }
 let moved = hypot(lastTranslation.width, lastTranslation.height)
 guard moved < 15 else { return } // if it moved, it's not a long-press
 handleLiftObject() // start the lift!
 }
 }

 if isLifted {
 liftDragOffset = value.translation // follow the drag
 }
 }
 .onEnded { value in
 if isLifted {
 // released β†’ show menu
 showLiftMenu = true
 } else if elapsed < 0.3 && moved < 15 {
 // quick touch β†’ add point
 addPoint(at: value.startLocation)
 }
 }

Classification logic:

Condition Verdict
< 0.3s, < 15pt moved tap β†’ add point
β‰₯ 0.3s, < 15pt moved long-press β†’ start lift
β‰₯ 10pt moved (box mode) drag β†’ draw box
movement while lifted lift-drag β†’ move object

5. Subject highlight

Rather than overlaying a colored mask after segmentation, dim the background and show only the subject at its original brightness.

// darken the background
Color.black.opacity(0.25)

// show only the subject at the original image's brightness
Image(uiImage: image)
 .mask(Image(uiImage: UIImage(cgImage: binaryMask)))

This makes the transition to long-press lift natural (the dimming just deepens 0.25 β†’ 0.4).

Performance

  • Image encoding: once per image; later predicts reuse the cache
  • Inference: accelerated on Neural Engine / GPU (FP16)
  • Outline generation: only two CGContext-shadow draws; no pixel loop
  • Networking: none. Fully on-device

Summary

With SAMKit you can add segmentation to an iOS app in a few lines.

// an interactive segmentation UI in one line
SamView(image: uiImage, model: try .bundled(.mobileSam))

Experiences like Subject Lift are built in too, so you can bring an Apple Photos–like UX into your own app immediately.

GitHub: https://github.com/john-rocky/SamKit

Feedback and issues welcome!


Originally published in Japanese on Qiita. GitHub / X