A while ago I wrote a full llama.cpp implementation for iOS, with an Objective-C++ bridge, because I wanted one thing:

image in -> structured JSON out -> no cloud required.

It worked. It was fast enough. It was also a lot of plumbing:

XCFramework builds
ObjC++ bridge
tokenizer/eval/sampling internals
model + projector file choreography
JSON guardrails everywhere
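For contrast, here is roughly what one of those "JSON guardrails" looks like when a hand-written decode loop only gives you raw text back. This is a minimal Swift sketch, not the original app's code: `decodeFirstJSONObject` and `GuardrailError` are hypothetical names, and it assumes the model's reply contains exactly one top-level JSON object somewhere in the text.

```swift
import Foundation

// Hypothetical plain-Codable copy of the receipt schema used by the guardrail.
struct ReceiptExtraction: Codable {
    var vendor_name: String
    var transaction_date: String
    var total_amount: Double
    var currency: String
    var category: String
    var line_items: [String]
}

enum GuardrailError: Error {
    case noJSONObject
}

// Hypothetical guardrail: cut the span from the first "{" to the last "}" out of
// raw model text (which may include chatter around it), then decode it strictly.
// A missing or mistyped field makes JSONDecoder throw, so the caller can retry.
func decodeFirstJSONObject<T: Decodable>(_ type: T.Type, from raw: String) throws -> T {
    guard let start = raw.firstIndex(of: "{"),
          let end = raw.lastIndex(of: "}"),
          start < end else {
        throw GuardrailError.noJSONObject
    }
    let candidate = String(raw[start...end])
    return try JSONDecoder().decode(T.self, from: Data(candidate.utf8))
}

// Example: a chatty, made-up model reply with the JSON embedded in it.
let raw = """
Sure! Here is the receipt:
{"vendor_name": "Corner Cafe", "transaction_date": "2024-05-01", "total_amount": 12.5, "currency": "USD", "category": "Meals", "line_items": ["Latte", "Bagel"]}
Let me know if you need anything else.
"""

let receipt = try decodeFirstJSONObject(ReceiptExtraction.self, from: raw)
print(receipt.vendor_name, receipt.total_amount)
```

Every call site that touches model output needs something like this, plus retry logic when it throws, which is the kind of code that piles up around a custom inference stack.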

Now, about 6 months later, Apple has shipped Foundation Models image analysis in the Xcode 27.0 beta, and I can finally call a serious on-device model without maintaining that whole engine room myself.


Apple Developer Documentation: "Analyzing images with multimodal prompting" (developer.apple.com). Analyze and extract information from images by combining them with descriptive text prompts.

With Foundation Models, the core API is basically:

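```swift
import CoreGraphics
import FoundationModels

// snake_case property names are kept on purpose so that JSONEncoder output
// uses these exact keys for the downstream bookkeeping import.
@Generable
struct ReceiptExtraction: Codable {
    var vendor_name: String
    var transaction_date: String
    var total_amount: Double
    var currency: String
    var category: String
    var line_items: [String]
}

func extractReceipt(from cgImage: CGImage) async throws -> ReceiptExtraction {
    let session = LanguageModelSession(model: .default)

    let response = try await session.respond(
        generating: ReceiptExtraction.self,
        options: GenerationOptions(
            // Top-k sampling with a fixed seed plus a low temperature keeps
            // extraction output stable between runs.
            sampling: .random(top: 20, seed: 1111),
            temperature: 0.1,
            maximumResponseTokens: 384
        )
    ) {
        """
        Extract receipt information for bookkeeping.
        Return schema-compliant structured output only.
        Format fields for QuickBooks ingestion.
        """
        // Attach the receipt image with its orientation so the model sees it upright.
        Attachment(cgImage, orientation: .right)
    }

    // A typed ReceiptExtraction value, already matching the schema.
    return response.content
}
```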

Receipt image in → QuickBooks-ready JSON out.
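Strictly speaking, `response.content` is a typed Swift value, not JSON yet. Because the struct also conforms to `Codable`, getting the JSON payload is one `JSONEncoder` call. The sketch below assumes a plain `Codable` copy of `ReceiptExtraction` (without `@Generable`) and made-up sample values so it runs without the model; in the app, the value would come from `response.content`. It does not attempt any QuickBooks-specific import format.

```swift
import Foundation

// Plain Codable copy of the @Generable ReceiptExtraction, so this runs without
// the model. In the app, `result` would be `response.content`.
struct ReceiptExtraction: Codable {
    var vendor_name: String
    var transaction_date: String
    var total_amount: Double
    var currency: String
    var category: String
    var line_items: [String]
}

// Made-up sample values for illustration only.
let result = ReceiptExtraction(
    vendor_name: "Corner Cafe",
    transaction_date: "2024-05-01",
    total_amount: 12.5,
    currency: "USD",
    category: "Meals",
    line_items: ["Latte", "Bagel"]
)

let encoder = JSONEncoder()
// Sorted keys give a stable, diff-friendly payload; pretty printing is optional.
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]

let data = try encoder.encode(result)
let json = String(decoding: data, as: UTF8.self)
print(json)
```

Since the model already produced a schema-checked value, there is no string scraping step between the model and the encoder.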

No bridge.
No GGUF.
No mmproj.
No custom decode loop.

Before

llama.cpp vendor management
ObjC++ wrappers and thread safety
bespoke schema/prompt failover handling
app startup warmups with model files in bundle

Now

native LanguageModelSession
native Attachment(...) for images
native structured generation with @Generable
native prewarm and model availability checks
native Instruments.app profiling available
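The availability check and prewarm from the list can look roughly like this. A minimal sketch, assuming the `SystemLanguageModel.default.availability` and `isAvailable` properties and `LanguageModelSession.prewarm()` as they exist in the iOS 26 / macOS 26 Foundation Models SDK; `onDeviceModelStatus` and `makePrewarmedSession` are hypothetical helper names. The `#if canImport` guard lets the file compile on platforms without the framework.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

// Hypothetical helper: report whether the on-device model can be used right now.
func onDeviceModelStatus() -> String {
    #if canImport(FoundationModels)
    if #available(iOS 26.0, macOS 26.0, *) {
        switch SystemLanguageModel.default.availability {
        case .available:
            return "available"
        case .unavailable(let reason):
            // e.g. device not eligible, Apple Intelligence turned off, model not ready yet
            return "unavailable: \(reason)"
        }
    }
    return "unavailable: OS version too old"
    #else
    return "unavailable: FoundationModels is not present on this platform"
    #endif
}

#if canImport(FoundationModels)
// Hypothetical helper: create a session and prewarm it (for example when the
// camera screen appears) to reduce latency on the first real request.
@available(iOS 26.0, macOS 26.0, *)
func makePrewarmedSession() -> LanguageModelSession? {
    guard SystemLanguageModel.default.isAvailable else { return nil }
    let session = LanguageModelSession(model: .default)
    session.prewarm()
    return session
}
#endif

print(onDeviceModelStatus())
```

This replaces the bundled-model startup warmup from the llama.cpp version: the system owns the model files, and the app only asks whether the model is ready.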

And that is exactly how it should have been from the first day I started fiddling with multimodal inference.

Originally published on DEV Community as "6 months later: Apple Finally Shipped Local Multimodal in Xcode 27 Beta": https://dev.to/fosteman/100-years-later-apple-finally-shipped-local-multimodal-in-xcode-27-beta-nmc