
SwiftUI + Core ML Architecture Patterns for Production Apps

The tutorial version calls MLModel.prediction() directly from a view. The production version doesn't. Here's the architecture that survives real users, background tasks, and model updates.

By Ehsan Azish · 3NSOFTS · March 2026

The problem with the naive approach

Most SwiftUI + Core ML tutorials show the same pattern: add the model to the project, instantiate it in the view, call try? model.prediction() from a button action. This works in a demo. In production it creates four problems:

  1. Thread safety. MLModel is not thread-safe. Calling prediction() from multiple threads simultaneously can cause crashes.
  2. Main-thread blocking. Inference takes 5–100 ms. On the main thread, this drops frames and causes UI jank on slower devices.
  3. No testability. View-coupled ML calls can't be unit tested without rendering a full view hierarchy.
  4. Model lifecycle. No control over when the model loads, no graceful degradation if loading fails, and no path to swap in a new model.

The actor-based inference service

The production pattern uses a Swift Actor to own the model instance. Actors provide automatic mutual exclusion via Swift's cooperative thread pool — only one inference runs at a time per actor instance, and no manual locking is required:

actor InferenceService {
    private var model: MyClassifier?

    // Lazy load — first call triggers model initialization
    private func loadModelIfNeeded() throws -> MyClassifier {
        if let model { return model }
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine
        let loaded = try MyClassifier(configuration: config)
        self.model = loaded
        return loaded
    }

    func classify(_ input: MLFeatureProvider) async throws -> ClassificationResult {
        let model = try loadModelIfNeeded()
        // Go through the underlying MLModel so any MLFeatureProvider is
        // accepted (the generated prediction(input:) only takes the
        // generated input type).
        let output = try model.model.prediction(from: input)
        return ClassificationResult(from: output)
    }
}
}

The actor guarantees serial access. The lazy initialization means model loading doesn't happen on app launch. The async function means the caller never blocks.

Connecting the actor to SwiftUI

The view model bridges the actor to the UI. It's annotated with @MainActor so every state mutation SwiftUI observes lands on the main thread without manual dispatch:

@MainActor
@Observable
class ClassificationViewModel {
    var result: ClassificationResult?
    var isProcessing = false
    var error: Error?

    private let service = InferenceService()

    func classify(image: CGImage) {
        isProcessing = true
        error = nil
        Task {
            do {
                let input = try MyClassifierInput(image: image)
                result = try await service.classify(input)
            } catch {
                self.error = error
            }
            isProcessing = false
        }
    }
}

The SwiftUI view observes this model and never touches the actor directly. This keeps the view pure presentation code — no ML logic, no async handling, just binding to published state.

struct ClassificationView: View {
    @State private var viewModel = ClassificationViewModel()

    var body: some View {
        VStack {
            if viewModel.isProcessing {
                ProgressView("Classifying…")
            } else if let result = viewModel.result {
                ResultView(result: result)
            }
        }
        .onAppear {
            // Optional: pre-warm the model so the first classification
            // doesn't pay the load cost. prewarm() would forward to the
            // actor and trigger its lazy loader.
            Task { await viewModel.prewarm() }
        }
    }
}

Model lifecycle: loading, warming, and releasing

Lazy loading vs pre-warming

Lazy loading (initialize on first use) minimizes app launch impact. Pre-warming (initialize in a background task shortly after launch) eliminates the latency spike on the user's first interaction. Choose based on how critical first-interaction latency is for your use case.
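As a concrete sketch of pre-warming: a hypothetical prewarm() entry point (it does not appear in the snippets above) can reuse the actor's existing lazy loader, provided these extensions live in the same file as the actor so the private method stays visible.

```swift
// Hypothetical prewarm() additions layered on the earlier types.
extension InferenceService {
    /// Load the model ahead of the first user interaction.
    /// Errors are swallowed on purpose: the next classify() call
    /// retries loadModelIfNeeded() and surfaces any failure there.
    func prewarm() {
        _ = try? loadModelIfNeeded()
    }
}

extension ClassificationViewModel {
    func prewarm() async {
        await service.prewarm()
    }
}
```

Kick this off from .task or .onAppear shortly after launch; because the load runs on the actor, it never touches the main thread.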

Memory pressure handling

Respond to UIApplication.didReceiveMemoryWarningNotification by releasing the model instance (set it to nil). The actor pattern makes this straightforward — the model can be reloaded on the next inference call:

actor InferenceService {
    private var model: MyClassifier?

    func releaseModel() {
        model = nil  // Released from memory; reloads on next call
    }
}
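Something still has to call releaseModel() when the warning fires. One way, sketched below (the observation could equally live in the app delegate), is an async notification loop with the same lifetime as the owner:

```swift
import UIKit

// Sketch: forward memory warnings to the actor for as long as this
// observer is alive. Assumes the InferenceService actor from above.
final class MemoryPressureObserver {
    private var observation: Task<Void, Never>?

    init(service: InferenceService) {
        observation = Task {
            let warnings = NotificationCenter.default.notifications(
                named: UIApplication.didReceiveMemoryWarningNotification
            )
            for await _ in warnings {
                await service.releaseModel()
            }
        }
    }

    deinit { observation?.cancel() }
}
```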

Model swapping without app updates

For apps that update their ML models remotely (e.g., via a model version endpoint), the actor pattern makes swapping safe. Download the new .mlmodelc package to a local URL, then call the actor's swap method — which sets model = nil and updates the URL. The next prediction transparently uses the new model.
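A sketch of that swap method, assuming the actor is refactored to load through the raw MLModel API from a URL rather than a generated class (the names here are illustrative):

```swift
import CoreML

// Illustrative URL-based variant of the inference actor.
actor SwappableInferenceService {
    private var model: MLModel?
    private var modelURL: URL

    init(initialModelURL: URL) {
        self.modelURL = initialModelURL
    }

    /// Point the actor at a newly downloaded, compiled model.
    /// The old instance is released; the next prediction reloads lazily.
    func swapModel(to newURL: URL) {
        modelURL = newURL
        model = nil
    }

    private func loadModelIfNeeded() throws -> MLModel {
        if let model { return model }
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine
        let loaded = try MLModel(contentsOf: modelURL, configuration: config)
        self.model = loaded
        return loaded
    }
}
```

If the server ships a raw .mlmodel rather than a compiled .mlmodelc, compile it on-device with MLModel.compileModel(at:) first and hand the resulting URL to swapModel(to:).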

Progressive UI: showing results before completion

For batch inference (classifying multiple items), don't wait for all results before updating the UI. Use AsyncStream or AsyncSequence to stream results as each inference completes:

actor InferenceService {
    func classifyBatch(images: [CGImage]) -> AsyncStream<ClassificationResult> {
        AsyncStream { continuation in
            Task {
                for image in images {
                    guard let input = try? MyClassifierInput(image: image),
                          let result = try? await classify(input) else { continue }
                    continuation.yield(result)
                }
                continuation.finish()
            }
        }
    }
}

// In the view model:
func processGallery(images: [CGImage]) {
    Task {
        for await result in service.classifyBatch(images: images) {
            results.append(result)  // UI updates incrementally
        }
    }
}

This pattern makes batch inference feel fast because the user sees results appearing in real time rather than waiting for a spinner to finish.

Testing the ML layer

Because the inference service is decoupled from the view, it's testable with standard XCTest:

// Define a protocol for the service
protocol ClassificationServiceProtocol {
    func classify(_ input: MLFeatureProvider) async throws -> ClassificationResult
}

// Real implementation conforms to the protocol
actor InferenceService: ClassificationServiceProtocol { ... }

// Mock for tests
actor MockInferenceService: ClassificationServiceProtocol {
    func classify(_ input: MLFeatureProvider) async throws -> ClassificationResult {
        return ClassificationResult(label: "test", confidence: 0.99)
    }
}

// View model accepts the protocol
class ClassificationViewModel {
    private let service: any ClassificationServiceProtocol
    init(service: any ClassificationServiceProtocol = InferenceService()) {
        self.service = service
    }
}

The view model's business logic — error handling, state transitions, result formatting — is now testable without a physical device.
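A sketch of such a test. It assumes MockInferenceService from above, an awaitable variant of classify (the fire-and-forget Task in the earlier view model gives a test nothing to await), and a hypothetical makeTestImage() helper:

```swift
import XCTest

final class ClassificationViewModelTests: XCTestCase {
    @MainActor
    func testSuccessfulClassificationUpdatesState() async throws {
        // Inject the mock so no real Core ML model is loaded.
        let viewModel = ClassificationViewModel(service: MockInferenceService())

        // Assumes an async classify variant and a makeTestImage() helper;
        // both are test scaffolding, not part of the production code above.
        await viewModel.classify(image: makeTestImage())

        XCTAssertEqual(viewModel.result?.label, "test")
        XCTAssertFalse(viewModel.isProcessing)
        XCTAssertNil(viewModel.error)
    }
}
```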

Common questions

How do I call Core ML from SwiftUI without blocking the main thread?

Wrap your MLModel in a Swift Actor and call predictions using async/await. The actor guarantees serial access, and async/await ensures the UI thread is never blocked. Your SwiftUI view calls a @MainActor view model, which delegates to the actor for inference.

Should Core ML model loading happen on app launch?

No — model loading is expensive (50ms–500ms) and blocks the calling thread. Load models lazily on first use, or pre-warm in a background task after launch completes. Never load a Core ML model synchronously on the main thread.

Is Core ML's MLModel thread-safe?

No. MLModel instances are not thread-safe — concurrent calls to prediction() can cause crashes. Use a Swift Actor to serialize access. If you need parallel inference, use a pool of separate model instances.
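A sketch of the pool idea (illustrative, not part of the earlier code): an actor that checks MLModel instances out and back in, so up to N predictions run in parallel while each individual instance is still used by only one caller at a time.

```swift
import CoreML

// Sketch: fixed-size pool of pre-loaded MLModel instances.
actor ModelPool {
    private var available: [MLModel]
    private var waiters: [CheckedContinuation<MLModel, Never>] = []

    init(models: [MLModel]) {
        self.available = models
    }

    /// Borrow a model, suspending if all instances are in use.
    func checkOut() async -> MLModel {
        if let model = available.popLast() { return model }
        return await withCheckedContinuation { waiters.append($0) }
    }

    /// Return a model, handing it straight to the oldest waiter if any.
    func checkIn(_ model: MLModel) {
        if waiters.isEmpty {
            available.append(model)
        } else {
            waiters.removeFirst().resume(returning: model)
        }
    }
}
```

Callers run the prediction between checkOut() and checkIn(), outside the actor, so inferences overlap: checkOut, then model.prediction(from:), then checkIn. Wrapping the prediction call inside an actor method would serialize everything again and defeat the pool.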
