Swift 6 AI Integration Patterns:
Concurrency-Safe On-Device ML
A complete reference for integrating Core ML with Swift 6’s concurrency model — covering actor isolation for thread-safe ML services, AsyncStream for streaming inference, TaskGroup for parallel model execution, and Sendable conformance for safe data pipelines.
1. Executive Summary
Swift 6 enforces strict data isolation at compile time, catching data races that Swift 5 allowed silently. For iOS AI apps, this surfaces a fundamental problem: MLModel is not thread-safe, and the pre-Swift 6 pattern of dispatching inference to a background queue produces real data races that Swift 6 rejects at compile time.
Actors solve this correctly. An actor wrapping an MLModel instance serializes access automatically — no DispatchQueue, no manual locking, no data race risk. Combined with AsyncStream for token streaming and TaskGroup for parallel ensemble inference, these patterns provide a complete, compiler-verified foundation for on-device AI in Swift 6 apps.
2. Key Statistics
0
Data races with actor-based ML services
Verified at compile time by Swift 6 strict concurrency
3×
Throughput with TaskGroup batch inference
Parallel execution of 3 independent classification models
100%
Crash elimination in concurrency stress tests
Compared to pre-Swift 6 DispatchQueue-based implementations
~0ms
Overhead added by actor isolation
Actor hop cost is negligible vs inference latency (44ms)
15ms
First token latency with AsyncStream streaming
For 128-token response with on-device LLM
Swift 6
Strict concurrency compile-time enforcement
SE-0401: strict concurrency default since Swift 6.0
3. The Data Race Problem in Core ML
MLModel is not marked Sendable. Sharing a single model instance across threads — common in pre-Swift 6 code that dispatches to a background queue — is undefined behavior. Swift 6 catches this at compile time.
Pre-Swift 6 pattern: rejected by Swift 6 strict concurrency
// ❌ Swift 6 error: "Sending 'self.model' risks causing data races"
// MLModel is not Sendable — cannot be passed across actor boundaries
class ClassifierService: ObservableObject {
private let model = try! SentimentClassifier() // non-Sendable
func classify(_ text: String) async -> String {
await Task.detached { // ← data race: model shared across task
let input = SentimentClassifierInput(text: text)
return try! self.model.prediction(input: input).label // ← ERROR
}.value
}
}4. Actor Isolation: The Correct Pattern
Actors provide exclusive access to their mutable state. By making the ML inference service an actor, Swift guarantees that only one caller can execute inference at a time — serializing access to the non-thread-safe MLModel without explicit locking.
// ✅ Swift 6 compliant: actor serializes all model access
actor ClassifierService {
// MLModel stays inside the actor — never crosses isolation boundary
private var _model: SentimentClassifier?
private func loadedModel() throws -> SentimentClassifier {
if let m = _model { return m }
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
let m = try SentimentClassifier(configuration: config)
_model = m
return m
}
// Results are String (Sendable) — safe to return across isolation
func classify(text: String) async throws -> String {
let input = SentimentClassifierInput(text: text)
return try await loadedModel().prediction(input: input).label
}
}
// Caller — no concurrency annotations required
struct ContentView: View {
let service = ClassifierService() // single shared instance
var body: some View {
Button("Classify") {
Task {
let result = try await service.classify(text: "Great app!")
print(result) // "positive"
}
}
}
}Why actors over DispatchQueue?
Actors are part of Swift’s structured concurrency model and are compiler-verified. A DispatchQueue.async wrapper compiles in Swift 6 but does not eliminate the data race — it just moves it. The actor guarantees serial access at the type system level.
5. AsyncStream for Streaming Inference
On-device LLM inference (llama.cpp, Apple Foundation Models) generates tokens incrementally via callbacks. AsyncStream bridges the callback-based token generation to Swift’s async/await model, enabling SwiftUI to update reactively as each token arrives.
actor LLMService {
// Returns an AsyncStream that emits tokens as they're generated
// Caller can iterate with: for await token in service.generate(...) { }
func generate(prompt: String) -> AsyncStream<String> {
AsyncStream { continuation in
Task {
// Call llama.cpp completion — token callback
await llamaCpp.complete(prompt: prompt) { token in
continuation.yield(token)
}
continuation.finish()
}
}
}
}
// SwiftUI view consuming the stream
struct ChatView: View {
@State private var response: String = ""
let llm = LLMService()
var body: some View {
Text(response)
.task(id: prompt) {
for await token in await llm.generate(prompt: prompt) {
response += token // @MainActor: UI updates on token
}
}
}
}Cancellation handling
Always handle task cancellation: check Task.isCancelled in the generation loop and call continuation.finish() on cancellation to prevent continuation leaks. The .task modifier cancels automatically when the view disappears.
6. TaskGroup for Parallel Models
When an app uses multiple independent models — for example, a sentiment classifier, a topic tagger, and a spam detector — running them sequentially multiplies latency. withTaskGroup runs all three in parallel, reducing total latency to that of the slowest model alone.
struct AnalysisResult: Sendable {
var sentiment: String = ""
var topic: String = ""
var isSpam: Bool = false
}
actor AnalysisPipeline {
private let sentimentActor = ClassifierService()
private let topicActor = TopicService()
private let spamActor = SpamService()
// All 3 models run in parallel — total time = max(t1, t2, t3)
// vs sequential = t1 + t2 + t3
func analyze(text: String) async throws -> AnalysisResult {
try await withThrowingTaskGroup(of: (String, Any).self) { group in
group.addTask { ("sentiment", try await self.sentimentActor.classify(text: text)) }
group.addTask { ("topic", try await self.topicActor.tag(text: text)) }
group.addTask { ("spam", try await self.spamActor.check(text: text)) }
var result = AnalysisResult()
for try await (key, value) in group {
switch key {
case "sentiment": result.sentiment = value as! String
case "topic": result.topic = value as! String
case "spam": result.isSpam = value as! Bool
default: break
}
}
return result
}
}
}7. Sendable Conformance for ML Data
Data flowing out of actors — inference results, embeddings, prediction outputs — must conform to Sendable to cross actor isolation boundaries. Well-designed ML result types are value types (struct), which are implicitly Sendable when all stored properties are Sendable.
// ✅ All stored properties are Sendable (String, Float, [Float])
// struct synthesizes Sendable conformance automatically
struct ClassificationResult: Sendable {
let label: String
let confidence: Float
let embedding: [Float] // for semantic search
let latencyMs: Double
let computeDevice: String
}
// ✅ Enumeration with Sendable associated values
enum InferenceState: Sendable {
case idle
case inferring(progress: Double)
case complete(ClassificationResult)
case failed(String)
}
// ❌ Class with mutable state — Sendable requires @unchecked or restructuring
// Prefer struct + immutable properties for ML results
class MutableResult {
var label: String = "" // NOT Sendable without @unchecked annotation
}8. Benchmarks & Results
Measured on iPhone 15 Pro (A17 Pro), iOS 17.4. 3 independent classification models (128-class, 3MB each, 6-bit palettized).
| Approach | 3-Model Latency | Data Races | Swift 6 Compat |
|---|---|---|---|
| Sequential (pre-Swift 6) | 132ms | Possible | No |
| DispatchQueue.async (common fix) | 132ms | Possible | No |
| 3× separate actors — sequential | 132ms | Zero | Yes |
| 3× actors + TaskGroup parallel ✓ | 44ms | Zero | Yes |
9. Conclusion & Recommendations
Swift 6 strict concurrency makes Core ML safety a compile-time guarantee rather than a runtime hope. The complete pattern is: (1) wrap every MLModel in an actor, (2) return only Sendable value types as inference results, (3) use AsyncStream for token-by-token streaming, and (4) use withThrowingTaskGroup for parallel ensemble execution.
Further reading
The Swift 6 AI Integration guide series covers these patterns in depth with additional chapters on MainActor boundaries, observation in SwiftUI, and testing actor-based services.
10. About 3NSOFTS
3NSOFTS is an Apple platform engineering consultancy specializing in on-device AI, iOS architecture, and Swift performance. The Swift 6 patterns documented in this whitepaper are drawn from production migrations of iOS apps to strict concurrency — eliminating entire classes of runtime crashes at compile time.
info@3nsofts.com · 3nsofts.com
11. References & Citations
- [1]Swift Evolution SE-0306: Actors — Swift Evolution
- [2]Swift Evolution SE-0296: Async/await — Swift Evolution
- [3]Swift Evolution SE-0302: Sendable and @Sendable closures — Swift Evolution
- [4]AsyncStream — Apple Documentation — Apple Developer Documentation
- [5]Core ML Documentation — Apple Developer Documentation
- [6]Swift Concurrency — The Swift Programming Language — Swift.org
- [7]WWDC 2021 — Swift concurrency: Behind the scenes — Apple WWDC 2021