iOS AI App Architecture Audit: What We Check and Why
Most iOS apps with AI features fail in production for the same five reasons: model on the wrong thread, no fallback when inference fails, training data assumptions that don't hold on real devices, privacy manifests that don't match what the app actually does, and a data model that wasn't designed for a local-first world. Here is every check in our architecture audit and what each one catches.
An iOS AI architecture audit is not a code review. A code review checks whether the code works. An architecture audit checks whether the code will still work when 10,000 people use it at the same time, on a cold device, with a full iCloud queue, after an iOS update — and whether Apple will actually let it into the App Store.
We have audited over 30 iOS codebases with AI features in the last two years. The same failure modes appear in almost every one. What follows is the complete checklist from our audit process, with the exact failure modes each check prevents.
1. Inference Concurrency
What we check: Is Core ML inference isolated behind a Swift actor or a dedicated serial executor? Is the model loaded asynchronously? Does any inference call block the main thread?
Why it matters: Core ML inference triggers real computation — on the CPU, GPU, or Neural Engine depending on the model and device. Even ANE-routed inference takes finite time. When that time happens on the main thread, UIKit and SwiftUI cannot update the screen. Every frame that should be drawn during inference is dropped.
The fix is straightforward: wrap inference behind a Swift actor.
import CoreML

actor InferenceEngine {
    private let model: MyClassifier

    init() throws {
        self.model = try MyClassifier()
    }

    func classify(input: MLMultiArray) async throws -> MyClassifierOutput {
        try model.prediction(input: input)
    }
}
Every call to classify happens off the main actor. SwiftUI continues updating during inference. This is not an optimization — it is a correctness requirement for any app that needs to remain responsive during AI workloads.
In every audit we have conducted, the single highest-leverage change is pulling Core ML inference off the main thread. Apps that do this correctly feel native. Apps that don't feel frozen.
2. Model Loading Strategy
What we check: Where does model loading happen — app launch, first use, or background? How large is the .mlpackage bundle? Is the compiled model cached, or does it recompile on every launch?
Why it matters: Core ML model loading involves two steps: reading the .mlpackage from disk and compiling it for the current device. On iPhone 15 Pro, loading a 100MB model takes approximately 800ms to 1.5 seconds the first time. Compiled models are cached by iOS, so subsequent loads are faster — but the initial compilation happens at first launch or after an update.
Apps that load models synchronously at launch risk watchdog termination when the main thread stays blocked past the launch deadline, and a launch crash on a reviewer's device means rejection. Models above 50MB should be loaded lazily (on first inference request) or proactively after launch via a background task.
We flag models over 100MB as requiring a download-on-demand strategy using MLModel.compileModel(at:) with a background URLSession task rather than bundling in the app binary.
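A minimal sketch of lazy, off-main-thread loading behind an actor, assuming a generated model class named MyClassifier:

```swift
import CoreML

// Lazy loader: the first caller pays the load/compile cost, later
// callers get the cached instance. Runs on the actor's executor,
// so the main thread is never blocked.
actor ModelLoader {
    private var cached: MyClassifier?

    func model() throws -> MyClassifier {
        if let cached { return cached }
        let config = MLModelConfiguration()
        config.computeUnits = .all  // let Core ML pick CPU, GPU, or ANE
        let loaded = try MyClassifier(configuration: config)
        cached = loaded
        return loaded
    }
}
```

Warming the cache shortly after launch (for example, `Task { _ = try? await loader.model() }`) hides first-load compilation latency without blocking startup.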
3. Fallback and Degraded Mode
What we check: Does the app handle MLModelError gracefully? What happens when inference returns a low-confidence result? Is there a non-AI fallback for core functionality?
Why it matters: Core ML throws in more scenarios than developers expect: the model file is missing or corrupted, the device is under thermal pressure and ANE is throttled, the model requires iOS 17 but the user has iOS 16, the input shape does not match the model spec. Apps that crash or freeze in any of these cases fail in production.
Every inference call needs a do-catch block with meaningful fallback logic. For classification tasks, a minimum confidence threshold (typically 0.65–0.75) prevents the app from acting on ambiguous predictions. For features where AI is additive rather than core, the non-AI path should work independently.
do {
    let result = try await inferenceEngine.classify(input: array)
    // Reject ambiguous predictions below the confidence threshold.
    guard result.classLabelProbs[result.classLabel] ?? 0 > 0.7 else {
        return .lowConfidence
    }
    return .prediction(result.classLabel)
} catch {
    logger.error("Inference failed: \(error.localizedDescription)")
    return .fallback
}
4. Privacy Manifest Completeness
What we check: Does PrivacyInfo.xcprivacy declare every required reason API the app accesses? Does it accurately describe data collection, data use, and whether data is linked to user identity?
Why it matters: Apple began enforcing privacy manifest requirements in May 2024. Apps that access required reason APIs without declaring them in PrivacyInfo.xcprivacy are rejected with ITMS-91053. iOS AI apps consistently trigger three categories: file timestamp access (used by Core ML when loading models), user defaults access (common for caching inference results or model version flags), and system boot time (used by some analytics SDKs bundled with AI frameworks).
Our audit runs static analysis against the compiled binary to find every required reason API call, then compares against the declared reasons. Every gap is a potential rejection.
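A sketch of the corresponding declarations in PrivacyInfo.xcprivacy. The reason codes shown (C617.1 for in-container file timestamps, CA92.1 for the app's own user defaults, 35F9.1 for in-app elapsed-time measurement) are examples from Apple's documented list; the codes you declare must match your app's actual usage:

```xml
<key>NSPrivacyAccessedAPITypes</key>
<array>
    <dict>
        <key>NSPrivacyAccessedAPIType</key>
        <string>NSPrivacyAccessedAPICategoryFileTimestamp</string>
        <key>NSPrivacyAccessedAPITypeReasons</key>
        <array><string>C617.1</string></array>
    </dict>
    <dict>
        <key>NSPrivacyAccessedAPIType</key>
        <string>NSPrivacyAccessedAPICategoryUserDefaults</string>
        <key>NSPrivacyAccessedAPITypeReasons</key>
        <array><string>CA92.1</string></array>
    </dict>
    <dict>
        <key>NSPrivacyAccessedAPIType</key>
        <string>NSPrivacyAccessedAPICategorySystemBootTime</string>
        <key>NSPrivacyAccessedAPITypeReasons</key>
        <array><string>35F9.1</string></array>
    </dict>
</array>
```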
5. Data Model Design for Local-First AI
What we check: Is the Core Data or SwiftData schema designed to store predictions independently from the data that generated them? Is there a mechanism to re-run inference when the model is updated? Are prediction results versioned against the model version that generated them?
Why it matters: AI apps update their models. When the model changes, old predictions may no longer be valid. Apps that store predictions without recording which model version generated them cannot selectively invalidate stale results — they either show outdated AI output indefinitely or re-run inference on everything, which is expensive.
The correct pattern: store predictions with a modelVersion field. On app launch, check the current embedded model version against stored predictions. Stale predictions get flagged for background re-inference.
import SwiftData

@Model
final class Prediction {
    var inputHash: String
    var label: String
    var confidence: Double
    var modelVersion: String  // version of the model that produced this result, e.g. "2.1.0"
    var createdAt: Date

    init(inputHash: String, label: String, confidence: Double, modelVersion: String, createdAt: Date = .now) {
        self.inputHash = inputHash
        self.label = label
        self.confidence = confidence
        self.modelVersion = modelVersion
        self.createdAt = createdAt
    }
}
6. Memory Pressure and Model Lifetime
What we check: Is the ML model a shared singleton or instantiated per request? Is the model released when the app is backgrounded? Does the app observe UIApplication.didReceiveMemoryWarningNotification?
Why it matters: Core ML models hold allocated memory proportional to their size. A 200MB model occupying RAM while the app is backgrounded contributes to jetsam-triggered terminations. Apps terminated by the OS in the background lose user state and create confusing restart experiences.
Models should be wrapped in a manager that releases the model on applicationDidEnterBackground and reloads on applicationWillEnterForeground.
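A sketch of such a manager, assuming UIKit lifecycle notifications and a generated model class named MyClassifier:

```swift
import UIKit
import CoreML

// Drop the model on background or memory warning, reload on foreground.
final class ModelLifetimeManager {
    private var model: MyClassifier?
    private let center = NotificationCenter.default

    init() {
        center.addObserver(forName: UIApplication.didEnterBackgroundNotification,
                           object: nil, queue: .main) { [weak self] _ in
            self?.model = nil  // free model memory before jetsam reclaims it
        }
        center.addObserver(forName: UIApplication.didReceiveMemoryWarningNotification,
                           object: nil, queue: .main) { [weak self] _ in
            self?.model = nil
        }
        center.addObserver(forName: UIApplication.willEnterForegroundNotification,
                           object: nil, queue: .main) { [weak self] _ in
            // iOS caches the compiled model, so reloading is much
            // cheaper than the first launch-time compilation.
            self?.model = try? MyClassifier()
        }
    }
}
```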
7. Swift 6 Strict Concurrency Compliance
What we check: Does the codebase compile cleanly under SWIFT_STRICT_CONCURRENCY = complete? Are there sendability violations, data races, or unchecked @MainActor assumptions?
Why it matters: The Swift 6 language mode, available starting in Xcode 16, enforces strict concurrency checking at compile time. Targets that stay in Swift 5 mode accumulate warnings that become errors once they migrate or once warnings-as-errors policies tighten. AI workloads almost always involve passing non-Sendable types (like MLMultiArray) across actor boundaries, which requires deliberate design.
We flag every @preconcurrency import and nonisolated(unsafe) usage that suppresses safety checks without a verified rationale.
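For SwiftPM targets, opting into the checking looks roughly like this (a Package.swift excerpt, swift-tools-version 6.0 assumed; the target name is hypothetical):

```swift
// Opt a target into the Swift 6 language mode, which includes
// complete strict concurrency checking, equivalent to
// SWIFT_STRICT_CONCURRENCY = complete in an Xcode project.
.target(
    name: "InferenceKit",
    swiftSettings: [
        .swiftLanguageMode(.v6)
    ]
)
```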
8. App Store Binary Size
What we check: What is the App Store thinned download size? Are models included in the main binary or delivered as On-Demand Resources? Does the app meet the 200MB cellular download limit?
Why it matters: Apple applies a 200MB threshold to cellular downloads: larger apps either prompt the user for confirmation or wait for Wi-Fi, depending on device settings. Large Core ML models bundled directly in the app binary push many AI apps close to or past this threshold, and that friction costs installs in markets where Wi-Fi is not the default.
Models above 75MB should be evaluated for On-Demand Resources delivery. This adds App Store infrastructure complexity but removes the size from the initial download.
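Fetching an ODR tag at runtime can be sketched as follows; "LargeModel" is a hypothetical tag assigned to the .mlpackage in Xcode:

```swift
import Foundation

// Request the On-Demand Resources tag containing the model package.
// Keep the returned request alive while the resources are in use,
// and call endAccessingResources() when finished.
func loadModelResources() async throws -> NSBundleResourceRequest {
    let request = NSBundleResourceRequest(tags: ["LargeModel"])
    try await request.beginAccessingResources()
    // Resources in the tag are now reachable through Bundle.main.
    return request
}
```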
9. On-Device vs. Cloud Inference Decision Points
What we check: Is the on-device vs. cloud inference split explicit and documented? Are there circuit breakers for cloud calls? Does the app function completely offline?
Why it matters: Apps that mix on-device and cloud inference without explicit decision logic create unpredictable behavior. Users in airplane mode, on slow connections, or in regions with data restrictions will encounter failures that are hard to debug and impossible to reproduce in development.
Our audit maps every inference call path and verifies that the offline path is tested, documented, and reliable. For apps that use cloud inference, we check for rate limiting, timeout handling, and graceful degradation.
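One way to make the decision point explicit is a timeout-bounded cloud call that always degrades to the on-device path. This is a sketch; classifyRemotely and classifyLocally are hypothetical stand-ins for your URLSession endpoint and Core ML path:

```swift
import Foundation

enum InferenceSource { case cloud, onDevice }

// Placeholder for a URLSession call to a remote inference endpoint.
func classifyRemotely(_ input: Data) async throws -> String {
    throw URLError(.notConnectedToInternet)
}

// Placeholder for the on-device Core ML path from check #1.
func classifyLocally(_ input: Data) async -> String {
    "on-device-label"
}

// Prefer cloud inference, but fall back to the on-device model after a
// 3-second timeout or any network error, so the app works offline.
func classify(_ input: Data) async -> (label: String, source: InferenceSource) {
    do {
        let label = try await withThrowingTaskGroup(of: String.self) { group in
            group.addTask { try await classifyRemotely(input) }
            group.addTask {
                try await Task.sleep(for: .seconds(3))  // circuit-breaker timeout
                throw URLError(.timedOut)
            }
            let first = try await group.next()!
            group.cancelAll()
            return first
        }
        return (label, .cloud)
    } catch {
        return (await classifyLocally(input), .onDevice)
    }
}
```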
What the Audit Produces
Every audit ends with a written report organized by severity: Critical (blocks App Store submission), Major (causes production failures at scale), Minor (code quality and maintainability). Each finding includes the specific file and line, a description of the failure mode, and the recommended fix.
Most audits surface 2–4 Critical findings and 8–15 Major findings. The Critical findings are almost always the same nine checks above. The Major findings depend on the specific codebase.
Typical remediation time for Critical findings: 3–5 engineering days using our recommendations. For teams without experience in Swift 6 concurrency or Core ML production patterns, we offer a remediation sprint where we fix the issues directly.
Get Your Architecture Audited
If your app has AI features and you have not done a formal architecture review, the issues above exist in your codebase. They may not be causing problems today, but they will — at scale, at iOS update time, or at App Store submission.
Our Architecture Audit service covers all nine checks above and takes 3–5 business days. You get a prioritized written report and a 60-minute review call to walk through findings.
Request an Architecture Audit →