iOS AI App Architecture Audit: What We Check and Why
Most iOS apps with AI features fail in production for the same five reasons: model on the wrong thread, no fallback when inference fails, training data assumptions that don't hold on real devices, privacy manifests that don't match what the app actually does, and a data model that wasn't designed for a local-first world. Here is every check in our architecture audit and what each one catches.
An iOS AI architecture audit is not a code review. A code review checks whether the code works. An architecture audit checks whether the code will still work when 10,000 people use it at the same time, on a cold device, with a full iCloud queue, after an iOS update — and whether Apple will actually let it into the App Store.
We have audited over 30 iOS codebases with AI features in the last two years. The same failure modes appear in almost every one. What follows is the complete checklist from our audit process, with the exact failure modes each check prevents.
1. Inference Concurrency
What we check: Is Core ML inference isolated behind a Swift actor or a dedicated serial executor? Is the model loaded asynchronously? Does any inference call block the main thread?
Why it matters: Core ML inference triggers real computation — on the CPU, GPU, or Neural Engine depending on the model and device. Even ANE-routed inference takes finite time. When that time happens on the main thread, UIKit and SwiftUI cannot update the screen. Every frame that should be drawn during inference is dropped.
The fix is straightforward: wrap inference behind a Swift actor.
import CoreML

actor InferenceEngine {
    private let model: MyClassifier

    init() throws {
        self.model = try MyClassifier()
    }

    func classify(input: MLMultiArray) async throws -> MyClassifierOutput {
        try model.prediction(input: input)
    }
}
Every call to classify happens off the main actor. SwiftUI continues updating during inference. This is not an optimization — it is a correctness requirement for any app that needs to remain responsive during AI workloads.
In every audit we have conducted, the single highest-leverage change is pulling Core ML inference off the main thread. Apps that do this correctly feel native. Apps that don't feel frozen.
2. Model Loading Strategy
What we check: Where does model loading happen — app launch, first use, or background? How large is the .mlpackage bundle? Is the compiled model cached, or does it recompile on every launch?
Why it matters: Core ML model loading involves two steps: reading the .mlpackage from disk and compiling it for the current device. On iPhone 15 Pro, loading a 100MB model takes approximately 800ms to 1.5 seconds the first time. Compiled models are cached by iOS, so subsequent loads are faster — but the initial compilation happens at first launch or after an update.
Apps that load models synchronously at launch risk watchdog termination when the main thread stays blocked past the launch deadline, and a launch crash on a reviewer's device means rejection. Models above 50MB should be loaded lazily (on first inference request) or proactively after launch via a background task.
We flag models over 100MB as requiring a download-on-demand strategy using MLModel.compileModel(at:) with a background URLSession task rather than bundling in the app binary.
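A minimal sketch of lazy, off-main-thread loading behind an actor, assuming a generated model class named MyClassifier:

```swift
import CoreML

// Lazy loader: the first caller pays the load/compile cost, later
// callers get the cached instance. Runs on the actor's executor,
// so the main thread is never blocked.
actor ModelLoader {
    private var cached: MyClassifier?

    func model() throws -> MyClassifier {
        if let cached { return cached }
        let config = MLModelConfiguration()
        config.computeUnits = .all  // let Core ML pick CPU, GPU, or ANE
        let loaded = try MyClassifier(configuration: config)
        cached = loaded
        return loaded
    }
}
```

Warming the cache shortly after launch (for example, `Task { _ = try? await loader.model() }`) hides first-load compilation latency without blocking startup.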
3. Fallback and Degraded Mode
What we check: Does the app handle MLModelError gracefully? What happens when inference returns a low-confidence result? Is there a non-AI fallback for core functionality?
Why it matters: Core ML throws in more scenarios than developers expect: the model file is missing or corrupted, the device is under thermal pressure and ANE is throttled, the model requires iOS 17 but the user has iOS 16, the input shape does not match the model spec. Apps that crash or freeze in any of these cases fail in production.
Every inference call needs a do-catch block with meaningful fallback logic. For classification tasks, a minimum confidence threshold (typically 0.65–0.75) prevents the app from acting on ambiguous predictions. For features where AI is additive rather than core, the non-AI path should work independently.
do {
    let result = try await inferenceEngine.classify(input: array)
    // Reject ambiguous predictions below the confidence threshold.
    guard result.classLabelProbs[result.classLabel] ?? 0 > 0.7 else {
        return .lowConfidence
    }
    return .prediction(result.classLabel)
} catch {
    logger.error("Inference failed: \(error.localizedDescription)")
    return .fallback
}
4. Privacy Manifest Completeness
What we check: Does PrivacyInfo.xcprivacy declare every required reason API the app accesses? Does it accurately describe data collection, data use, and whether data is linked to user identity?
Why it matters: Apple began enforcing privacy manifest requirements in May 2024. Apps that access required reason APIs without declaring them in PrivacyInfo.xcprivacy are rejected with ITMS-91053. iOS AI apps consistently trigger three categories: file timestamp access (used by Core ML when loading models), user defaults access (common for caching inference results or model version flags), and system boot time (used by some analytics SDKs bundled with AI frameworks).
Our audit runs static analysis against the compiled binary to find every required reason API call, then compares against the declared reasons. Every gap is a potential rejection.
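A sketch of the corresponding declarations in PrivacyInfo.xcprivacy. The reason codes shown (C617.1 for in-container file timestamps, CA92.1 for the app's own user defaults, 35F9.1 for in-app elapsed-time measurement) are examples from Apple's documented list; the codes you declare must match your app's actual usage:

```xml
<key>NSPrivacyAccessedAPITypes</key>
<array>
    <dict>
        <key>NSPrivacyAccessedAPIType</key>
        <string>NSPrivacyAccessedAPICategoryFileTimestamp</string>
        <key>NSPrivacyAccessedAPITypeReasons</key>
        <array><string>C617.1</string></array>
    </dict>
    <dict>
        <key>NSPrivacyAccessedAPIType</key>
        <string>NSPrivacyAccessedAPICategoryUserDefaults</string>
        <key>NSPrivacyAccessedAPITypeReasons</key>
        <array><string>CA92.1</string></array>
    </dict>
    <dict>
        <key>NSPrivacyAccessedAPIType</key>
        <string>NSPrivacyAccessedAPICategorySystemBootTime</string>
        <key>NSPrivacyAccessedAPITypeReasons</key>
        <array><string>35F9.1</string></array>
    </dict>
</array>
```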
5. Data Model Design for Local-First AI
What we check: Is the Core Data or SwiftData schema designed to store predictions independently from the data that generated them? Is there a mechanism to re-run inference when the model is updated? Are prediction results versioned against the model version that generated them?
Why it matters: AI apps update their models. When the model changes, old predictions may no longer be valid. Apps that store predictions without recording which model version generated them cannot selectively invalidate stale results — they either show outdated AI output indefinitely or re-run inference on everything, which is expensive.
The correct pattern: store predictions with a modelVersion field. On app launch, check the current embedded model version against stored predictions. Stale predictions get flagged for background re-inference.
import SwiftData

@Model
final class Prediction {
    var inputHash: String
    var label: String
    var confidence: Double
    var modelVersion: String  // version of the model that produced this result, e.g. "2.1.0"
    var createdAt: Date

    init(inputHash: String, label: String, confidence: Double, modelVersion: String, createdAt: Date = .now) {
        self.inputHash = inputHash
        self.label = label
        self.confidence = confidence
        self.modelVersion = modelVersion
        self.createdAt = createdAt
    }
}
6. Memory Pressure and Model Lifetime
What we check: Is the ML model a shared singleton or instantiated per request? Is the model released when the app is backgrounded? Does the app observe UIApplication.didReceiveMemoryWarningNotification?
Why it matters: Core ML models hold allocated memory proportional to their size. A 200MB model occupying RAM while the app is backgrounded contributes to jetsam-triggered terminations. Apps terminated by the OS in the background lose user state and create confusing restart experiences.
Models should be wrapped in a manager that releases the model on applicationDidEnterBackground and reloads on applicationWillEnterForeground.
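A sketch of such a manager, assuming UIKit lifecycle notifications and a generated model class named MyClassifier:

```swift
import UIKit
import CoreML

// Drop the model on background or memory warning, reload on foreground.
final class ModelLifetimeManager {
    private var model: MyClassifier?
    private let center = NotificationCenter.default

    init() {
        center.addObserver(forName: UIApplication.didEnterBackgroundNotification,
                           object: nil, queue: .main) { [weak self] _ in
            self?.model = nil  // free model memory before jetsam reclaims it
        }
        center.addObserver(forName: UIApplication.didReceiveMemoryWarningNotification,
                           object: nil, queue: .main) { [weak self] _ in
            self?.model = nil
        }
        center.addObserver(forName: UIApplication.willEnterForegroundNotification,
                           object: nil, queue: .main) { [weak self] _ in
            // iOS caches the compiled model, so reloading is much
            // cheaper than the first launch-time compilation.
            self?.model = try? MyClassifier()
        }
    }
}
```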
7. Swift 6 Strict Concurrency Compliance
What we check: Does the codebase compile cleanly under SWIFT_STRICT_CONCURRENCY = complete? Are there sendability violations, data races, or unchecked @MainActor assumptions?
Why it matters: The Swift 6 language mode, available starting in Xcode 16, enforces strict concurrency checking at compile time. Targets that stay in Swift 5 mode accumulate warnings that become errors once they migrate or once warnings-as-errors policies tighten. AI workloads almost always involve passing non-Sendable types (like MLMultiArray) across actor boundaries, which requires deliberate design.
We flag every @preconcurrency import and nonisolated(unsafe) usage that suppresses safety checks without a verified rationale.
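For SwiftPM targets, opting into the checking looks roughly like this (a Package.swift excerpt, swift-tools-version 6.0 assumed; the target name is hypothetical):

```swift
// Opt a target into the Swift 6 language mode, which includes
// complete strict concurrency checking, equivalent to
// SWIFT_STRICT_CONCURRENCY = complete in an Xcode project.
.target(
    name: "InferenceKit",
    swiftSettings: [
        .swiftLanguageMode(.v6)
    ]
)
```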
8. App Store Binary Size
What we check: What is the App Store thinned download size? Are models included in the main binary or delivered as On-Demand Resources? Does the app meet the 200MB cellular download limit?
Why it matters: Apple applies a 200MB threshold to cellular downloads: larger apps either prompt the user for confirmation or wait for Wi-Fi, depending on device settings. Large Core ML models bundled directly in the app binary push many AI apps close to or past this threshold, and that friction costs installs in markets where Wi-Fi is not the default.
Models above 75MB should be evaluated for On-Demand Resources delivery. This adds App Store infrastructure complexity but removes the size from the initial download.
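Fetching an ODR tag at runtime can be sketched as follows; "LargeModel" is a hypothetical tag assigned to the .mlpackage in Xcode:

```swift
import Foundation

// Request the On-Demand Resources tag containing the model package.
// Keep the returned request alive while the resources are in use,
// and call endAccessingResources() when finished.
func loadModelResources() async throws -> NSBundleResourceRequest {
    let request = NSBundleResourceRequest(tags: ["LargeModel"])
    try await request.beginAccessingResources()
    // Resources in the tag are now reachable through Bundle.main.
    return request
}
```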
9. On-Device vs. Cloud Inference Decision Points
What we check: Is the on-device vs. cloud inference split explicit and documented? Are there circuit breakers for cloud calls? Does the app function completely offline?
Why it matters: Apps that mix on-device and cloud inference without explicit decision logic create unpredictable behavior. Users in airplane mode, on slow connections, or in regions with data restrictions will encounter failures that are hard to debug and impossible to reproduce in development.
Our audit maps every inference call path and verifies that the offline path is tested, documented, and reliable. For apps that use cloud inference, we check for rate limiting, timeout handling, and graceful degradation.
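One way to make the decision point explicit is a timeout-bounded cloud call that always degrades to the on-device path. This is a sketch; classifyRemotely and classifyLocally are hypothetical stand-ins for your URLSession endpoint and Core ML path:

```swift
import Foundation

enum InferenceSource { case cloud, onDevice }

// Placeholder for a URLSession call to a remote inference endpoint.
func classifyRemotely(_ input: Data) async throws -> String {
    throw URLError(.notConnectedToInternet)
}

// Placeholder for the on-device Core ML path from check #1.
func classifyLocally(_ input: Data) async -> String {
    "on-device-label"
}

// Prefer cloud inference, but fall back to the on-device model after a
// 3-second timeout or any network error, so the app works offline.
func classify(_ input: Data) async -> (label: String, source: InferenceSource) {
    do {
        let label = try await withThrowingTaskGroup(of: String.self) { group in
            group.addTask { try await classifyRemotely(input) }
            group.addTask {
                try await Task.sleep(for: .seconds(3))  // circuit-breaker timeout
                throw URLError(.timedOut)
            }
            let first = try await group.next()!
            group.cancelAll()
            return first
        }
        return (label, .cloud)
    } catch {
        return (await classifyLocally(input), .onDevice)
    }
}
```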
What the Audit Produces
Every audit ends with a written report organized by severity: Critical (blocks App Store submission), Major (causes production failures at scale), Minor (code quality and maintainability). Each finding includes the specific file and line, a description of the failure mode, and the recommended fix.
Most audits surface 2–4 Critical findings and 8–15 Major findings. The Critical findings are almost always the same nine checks above. The Major findings depend on the specific codebase.
Typical remediation time for Critical findings: 3–5 engineering days using our recommendations. For teams without experience in Swift 6 concurrency or Core ML production patterns, we offer a remediation sprint where we fix the issues directly.
Get Your Architecture Audited
If your app has AI features and you have not done a formal architecture review, the issues above exist in your codebase. They may not be causing problems today, but they will — at scale, at iOS update time, or at App Store submission.
Our Architecture Audit service covers all nine checks above and takes 3–5 business days. You get a prioritized written report and a 60-minute review call to walk through findings.
Request an Architecture Audit →