
Core ML vs ONNX for iOS in 2026: A Production Comparison

A production comparison of Core ML and ONNX Runtime for iOS: hardware utilization, toolchains, app size, privacy, portability, conversion paths, and the decision framework for shipping on-device AI.

By Ehsan Azish · 3NSOFTS · June 2026 · 8 min read

Core ML and ONNX Runtime both solve the same surface problem: running machine learning models without sending user data to a server. In production iOS apps, they are not equivalent choices. They differ in hardware access, app size, toolchain complexity, portability, and the amount of Swift integration code the product team owns after launch.

This article compares them on production terms: not which format is more elegant, but which choice holds up on real devices, in offline conditions, through App Store review, and across long-term model updates.

What Each Format Actually Is

Core ML

Core ML is Apple's native model format and inference runtime. A .mlpackage or compiled .mlmodelc is bundled with the app, loaded through the Core ML framework, and scheduled across the Apple Neural Engine, GPU, and CPU according to model structure and compute-unit configuration.

Core ML is not only a file format. It is an Apple-platform integration layer: Xcode understands it, Swift can generate typed model wrappers, Instruments can profile it, and App Review can reason about its on-device privacy boundary.

ONNX Runtime for iOS

ONNX is a model exchange format. ONNX Runtime is the execution engine that runs ONNX graphs. On iOS, that means adding the ONNX Runtime package, creating inference sessions manually, constructing tensors, and parsing outputs yourself.

The advantage is portability. A model exported to ONNX can often run across iOS, Android, Windows, Linux, and backend environments with fewer framework-specific conversions.

The tradeoff is that portability is not free. On iOS, the runtime becomes part of your app, and your Swift layer owns more of the loading, memory, tensor, and output-mapping work.

Hardware Utilization

Core ML has the strongest hardware path on Apple devices because it can route supported operations to the Apple Neural Engine. For models that compile cleanly to ANE-compatible graphs, the latency and power advantage is structural.

ONNX Runtime on iOS runs on the CPU by default and can hand supported subgraphs to Core ML through its CoreML execution provider. That reaches Apple's accelerators only indirectly: the graph is partitioned at runtime, unsupported nodes stay on the CPU, and the app does not get the same first-party Neural Engine scheduling path that a native Core ML model gets. For an iOS-first product, that difference matters most when inference happens inside a user-visible flow: camera frames, text classification while typing, health signals, audio analysis, or any feature where feedback must feel immediate.

If the model has unsupported operations, Core ML may fall back to CPU or GPU for part of the graph. That is why the correct production workflow is not "convert and hope." Compile the model, inspect where it runs, then benchmark on the oldest supported device.
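One way to make "inspect where it runs" concrete is to tally operation placement before benchmarking. The sketch below assumes the placements were transcribed by hand (for example from Xcode's Core ML performance report) into (operation, device) pairs. The operation names and device labels are hypothetical, and none of this is a Core ML API.

```python
from collections import Counter


def placement_summary(placements):
    """Return the share of operations assigned to each device."""
    if not placements:
        raise ValueError("no placements recorded")
    counts = Counter(device for _op, device in placements)
    total = sum(counts.values())
    return {device: n / total for device, n in counts.items()}


def cpu_fallback_ops(placements):
    """List the operations that ended up on the CPU."""
    return [op for op, device in placements if device == "CPU"]


# Hypothetical placements for a small model (illustrative names only).
placements = [
    ("conv_1", "ANE"),
    ("conv_2", "ANE"),
    ("layer_norm_1", "ANE"),
    ("custom_gather", "CPU"),
]

summary = placement_summary(placements)
fallbacks = cpu_fallback_ops(placements)
print(summary)
print(fallbacks)
```

A single CPU-bound operation in the middle of the graph can matter more than its share suggests, so a tally like this is a prompt for on-device benchmarking, not a replacement for it.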

Toolchain and Conversion

Core ML Toolchain

The Core ML path usually looks like this:

PyTorch or TensorFlow
-> coremltools conversion
-> .mlpackage
-> Xcode bundle
-> Swift model wrapper or MLModel runtime loading

This path is especially strong when the app is Apple-platform-only. The result fits naturally into Xcode, signing, App Store distribution, and the privacy story.

The cost appears when the model uses unsupported layers, dynamic shapes, or custom operations. Those issues must be fixed during conversion or isolated behind fallback logic.

ONNX Toolchain

The ONNX path usually looks like this:

PyTorch or TensorFlow
-> ONNX export
-> ONNX Runtime session
-> manual tensor input/output mapping

This is attractive when the model team already treats ONNX as the canonical artifact, or when the same model must ship across iOS and Android with minimal divergence.

The iOS app pays for that flexibility in runtime size and integration code. The team owns the tensor boundary instead of leaning on Xcode-generated types.
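To illustrate what "owning the tensor boundary" involves, here is a minimal, language-neutral sketch in plain Python. It assumes a hypothetical image classifier that expects a batch of one in NCHW layout with values scaled to [0, 1] and returns raw logits. The layout, scaling, and label names are assumptions for illustration; a real model defines its own contract, and on iOS the equivalent code would live in the Swift layer.

```python
import math


def hwc_to_nchw(pixels, height, width, channels):
    """Reorder an interleaved HWC pixel list (0-255) into planar NCHW floats in [0, 1]."""
    if len(pixels) != height * width * channels:
        raise ValueError("pixel buffer does not match the declared shape")
    out = []
    for c in range(channels):
        for y in range(height):
            for x in range(width):
                out.append(pixels[(y * width + x) * channels + c] / 255.0)
    return out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels):
    """Map raw logits to the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("logit count does not match label count")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# A 1x2 RGB image: one red pixel followed by one blue pixel.
pixels = [255, 0, 0, 0, 0, 255]
tensor = hwc_to_nchw(pixels, height=1, width=2, channels=3)

# Hypothetical logits as they might come back from an inference session.
label, confidence = top_label([0.1, 2.0, -1.0], ["cat", "dog", "bird"])
print(tensor, label, round(confidence, 3))
```

Every shape check, layout conversion, and label lookup here is code the product team maintains after launch, which is the integration cost the ONNX path carries.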

App Size and Distribution

A Core ML app ships only the model, not an additional inference runtime, because the framework is already part of the operating system. The app bundle grows primarily with the size of the model weights.

ONNX Runtime adds runtime code to the app in addition to model weights. For small models, the runtime overhead can be larger than the model itself. For large cross-platform models, that overhead may be acceptable, but it should be measured before committing to the architecture.

For apps near App Store cellular download thresholds, and for watchOS companion targets, App Clips, or extensions, this difference is not theoretical. Bundle size becomes a product constraint.
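The size tradeoff is simple arithmetic, sketched below. All byte counts and the budget are hypothetical placeholders chosen for illustration; measure the real runtime and model sizes from your own build before drawing conclusions.

```python
MB = 1_000_000  # decimal megabytes, for readability


def bundle_report(model_bytes, runtime_bytes, budget_bytes):
    """Summarize total payload, the runtime's share of it, and budget fit."""
    total = model_bytes + runtime_bytes
    return {
        "total_bytes": total,
        "runtime_share": runtime_bytes / total if total else 0.0,
        "within_budget": total <= budget_bytes,
    }


# Hypothetical figures: a 4 MB model and a 15 MB bundle budget.
budget = 15 * MB
core_ml = bundle_report(model_bytes=4 * MB, runtime_bytes=0, budget_bytes=budget)
# Assumed 12 MB bundled runtime; this is a placeholder, not a measured value.
onnx = bundle_report(model_bytes=4 * MB, runtime_bytes=12 * MB, budget_bytes=budget)

print(core_ml)
print(onnx)
```

With these assumed numbers the runtime accounts for three quarters of the payload and pushes the bundle over budget, which is the "runtime larger than the model" case described above.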

Privacy and Data Residency

Both Core ML and ONNX Runtime can run entirely on-device. Neither requires a network call for inference.

The difference is evidence and integration. Core ML fits Apple's documented privacy model directly: the model is bundled, inference happens through first-party APIs, and privacy manifests can describe the surrounding data access precisely. ONNX can be equally private, but the app must document the boundary itself and ensure no analytics, logging, or fallback API path contradicts the claim.

If App Store metadata says "on-device AI" or "works offline," then network-restricted testing must still pass. The implementation cannot quietly route to a cloud model when the local runtime fails.
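The "no quiet cloud fallback" rule can be expressed as a policy in code. The following is a minimal sketch, written in Python for brevity, in which a local failure surfaces as an explicit error instead of triggering a network call. The function and exception names are hypothetical.

```python
class LocalInferenceUnavailable(RuntimeError):
    """Raised when the on-device model cannot produce a result."""


def classify_on_device(run_local_model, features):
    """Run the local model and surface failures instead of calling a server."""
    try:
        return run_local_model(features)
    except Exception as error:
        # Deliberately no network fallback here: the caller decides how to
        # degrade the feature while staying consistent with the offline claim.
        raise LocalInferenceUnavailable("on-device inference failed") from error


def working_model(features):
    """Stand-in for a real local model (illustrative only)."""
    return "positive" if sum(features) > 0 else "negative"


def broken_model(_features):
    """Stand-in for a model that fails to load."""
    raise MemoryError("model could not be loaded")


result = classify_on_device(working_model, [0.2, 0.5])

try:
    classify_on_device(broken_model, [0.2, 0.5])
    failure = None
except LocalInferenceUnavailable as error:
    failure = error

print(result, repr(failure))
```

Keeping the failure explicit makes network-restricted testing meaningful: the test can assert that the feature degrades visibly instead of succeeding through a hidden path.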

When ONNX Runtime Is the Right Choice

Use ONNX Runtime when cross-platform consistency is the dominant constraint.

The same model must run on iOS, Android, desktop, and server.
The ML team already exports and validates ONNX as the canonical artifact.
Platform-specific performance is less important than keeping one model pipeline.
The model uses operations that do not convert cleanly to Core ML.
The app can absorb the runtime size and manual tensor integration.

ONNX is a good architecture when model portability is worth more than Apple-specific performance.

When Core ML Is the Right Choice

Use Core ML when iOS, iPadOS, macOS, watchOS, or visionOS is the primary product surface.

Latency and battery efficiency matter.
You want the Apple Neural Engine path where supported.
The app needs a clean App Store privacy story.
The Swift integration should be typed and maintainable.
The model can be converted and benchmarked successfully with coremltools.

For Apple-platform-first apps, Core ML is usually the correct default.

Performance Numbers in Context

Benchmarks are only useful when tied to product constraints. A 3ms model and a 12ms model may both be acceptable for a button-triggered classification. The same difference can be unacceptable in a live camera or audio pipeline.

Use three thresholds:

Inline UI feedback: target under 50ms p95 end-to-end.
Realtime sensor or camera loops: target the frame budget, not the average.
Background enrichment: optimize battery and thermal behavior before raw latency.
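The thresholds above can be checked mechanically once latency samples exist. This sketch uses a nearest-rank percentile and hypothetical timings; the 30 fps frame budget is an assumed example, not a figure from a specific device.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_budget(latencies_ms, budget_ms, pct=95):
    """True when the chosen percentile fits inside the budget."""
    return percentile_nearest_rank(latencies_ms, pct) <= budget_ms


# Hypothetical end-to-end timings in milliseconds: mostly fast, two slow outliers.
latencies_ms = [12] * 18 + [48, 95]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile_nearest_rank(latencies_ms, 95)
frame_budget_ms = 1000 / 30  # assumed 30 fps camera loop

print(mean_ms, p95_ms)
print(meets_budget(latencies_ms, 50))               # inline UI feedback
print(meets_budget(latencies_ms, frame_budget_ms))  # realtime loop
```

In this sample the mean sits comfortably inside the frame budget while the p95 does not, which is why the realtime threshold is stated against the frame budget and not the average.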

Core ML benchmarks measured per Apple device class are a better starting point than generic model-card numbers because they map inference budgets to the hardware the app actually ships on.

Conversion Between Formats

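ONNX is sometimes used as an intermediate format on the way to Core ML, especially when the model team already exports ONNX for other platforms. Be aware that recent coremltools releases no longer include the dedicated ONNX converter and convert directly from PyTorch and TensorFlow, so going back to the source framework is usually the more reliable path.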

The important rule: validate the converted Core ML model against the original model output before shipping. Compare output distributions, not only top-line accuracy. Quantization, operator substitution, and shape handling can create edge failures that do not appear in a small smoke test.
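A minimal sketch of that validation follows, assuming both models have already been run on the same inputs and their outputs collected as probability vectors. The metric choices (top-1 agreement, maximum absolute difference, KL divergence) and the sample values are illustrative assumptions, not a prescribed test suite.

```python
import math


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to tolerate zero probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def compare_outputs(reference, converted):
    """Compare per-input probability vectors from the original and converted models."""
    if len(reference) != len(converted) or not reference:
        raise ValueError("need the same non-zero number of outputs from both models")
    matches = 0
    worst_abs = 0.0
    worst_kl = 0.0
    for ref, conv in zip(reference, converted):
        if argmax(ref) == argmax(conv):
            matches += 1
        worst_abs = max(worst_abs, max(abs(a - b) for a, b in zip(ref, conv)))
        worst_kl = max(worst_kl, kl_divergence(ref, conv))
    return {
        "top1_agreement": matches / len(reference),
        "max_abs_diff": worst_abs,
        "max_kl": worst_kl,
    }


# Hypothetical outputs for two inputs.
reference = [[0.70, 0.20, 0.10], [0.51, 0.49, 0.00]]
converted = [[0.69, 0.21, 0.10], [0.60, 0.40, 0.00]]

report = compare_outputs(reference, converted)
print(report)
```

Here the top-1 predictions agree on every input, yet the second input's probabilities have shifted by about 0.09. A smoke test on labels alone would pass; a distribution check flags the drift.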

For model-size and latency work, pair this decision with model quantization for Apple Silicon.

Decision Framework

Choose Core ML when the app is Apple-first, latency-sensitive, battery-sensitive, or privacy-positioned.

Choose ONNX Runtime when the organization needs one model artifact across platforms and accepts additional runtime and integration complexity on iOS.

The wrong decision is choosing ONNX for theoretical portability when the product only ships on iOS. The other wrong decision is forcing Core ML into a product where Android parity and one shared MLOps pipeline are non-negotiable.
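The framework can be written down as a small function so the inputs to the decision are explicit. This is one possible encoding under simplifying assumptions: the field names are hypothetical, and real projects will weigh latency, battery, and bundle size as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductConstraints:
    """Hypothetical inputs to the format decision."""
    needs_android_parity: bool
    onnx_is_canonical_artifact: bool
    converts_cleanly_to_core_ml: bool


def recommend_runtime(constraints):
    """Return (choice, reason) following the rules of thumb in this section."""
    if not constraints.converts_cleanly_to_core_ml:
        return "ONNX Runtime", "model does not convert cleanly to Core ML"
    if constraints.needs_android_parity and constraints.onnx_is_canonical_artifact:
        return "ONNX Runtime", "one shared artifact across platforms is required"
    return "Core ML", "Apple-first product with a clean conversion"


ios_only_mvp = ProductConstraints(
    needs_android_parity=False,
    onnx_is_canonical_artifact=False,
    converts_cleanly_to_core_ml=True,
)
cross_platform = ProductConstraints(
    needs_android_parity=True,
    onnx_is_canonical_artifact=True,
    converts_cleanly_to_core_ml=True,
)

choice_ios, reason_ios = recommend_runtime(ios_only_mvp)
choice_cross, reason_cross = recommend_runtime(cross_platform)
print(choice_ios, "-", reason_ios)
print(choice_cross, "-", reason_cross)
```

Note that "might ship on Android someday" does not set `needs_android_parity`; only a real requirement does, which mirrors the warning about choosing ONNX for theoretical portability.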

Production Implications

The framework choice affects the rest of the codebase:

Model loading strategy and cache ownership
Actor isolation for inference calls
Bundle size budgets
Privacy manifest wording
Offline fallback behavior
Test fixtures for model output drift
App Store review notes for AI features

Treat the format decision as architecture, not as an ML export detail.

FAQs

Is Core ML always faster than ONNX Runtime on iOS?

No. Core ML is usually faster when the model maps cleanly to Apple Neural Engine or GPU execution. If a model contains unsupported operations and falls back heavily to CPU, the advantage narrows. Benchmark the compiled model on target devices.

Can an ONNX model be converted to Core ML?

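Often, but not always directly. Older coremltools releases included an ONNX converter, while current releases convert directly from PyTorch and TensorFlow, so the more reliable path is usually to convert from the source framework. The best path depends on the source framework and operator set. Whichever path is used, the converted model must be validated against the original model's outputs before release.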

Does ONNX Runtime send data to a server?

No. ONNX Runtime can run inference entirely on-device. Privacy risk usually comes from surrounding app architecture: logging, analytics, fallback cloud APIs, or model update systems, not from ONNX itself.

Which should a startup choose for an iOS-only MVP?

Core ML, unless there is a specific unsupported model operation or a near-term Android requirement that makes ONNX the canonical artifact. For iOS-only MVPs, Core ML reduces integration surface and gives the strongest Apple hardware path.

How does quantization affect the choice?

Core ML has mature Apple-platform quantization tooling through coremltools, including INT8 and palettization workflows. ONNX also supports quantization, but the iOS runtime path still needs device-specific benchmarking to verify latency and accuracy.
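To show why quantized models need accuracy checks as well as latency checks, here is a sketch of symmetric per-tensor INT8 quantization in plain Python. It illustrates the arithmetic only; it is not the coremltools or ONNX Runtime quantizer, and the weights are made-up values.

```python
def quantize_int8(weights):
    """Symmetric per-tensor linear quantization to the range [-127, 127]."""
    peak = max(abs(w) for w in weights)
    scale = peak / 127 if peak else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in quantized]


# Hypothetical weights: one large value sets the scale for the whole tensor.
weights = [0.5, -1.27, 0.003, 1.0]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(codes, scale, max_error)
```

The small weight 0.003 rounds to zero because the scale is set by the largest value in the tensor. Effects like this are why both runtimes need the quantized model validated on representative inputs, not only timed.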

