Core ML vs. ONNX for On-Device AI on iOS: A 2026 Comparison
Core ML and ONNX Runtime both run inference on-device, but they make opposite tradeoffs for iOS. Core ML is tightly coupled to Apple hardware and ships with the OS, so there is no runtime to bundle. ONNX Runtime gives you cross-platform portability at the cost of startup latency and integration complexity.
On-device AI inference on iOS comes down to two serious options: Core ML — Apple's native framework — and ONNX Runtime, the cross-platform inference engine maintained by Microsoft and the open-source community. Both run models locally without a network connection. Both support common architectures including transformers, CNNs, and RNNs. The decision between them has real consequences for latency, battery life, integration complexity, and long-term maintenance.
This is the direct comparison, without the vague "it depends" disclaimers.
What Each Option Actually Is
Core ML is Apple's first-party inference framework, introduced in iOS 11 and substantially expanded through iOS 18. When you add a .mlpackage file to your Xcode project, Core ML handles hardware scheduling automatically — routing computation to the Apple Neural Engine, GPU, or CPU depending on the operation type and current device load. The ANE is purpose-built silicon for matrix operations and runs at roughly 35 TOPS on iPhone 15 Pro hardware.
ONNX Runtime is an open-source runtime maintained by Microsoft and supported by Google, Meta, and others. The ONNX format is a cross-platform model exchange standard. ONNX Runtime on iOS runs via a Swift/Objective-C package and executes inference on CPU or GPU. It has no direct Apple Neural Engine execution provider for iOS; its optional Core ML execution provider can delegate supported operations to Core ML itself, subject to the same conversion constraints.
Performance
Apple publishes a benchmark: Core ML on iPhone 15 Pro processes MobileNetV2 image classification at under 1ms per inference using the ANE path. For transformer-based NLP models, the ANE path consistently shows 4–8x lower latency than CPU execution.
ONNX Runtime on iPhone 15 Pro runs the same MobileNetV2 model at approximately 4–6ms on CPU and 2–3ms on GPU. Without ANE access, ONNX Runtime cannot reach Core ML's peak throughput for ANE-optimized operations.
The gap is most pronounced for models Core ML fully maps to the ANE. If a model includes custom or unsupported operations, Core ML falls back to CPU — which eliminates the ANE advantage and levels the playing field.
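Numbers like these are steady-state figures: warm-up runs are discarded so one-time costs (model compilation, first ANE dispatch) do not skew the average. A minimal measurement sketch, in Python for illustration; on-device you would wrap the Core ML or ONNX Runtime call the same way:

```python
import time

def mean_latency_ms(infer, warmup=10, runs=100):
    """Average wall-clock latency of a zero-arg inference callable, in ms.

    Warm-up iterations absorb one-time costs (model load, first-run
    compilation) so the reported number reflects steady state.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    return (time.perf_counter() - start) * 1000.0 / runs

# Stand-in workload; swap in a real inference call when benchmarking.
print(round(mean_latency_ms(lambda: sum(range(1000)), warmup=2, runs=20), 3))
```

The same pattern applies in Swift: discard the first few predictions, then average over many runs before comparing frameworks.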
"Every millisecond of inference latency on the main thread is a frame dropped. The Neural Engine running Core ML inference is the reason AI features in production iOS apps feel instantaneous rather than sluggish." — Ehsan Azish, 3NSOFTS
Battery and Thermal Efficiency
The ANE is not just faster — it consumes significantly less power than the GPU for equivalent operations. Apple reports that the Neural Engine is approximately 10x more energy-efficient than the GPU for supported matrix workloads. For apps that run inference continuously (live camera, audio analysis, health monitoring), the efficiency difference translates directly into battery life.
ONNX Runtime on GPU draws more power for equivalent compute. On a sustained inference workload like real-time video classification, the thermal difference becomes noticeable within minutes.
Model Conversion and Tooling
Core ML workflow:
- Train in PyTorch, TensorFlow, or another framework
- Export to ONNX or keep as PyTorch
- Convert with coremltools.convert() or coremltools.converters.mil
- Add the .mlpackage to Xcode
- Let Xcode generate the Swift inference class automatically
Core ML's auto-generated Swift class eliminates most boilerplate. You call one method, pass typed inputs, and receive a result struct.
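The conversion step can be sketched as follows. This is a hypothetical example assuming a PyTorch model; the function name, input name, and output path are placeholders, and it requires torch and coremltools installed:

```python
# Hypothetical sketch of the Core ML conversion step for a PyTorch model.
# Heavy imports are kept inside the function; torch and coremltools are
# assumed to be installed when it is actually called.
def export_to_mlpackage(model, example_input, out_path="Model.mlpackage"):
    import torch
    import coremltools as ct

    # Trace the model into a TorchScript graph coremltools can consume.
    traced = torch.jit.trace(model.eval(), example_input)
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input", shape=example_input.shape)],
        compute_units=ct.ComputeUnit.ALL,  # allow ANE/GPU/CPU scheduling
    )
    mlmodel.save(out_path)  # drag the resulting .mlpackage into Xcode
    return out_path
```

From there, Xcode's generated Swift class handles typed inputs and outputs with no further glue code.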
ONNX Runtime workflow:
- Train and export to ONNX format
- Add the onnxruntime Swift Package dependency
- Load the model, create a session, and manually construct input tensors
- Parse output tensors manually
ONNX Runtime requires more setup and error handling per inference call. The API is lower-level. That is a cost — but also the reason it is portable across platforms without modification.
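A sketch of that call sequence, shown with ONNX Runtime's Python API for brevity. The Swift package follows the same shape (create a session, build input tensors, parse outputs manually); the model path, preprocessing, and single-output assumption are placeholders:

```python
# Sketch of the ONNX Runtime inference sequence. Requires onnxruntime and
# numpy installed; assumes a single-output classification model.
def classify(model_path, pixels):
    import numpy as np
    import onnxruntime as ort

    # 1. Create the session (the expensive, do-once step).
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # 2. Manually construct the input tensor with the expected dtype/shape.
    input_name = sess.get_inputs()[0].name
    batch = np.asarray(pixels, dtype=np.float32)[None, ...]  # add batch dim
    # 3. Run and manually parse the output list.
    (logits,) = sess.run(None, {input_name: batch})
    return int(logits.argmax())
```

In production you would create the session once at startup and reuse it; session creation, not inference, dominates the startup latency mentioned above.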
Core ML
- Apple Neural Engine access
- Auto-generated Swift API
- Lowest latency on Apple hardware
- iOS only (and Apple platforms)
- Requires model conversion
ONNX Runtime
- Runs on iOS, Android, desktop
- One model file, all platforms
- CPU and GPU only on iOS
- Higher latency than ANE-optimized Core ML
- Lower-level API, more setup required
Privacy Architecture
Both Core ML and ONNX Runtime run entirely on-device. No data leaves the device during inference. The privacy properties are equivalent at the inference layer.
The difference is in the ecosystem. Core ML integrates directly with Apple's privacy framework. You can combine Core ML with NSPrivacyAccessedAPITypes declarations in your privacy manifest, and Apple's App Store review confirms on-device processing. ONNX Runtime does not have this formal integration — it is on the developer to document and assert the same guarantees.
For regulated industries (healthcare, finance) or privacy-forward product positioning, Core ML's deep integration with Apple's privacy model is a meaningful differentiator.
Model Format and Portability
ONNX is the de facto cross-platform model exchange standard. Exporting to ONNX from PyTorch is one line. From TensorFlow, the tf2onnx converter handles most models. Every major AI framework can read and write ONNX.
Core ML's .mlpackage format is Apple-specific. It does not run on Android, Windows, or Linux without conversion. If your business requires serving the same model on multiple platforms, maintaining one ONNX artifact and running it with ONNX Runtime on all targets is simpler than maintaining separate Core ML and Android conversions.
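That one-line PyTorch export, wrapped in a hypothetical helper for clarity; the opset version and output path are assumptions, and torch must be installed when it runs:

```python
# Hypothetical wrapper around the single-call PyTorch -> ONNX export.
def export_onnx(model, example_input, out_path="model.onnx"):
    import torch
    # Opset 17 is an assumption; pick the opset your target runtime supports.
    torch.onnx.export(model.eval(), example_input, out_path, opset_version=17)
    return out_path
```

The resulting model.onnx is the single artifact you ship to iOS, Android, and desktop targets alike.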
When to Use Core ML
- iOS, iPadOS, or macOS is your primary or only deployment target
- Latency is critical — real-time inference, live camera, immediate UI feedback
- Battery efficiency matters — background processing, health monitoring, continuous audio analysis
- You want the simplest possible Swift integration with auto-generated typed APIs
- Privacy positioning matters and you want integration with Apple's formal privacy manifest system
When to Use ONNX Runtime
- You ship on iOS and Android and want one model artifact for both
- Your model uses operations not yet supported by Core ML's ANE path
- Your team works in Python MLOps pipelines that are already ONNX-native
- Latency targets above 5–10ms make the ANE advantage less decisive for your use case
The Verdict
For iOS-primary products, Core ML is the right choice in the majority of cases. ANE access is a hardware advantage Apple has built into every iPhone since 2017, and ONNX Runtime on iOS cannot access it. The inference speed and battery efficiency difference is not marginal — it is the difference between features that feel native and features that feel like compatibility layers.
ONNX Runtime earns its place in cross-platform architectures. If Android is in scope, maintaining one ONNX model is simpler than parallel conversion pipelines. But if iOS is your platform, use Apple's framework.
Start with an Architecture Review
Choosing the wrong inference framework mid-project is expensive. The model conversion process, the Swift integration layer, and the testing infrastructure all need to be rebuilt. Getting the framework decision right at the start saves weeks.
Our On-Device AI Integration service includes a framework selection audit — we assess your model type, latency targets, and platform requirements and give you a clear recommendation before a line of inference code is written. If you have an existing app using the wrong framework, we handle migration.
Learn about On-Device AI Integration →