Core ML vs. ONNX for On-Device AI on iOS: A 2026 Comparison
Core ML and ONNX Runtime both run inference on-device, but they make opposite tradeoffs for iOS. Core ML is tightly coupled to Apple hardware and ships with the OS, so there is no runtime to bundle. ONNX Runtime gives you cross-platform portability at the cost of startup latency and integration complexity.
On-device AI inference on iOS comes down to two serious options: Core ML — Apple's native framework — and ONNX Runtime, the cross-platform inference engine maintained by Microsoft and the open-source community. Both run models locally without a network connection. Both support common architectures including transformers, CNNs, and RNNs. The decision between them has real consequences for latency, battery life, integration complexity, and long-term maintenance.
This is the direct comparison, without the vague "it depends" disclaimers.
What Each Option Actually Is
Core ML is Apple's first-party inference framework, introduced in iOS 11 and substantially expanded through iOS 18. When you add a .mlpackage file to your Xcode project, Core ML handles hardware scheduling automatically — routing computation to the Apple Neural Engine, GPU, or CPU depending on the operation type and current device load. The ANE is purpose-built silicon for matrix operations and runs at roughly 35 TOPS on iPhone 15 Pro hardware.
ONNX Runtime is an open-source runtime maintained by Microsoft and supported by Google, Meta, and others. The ONNX format is a cross-platform model exchange standard. ONNX Runtime on iOS runs via a Swift/Objective-C package and executes inference on CPU or GPU. It has no direct Apple Neural Engine execution provider for iOS; its optional Core ML execution provider can delegate supported operations to Core ML itself, subject to the same conversion constraints.
Performance
Apple publishes a benchmark: Core ML on iPhone 15 Pro processes MobileNetV2 image classification at under 1ms per inference using the ANE path. For transformer-based NLP models, the ANE path consistently shows 4–8x lower latency than CPU execution.
ONNX Runtime on iPhone 15 Pro runs the same MobileNetV2 model at approximately 4–6ms on CPU and 2–3ms on GPU. Without ANE access, ONNX Runtime cannot reach Core ML's peak throughput for ANE-optimized operations.
The gap is most pronounced for models Core ML fully maps to the ANE. If a model includes custom or unsupported operations, Core ML falls back to CPU — which eliminates the ANE advantage and levels the playing field.
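Numbers like these are steady-state figures: warm-up runs are discarded so one-time costs (model compilation, first ANE dispatch) do not skew the average. A minimal measurement sketch, in Python for illustration; on-device you would wrap the Core ML or ONNX Runtime call the same way:

```python
import time

def mean_latency_ms(infer, warmup=10, runs=100):
    """Average wall-clock latency of a zero-arg inference callable, in ms.

    Warm-up iterations absorb one-time costs (model load, first-run
    compilation) so the reported number reflects steady state.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    return (time.perf_counter() - start) * 1000.0 / runs

# Stand-in workload; swap in a real inference call when benchmarking.
print(round(mean_latency_ms(lambda: sum(range(1000)), warmup=2, runs=20), 3))
```

The same pattern applies in Swift: discard the first few predictions, then average over many runs before comparing frameworks.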
"Every millisecond of inference latency on the main thread is a frame dropped. The Neural Engine running Core ML inference is the reason AI features in production iOS apps feel instantaneous rather than sluggish." — Ehsan Azish, 3NSOFTS
Battery and Thermal Efficiency
The ANE is not just faster — it consumes significantly less power than the GPU for equivalent operations. Apple reports that the Neural Engine is approximately 10x more energy-efficient than the GPU for supported matrix workloads. For apps that run inference continuously (live camera, audio analysis, health monitoring), the efficiency difference translates directly into battery life.
ONNX Runtime on GPU draws more power for equivalent compute. On a sustained inference workload like real-time video classification, the thermal difference becomes noticeable within minutes.
Model Conversion and Tooling
Core ML workflow:
- Train in PyTorch, TensorFlow, or another framework
- Export to ONNX or keep as PyTorch
- Convert with coremltools.convert() or coremltools.converters.mil
- Add the .mlpackage to Xcode
- Let Xcode generate the Swift inference class automatically
Core ML's auto-generated Swift class eliminates most boilerplate. You call one method, pass typed inputs, and receive a result struct.
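The conversion step can be sketched as follows. This is a hypothetical example assuming a PyTorch model; the function name, input name, and output path are placeholders, and it requires torch and coremltools installed:

```python
# Hypothetical sketch of the Core ML conversion step for a PyTorch model.
# Heavy imports are kept inside the function; torch and coremltools are
# assumed to be installed when it is actually called.
def export_to_mlpackage(model, example_input, out_path="Model.mlpackage"):
    import torch
    import coremltools as ct

    # Trace the model into a TorchScript graph coremltools can consume.
    traced = torch.jit.trace(model.eval(), example_input)
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input", shape=example_input.shape)],
        compute_units=ct.ComputeUnit.ALL,  # allow ANE/GPU/CPU scheduling
    )
    mlmodel.save(out_path)  # drag the resulting .mlpackage into Xcode
    return out_path
```

From there, Xcode's generated Swift class handles typed inputs and outputs with no further glue code.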
ONNX Runtime workflow:
- Train and export to ONNX format
- Add the onnxruntime Swift Package dependency
- Load the model, create a session, and manually construct input tensors
- Parse output tensors manually
ONNX Runtime requires more setup and error handling per inference call. The API is lower-level. That is a cost — but also the reason it is portable across platforms without modification.
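A sketch of that call sequence, shown with ONNX Runtime's Python API for brevity. The Swift package follows the same shape (create a session, build input tensors, parse outputs manually); the model path, preprocessing, and single-output assumption are placeholders:

```python
# Sketch of the ONNX Runtime inference sequence. Requires onnxruntime and
# numpy installed; assumes a single-output classification model.
def classify(model_path, pixels):
    import numpy as np
    import onnxruntime as ort

    # 1. Create the session (the expensive, do-once step).
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # 2. Manually construct the input tensor with the expected dtype/shape.
    input_name = sess.get_inputs()[0].name
    batch = np.asarray(pixels, dtype=np.float32)[None, ...]  # add batch dim
    # 3. Run and manually parse the output list.
    (logits,) = sess.run(None, {input_name: batch})
    return int(logits.argmax())
```

In production you would create the session once at startup and reuse it; session creation, not inference, dominates the startup latency mentioned above.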
Core ML
- Apple Neural Engine access
- Auto-generated Swift API
- Lowest latency on Apple hardware
- iOS only (and Apple platforms)
- Requires model conversion
ONNX Runtime
- Runs on iOS, Android, desktop
- One model file, all platforms
- CPU and GPU only on iOS
- Higher latency than ANE-optimized Core ML
- Lower-level API, more setup required
Privacy Architecture
Both Core ML and ONNX Runtime run entirely on-device. No data leaves the device during inference. The privacy properties are equivalent at the inference layer.
The difference is in the ecosystem. Core ML integrates directly with Apple's privacy framework. You can combine Core ML with NSPrivacyAccessedAPITypes declarations in your privacy manifest, and Apple's App Store review confirms on-device processing. ONNX Runtime does not have this formal integration — it is on the developer to document and assert the same guarantees.
For regulated industries (healthcare, finance) or privacy-forward product positioning, Core ML's deep integration with Apple's privacy model is a meaningful differentiator.
Model Format and Portability
ONNX is the de facto cross-platform model exchange standard. Exporting to ONNX from PyTorch is one line. From TensorFlow, the tf2onnx converter handles most models. Every major AI framework can read and write ONNX.
Core ML's .mlpackage format is Apple-specific. It does not run on Android, Windows, or Linux without conversion. If your business requires serving the same model on multiple platforms, maintaining one ONNX artifact and running it with ONNX Runtime on all targets is simpler than maintaining separate Core ML and Android conversions.
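That one-line PyTorch export, wrapped in a hypothetical helper for clarity; the opset version and output path are assumptions, and torch must be installed when it runs:

```python
# Hypothetical wrapper around the single-call PyTorch -> ONNX export.
def export_onnx(model, example_input, out_path="model.onnx"):
    import torch
    # Opset 17 is an assumption; pick the opset your target runtime supports.
    torch.onnx.export(model.eval(), example_input, out_path, opset_version=17)
    return out_path
```

The resulting model.onnx is the single artifact you ship to iOS, Android, and desktop targets alike.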
When to Use Core ML
- iOS, iPadOS, or macOS is your primary or only deployment target
- Latency is critical — real-time inference, live camera, immediate UI feedback
- Battery efficiency matters — background processing, health monitoring, continuous audio analysis
- You want the simplest possible Swift integration with auto-generated typed APIs
- Privacy positioning matters and you want integration with Apple's formal privacy manifest system
When to Use ONNX Runtime
- You ship on iOS and Android and want one model artifact for both
- Your model uses operations not yet supported by Core ML's ANE path
- Your team works in Python MLOps pipelines that are already ONNX-native
- Latency targets above 5–10ms make the ANE advantage less decisive for your use case
The Verdict
For iOS-primary products, Core ML is the right choice in the majority of cases. ANE access is a hardware advantage Apple has built into every iPhone since 2017, and ONNX Runtime on iOS cannot access it. The inference speed and battery efficiency difference is not marginal — it is the difference between features that feel native and features that feel like compatibility layers.
ONNX Runtime earns its place in cross-platform architectures. If Android is in scope, maintaining one ONNX model is simpler than parallel conversion pipelines. But if iOS is your platform, use Apple's framework.
Start with an Architecture Review
Choosing the wrong inference framework mid-project is expensive. The model conversion process, the Swift integration layer, and the testing infrastructure all need to be rebuilt. Getting the framework decision right at the start saves weeks.
Our On-Device AI Integration service includes a framework selection audit — we assess your model type, latency targets, and platform requirements and give you a clear recommendation before a line of inference code is written. If you have an existing app using the wrong framework, we handle migration.
Learn about On-Device AI Integration →