Core AI in iOS 27: Running Your Own Models On-Device Alongside Foundation Models
- Author: Ehsan Azish · 3NSOFTS
- Updated: June 2026
- Read time: 14 min read
- Level: Senior
- Platform: iOS 27+, Apple silicon, model conversion (Python), Swift
Implementation Notes
- ~/ What broke: Custom neural-network model work gets conflated with Foundation Models prompts or old Core ML deployment paths.
- ~/ What to do: Use Core AI only for owned neural networks, then isolate conversion, specialization, caching, and profiling paths.
Beta notice. Core AI is available in the iOS 27 / macOS 27 developer beta and is marked Beta across all platforms. APIs, the .aimodel format, and the toolchain are subject to change before release. Verify everything against the current SDK.
iOS 27 ships a new framework, Core AI, for building, running, and deploying your own AI models on Apple silicon. The hype coverage has badly conflated it with two other things, so the first job of this guide is to draw the boundaries clearly — then walk the actual architecture.
What Core AI is — and what it is not
Three on-device AI surfaces now coexist, and they solve different problems:
- Foundation Models — Apple's pre-trained on-device LLM (and the PCC server variant). You don't bring a model; you prompt Apple's. Use it for summarization, extraction, generation, tool calling.
- Core ML — for non-neural-network model types: decision trees, tabular feature engineering, classical ML. Apple is explicit: if your model isn't a neural network, you're in Core ML territory, not Core AI.
- Core AI — for running your own neural-network models (the latest architectures, custom LLMs, vision models you've trained or converted) across CPU, GPU, and Neural Engine, with control over specialization, caching, and inference performance.
Two corrections to the things circulating about Core AI:
- Core AI is not "on-device LoRA fine-tuning of Foundation Models." That's a separate Foundation Models adapter flow. Core AI is about your models, in the .aimodel format.
- Core AI is not a replacement for Core ML. They're siblings split by model type: neural networks (Core AI) vs. everything else (Core ML).
If you've shipped Core ML neural-network models and wanted finer control over Apple silicon execution, Core AI is the path Apple is pointing you toward.
The shape of the API
Core AI's Swift surface centers on a small set of types:
- AIModelAsset: an unspecialized source model asset (the model as delivered/stored).
- AIModel: a model specialized for running inference on a specific device.
- InferenceFunction: a function that runs inference on input values and produces outputs, described by an InferenceFunctionDescriptor (its signature).
- InferenceValue: a value an inference function accepts or produces; NDArray (with NDArrayDescriptor) is the multidimensional array type for tensor in/out, and ImageDescriptor describes image dimensions and pixel format.
- ComputeStream: a stream of work run asynchronously.
The conceptual flow: you start from an unspecialized AIModelAsset, specialize it into an AIModel for the device, then invoke InferenceFunctions with InferenceValue/NDArray inputs, scheduling work on a ComputeStream.
// Illustrative shape — confirm exact initializers/labels against the beta SDK.
import CoreAI
// 1. Start from an unspecialized asset, specialize for this device.
let asset = try AIModelAsset(/* source model */)
let model = try await asset.specialize(options: SpecializationOptions(/* compute units, etc. */))
// 2. Look up an inference function by its descriptor.
let infer = model.inferenceFunction(/* descriptor */)
// 3. Run inference with NDArray inputs on a compute stream.
let input = NDArray(/* shape, scalar type */)
let output = try await infer(input)
The snippet shows the architecture (asset → specialize → inference function → run), not exact signatures. Core AI is beta; pin the real initializer labels, the specialize entry point, and the inference-call convention to the SDK you build against, and keep them isolated so a beta-to-beta change is a one-file edit.
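One way to keep the beta surface isolated is to hide it behind an app-owned protocol. The sketch below is a minimal illustration under stated assumptions: `TextEmbedder`, `StubEmbedder`, and `SearchFeature` are hypothetical names invented for this example, and none of them are Core AI API. A real Core AI-backed conformance would live in its own file and wrap whatever the current SDK exposes.

```swift
// Hypothetical app-facing abstraction; nothing here is Core AI API.
// The rest of the app depends on this protocol only.
protocol TextEmbedder {
    func embed(_ text: String) throws -> [Float]
}

// Stand-in implementation for tests and previews.
// A Core AI-backed type would conform to the same protocol and keep
// every beta-specific call inside a single file.
struct StubEmbedder: TextEmbedder {
    let dimension: Int

    func embed(_ text: String) throws -> [Float] {
        precondition(dimension > 0, "dimension must be positive")
        // Deterministic placeholder vector derived from the UTF-8 bytes.
        var vector = [Float](repeating: 0, count: dimension)
        for (index, byte) in text.utf8.enumerated() {
            vector[index % dimension] += Float(byte)
        }
        return vector
    }
}

// Feature code sees only the protocol, so swapping the backend after
// a beta API change does not touch call sites.
struct SearchFeature {
    let embedder: any TextEmbedder

    func vectorLength(for query: String) throws -> Int {
        try embedder.embed(query).count
    }
}
```

With this shape, a beta-to-beta signature change should only affect the one type that conforms to `TextEmbedder` on top of Core AI; the stub keeps unit tests independent of the SDK.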
The conversion toolchain (this happens off-device, in Python)
You don't author .aimodel files by hand. Core AI ships a toolchain that mirrors the Core ML workflow:
- Core AI Optimization: prepare/optimize your model for Apple silicon.
- Core AI PyTorch Extensions: convert the prepared model into the .aimodel format.
- coreai-build: a command-line tool to compile models ahead of time, which reduces on-device specialization time at runtime.
If you live in the Apple-silicon ML world, MLX is the natural training/experimentation companion here — a NumPy-like array framework with composable transforms (autodiff, vectorization, graph optimization), lazy evaluation, and a unified-memory model so arrays move between CPU and GPU without copies. Train/iterate in MLX or PyTorch, convert via the Core AI extensions, ship the .aimodel.
Specialization, caching, and storage footprint
The asset-vs-model split exists for a reason: specialization is the step that adapts an unspecialized asset to the specific device's compute units, and it has a cost. Core AI gives you two levers to manage it:
- Ahead-of-time compilation (coreai-build) moves specialization work to build time, cutting first-run latency on device.
- AIModelCache stores specialized artifacts so you don't re-specialize on every launch, and Core AI's caching/specialization configuration lets you reduce your app's storage footprint by controlling which artifacts persist.
ComputeUnitKind lets you reason about which hardware units (CPU/GPU/Neural Engine) are available for inference, and SpecializationOptions is where you express those preferences. For a large custom model, getting specialization and caching right is the difference between a snappy first launch and a multi-second stall.
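The "specialize once, reuse afterwards" idea can be sketched without touching the framework. The code below is an in-memory stand-in, not AIModelCache: `SpecializationKey` and `SpecializationCache` are hypothetical names, and the key fields are only an assumption about which facts would invalidate a specialized artifact.

```swift
// Hypothetical key: the facts assumed to invalidate a specialized artifact.
struct SpecializationKey: Hashable {
    let modelVersion: String   // bump when the shipped .aimodel changes
    let deviceFamily: String   // specialization is device-specific
    let computeUnits: String   // illustrative label for the unit preference
}

// In-memory stand-in that shows the keying pattern only.
// It is not AIModelCache and persists nothing to disk.
final class SpecializationCache<Model> {
    private var storage: [SpecializationKey: Model] = [:]
    private(set) var specializeCount = 0

    func model(for key: SpecializationKey,
               specialize: () throws -> Model) rethrows -> Model {
        if let cached = storage[key] {
            return cached              // cache hit: skip the expensive step
        }
        let model = try specialize()   // cache miss: pay the cost once
        specializeCount += 1
        storage[key] = model
        return model
    }
}
```

Calling `model(for:specialize:)` twice with the same key runs the closure once; changing any key field (for example, a new model version) forces a fresh specialization, which is the behaviour you would want to verify against the real cache configuration.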
Debugging and profiling
Core AI integrates with the developer toolchain rather than leaving you to guess:
- The Core AI debug gauge and Core AI instrument monitor and profile inference performance inside your app.
- The standalone Core AI Debugger app supports visualization and numeric debugging — you can inspect model structure and trace tensor values back to your original Python source, which is the feature that makes numeric mismatches (the classic "it worked in Python, it's wrong on device") tractable.
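Alongside the tooling above, a plain tolerance check against reference outputs exported from Python is a cheap regression guard. This is a generic sketch, not part of Core AI: `maxAbsDifference` and `matchesReference` are hypothetical helpers, tensors are assumed to be flattened to `[Float]`, and the default tolerance is an arbitrary placeholder to tune per model.

```swift
// Largest absolute element-wise difference between two flat tensors.
// Returns nil when the element counts disagree, and NaN if any
// difference is NaN, so a broken output cannot pass silently.
func maxAbsDifference(_ a: [Float], _ b: [Float]) -> Float? {
    guard a.count == b.count else { return nil }
    var worst: Float = 0
    for (x, y) in zip(a, b) {
        let difference = abs(x - y)
        if difference.isNaN { return .nan }
        worst = max(worst, difference)
    }
    return worst
}

// True when every element is within `tolerance` of the reference.
func matchesReference(_ output: [Float],
                      reference: [Float],
                      tolerance: Float = 1e-3) -> Bool {
    guard let difference = maxAbsDifference(output, reference) else {
        return false                   // shape mismatch
    }
    return difference <= tolerance     // false when difference is NaN
}
```

A check like this only tells you that outputs diverged, not where; locating the offending layer is what the tensor tracing described above is for.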
Asset operations surface failures through AssetError, so handle asset load/specialization errors explicitly rather than assuming the model is always ready.
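A minimal sketch of explicit handling, assuming nothing about the real error cases: `StubAssetError` stands in for the framework's AssetError (whose cases are not assumed here), and `ModelState` and `loadModel` are hypothetical app-level names.

```swift
// Stand-in for the framework's error type; the real AssetError cases
// are not assumed here.
enum StubAssetError: Error {
    case missingFile
    case specializationFailed
}

// App-level readiness state, so UI code never assumes the model exists.
enum ModelState<Model> {
    case ready(Model)
    case unavailable(reason: String)
}

// Wraps a throwing load step and converts failures into explicit state.
func loadModel<Model>(_ load: () throws -> Model) -> ModelState<Model> {
    do {
        return .ready(try load())
    } catch let error as StubAssetError {
        return .unavailable(reason: "asset error: \(error)")
    } catch {
        return .unavailable(reason: "unexpected error: \(error)")
    }
}
```

Feature code then switches over `ModelState` and can fall back (disable the feature, retry later) instead of crashing on a failed load or specialization.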
When Core AI is the right tool
Be honest about the bar here — most apps should not reach for Core AI:
- Use Foundation Models if Apple's on-device LLM (or PCC) can do the task with prompting, guided generation, and tools. This covers the large majority of "add AI to my app" cases at zero model-management cost.
- Use Core ML for non-neural models (trees, tabular).
- Use Core AI when you genuinely need to run your own neural network on-device — a custom-trained model, a specific architecture Apple's model can't cover, a vision/audio model you own — and you want control over Apple-silicon execution, specialization, and caching.
The operational cost is real: a Python conversion pipeline, .aimodel packaging, specialization/caching tuning, and version management. Take it on only when the capability justifies that overhead.
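The decision rule in this section is small enough to write down as code, which can be handy in an architecture review. The types below are hypothetical and purely illustrative; they are not Apple API, and the three cases simply restate the bullets above.

```swift
// Hypothetical description of what a feature needs; not an Apple API.
enum ModelKind {
    case applePromptable      // Apple's on-device LLM (or PCC) can do the task
    case classical            // trees, tabular, other non-neural models
    case ownedNeuralNetwork   // a network you trained or converted yourself
}

enum Framework: String {
    case foundationModels = "Foundation Models"
    case coreML = "Core ML"
    case coreAI = "Core AI"
}

// Maps the need to the framework this article recommends for it.
func recommendedFramework(for kind: ModelKind) -> Framework {
    switch kind {
    case .applePromptable:    return .foundationModels
    case .classical:          return .coreML
    case .ownedNeuralNetwork: return .coreAI
    }
}
```

The exhaustive `switch` means adding a fourth kind of workload later forces an explicit decision rather than a silent default.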
Production checklist
- Pick the right framework: Foundation Models for Apple's LLM, Core ML for non-neural models, Core AI for your own neural networks.
- Convert off-device: Core AI Optimization → PyTorch Extensions → .aimodel; compile AOT with coreai-build.
- Manage specialization: AOT-compile and use AIModelCache to cut first-run latency and control storage.
- Profile with the Core AI gauge/instrument; use the Debugger app to trace tensors to Python source.
- Handle AssetError on load/specialization; don't assume readiness.
- Pin beta API signatures and isolate them; Core AI is pre-release.
Why this matters for shipped apps
Core AI fills the gap between "prompt Apple's model" and "ship a classical Core ML model": it's the supported, Apple-silicon-native way to run your own neural networks on device with real control over performance. For most teams the headline is actually the boundary — knowing that Foundation Models covers the common cases means you don't take on a model pipeline you don't need. For the minority who genuinely own a model, Core AI is a serious upgrade over bolting neural nets onto Core ML, with a real debug/profile story behind it.
Deciding whether a feature needs Core AI, Core ML, or just Foundation Models — and architecting the pipeline if it does — is exactly the kind of call we make in our on-device AI integration and architecture audit work at 3NSOFTS.