iOS App Performance Estimator
Estimate p50 and p95 latency for Core ML inference, CloudKit sync, Core Data queries, and network API calls — by feature type and device class. Generates specific optimization recommendations.
Set the device class to your minimum supported device for worst-case planning.
On-Device AI
Classification, lightweight NLP, or similar models under 10M params running on Neural Engine.
p50 latency (median)
12–30ms
Half of operations complete within this range
p95 latency (95th percentile)
25–60ms
95% of operations complete within this range
Performance budget guidance
Target: <50ms p95 for inline UI feedback, <120ms p95 for editor suggestions.
Neural Engine available on this device — compile for ANE for best throughput.
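Requesting the Neural Engine happens at model-load time via `MLModelConfiguration`. A minimal sketch, assuming a bundled compiled model whose resource name (`TextClassifier` here) is a placeholder:

```swift
import CoreML

// Load a bundled model with a preference for the Neural Engine.
// "TextClassifier" is a placeholder resource name.
func loadClassifier() throws -> MLModel {
    let config = MLModelConfiguration()
    // .cpuAndNeuralEngine (iOS 16+) keeps work off the GPU; on earlier
    // deployment targets, use .all and let Core ML route layers itself.
    config.computeUnits = .cpuAndNeuralEngine
    guard let url = Bundle.main.url(forResource: "TextClassifier",
                                    withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    return try MLModel(contentsOf: url, configuration: config)
}
```

Verify the actual routing in Xcode's Core ML performance report: a single unsupported layer can push a whole subgraph onto the CPU or GPU and blow past these latency ranges.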
All device profiles — Core ML inference — small model (<10M params)
| Device | p50 | p95 | Neural Engine |
|---|---|---|---|
| iPhone 15 Pro (A17 Pro) | 8–20ms | 15–40ms | Yes |
| iPhone 15 / 14 Pro (A16) | 10–25ms | 20–50ms | Yes |
| iPhone 14 / 13 Pro (A15) | 12–30ms | 25–60ms | Yes |
| iPhone 12 / 12 Pro (A14) | 15–35ms | 30–75ms | Yes |
| iPhone 11 (A13) | 20–45ms | 40–90ms | Yes |
Optimization checklist
1. Use .mlpackage compiled for Neural Engine (ANE) — verify with Xcode's Core ML performance report.
2. Warm up the model on app launch with a dummy prediction to avoid first-inference cold start.
3. Avoid running inference on the main thread — use a dedicated actor or background queue.
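Items 2 and 3 of the checklist can be sketched together: isolate the model behind an actor so predictions never run on the main thread, and fire one throwaway prediction at launch. The type and method names below are illustrative, not a fixed API; `warmUp(with:)` takes any representative input you construct for your model.

```swift
import CoreML

// All inference flows through this actor, which keeps Core ML work off
// the main thread without manual queue management.
actor InferenceEngine {
    private let model: MLModel

    init(model: MLModel) {
        self.model = model
    }

    // Run once at launch with a representative input so the first
    // user-visible request skips cold-start costs (weight loading,
    // ANE program compilation). Errors are deliberately swallowed.
    func warmUp(with sample: MLFeatureProvider) {
        _ = try? model.prediction(from: sample)
    }

    func predict(_ input: MLFeatureProvider) throws -> MLFeatureProvider {
        try model.prediction(from: input)
    }
}
```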
About these estimates
Latency ranges are synthesized from Apple platform benchmarks, Core ML performance reports in WWDC sessions, and real-world profiling across production apps. They represent typical conditions — warm cache, no thermal throttling, device not in Low Power Mode. Your app's actual latency depends on model architecture, batch size, concurrent workloads, and network conditions.
Always measure with Instruments on real hardware. Xcode's Core ML performance report shows ANE vs CPU vs GPU routing and a per-layer latency breakdown.
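One way to get per-call latency into Instruments is an `os_signpost` interval around each prediction; intervals then show up on the signpost track, where median and tail latency can be read off the summary statistics. The subsystem string and interval name below are placeholders:

```swift
import os.signpost
import CoreML

// Placeholder subsystem; .pointsOfInterest makes the intervals visible
// in Instruments without a custom signpost category.
let inferenceLog = OSLog(subsystem: "com.example.app",
                         category: .pointsOfInterest)

// Wrap a prediction in a begin/end signpost pair so each call appears
// as a measurable "Inference" interval in Instruments.
func timedPrediction(_ model: MLModel,
                     input: MLFeatureProvider) throws -> MLFeatureProvider {
    let id = OSSignpostID(log: inferenceLog)
    os_signpost(.begin, log: inferenceLog, name: "Inference", signpostID: id)
    defer {
        os_signpost(.end, log: inferenceLog, name: "Inference", signpostID: id)
    }
    return try model.prediction(from: input)
}
```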