offgrid:AI Case Study: Building a Fully Offline AI Assistant on Apple Silicon
offgrid:AI is a fully offline AI assistant that runs on Apple Silicon with zero network dependency. This case study covers the architecture decisions, the inference layer using Core ML and Apple Foundation Models, battery-aware scheduling, privacy boundary enforcement, and the results.
The Structural Problem with Cloud-Dependent AI
Most AI assistants on iOS are thin clients. The app handles the UI. The intelligence lives on a server. That architecture holds only when three conditions are met: the network is available, the API is responsive, and the user's data is acceptable collateral for the round-trip.
Remove any one of those conditions and the app stops working.
This is not a fringe scenario. Emergency responders, field workers, travellers in low-connectivity regions, anyone with a genuine privacy requirement — they all hit this wall. The app surfaces a spinner. The spinner never resolves. The user has no recourse.
offgrid:AI was built as a direct response to this structural constraint.
The Design Premise
The design premise: an AI assistant that operates at full capability regardless of network state. Not degraded capability. Not a fallback mode. Full inference, on-device, on Apple Silicon, with zero bytes transiting any server.
Every architectural decision flows from that.
Constraints
The constraints that shaped the architecture:
- Network connectivity cannot be assumed — the app must function identically online and offline
- Zero data egress — no user input, no conversation history, no inference request may leave the device
- No API costs — cloud inference at scale introduces per-token costs that break the economics of a consumer app
- Battery impact must be bounded — sustained LLM inference on a mobile device can drain a battery in under two hours if unmanaged
- Latency must feel fast — a response that takes 4–6 seconds feels broken to a user in a stressful scenario
- Apple Intelligence availability cannot be required — the architecture must degrade gracefully on devices without the Neural Engine tier that supports Foundation Models
Architecture
Inference Layer: Core ML and Apple Foundation Models
The naive approach is to call an external API. Two lines of code, and it works perfectly — until the network is gone.
Core ML is the correct layer for on-device inference on Apple platforms. It routes computation to the most efficient available hardware — Neural Engine, GPU, or CPU — without the developer managing that dispatch manually. On Apple Silicon, the Neural Engine handles transformer inference with substantially lower power draw than GPU execution.
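That dispatch can also be constrained explicitly when loading a Core ML model, via MLModelConfiguration.computeUnits. A minimal sketch — the QuantizedAssistant class name is illustrative, standing in for whatever compiled model the app bundles:

```swift
import CoreML

// Prefer the Neural Engine (with CPU fallback) and avoid GPU execution,
// which draws more power for transformer workloads on Apple Silicon.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// "QuantizedAssistant" is a hypothetical generated model class;
// Xcode generates urlOfModelInThisBundle for every bundled .mlmodel.
let model = try MLModel(
    contentsOf: QuantizedAssistant.urlOfModelInThisBundle,
    configuration: config
)
```

Leaving computeUnits at its default (.all) lets Core ML choose; pinning it is mainly useful when profiling power draw per backend.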
For devices running iOS 18.1 and later with Apple Intelligence enabled, Apple Foundation Models provides direct access to the on-device language model through a structured Swift API. LanguageModelSession handles context management, token streaming, and safety guardrails with no network dependency.
import FoundationModels

let session = LanguageModelSession()

// Streaming response — each partial is a cumulative snapshot of the
// response so far, so the text is replaced rather than appended.
let stream = session.streamResponse(to: prompt)
for try await partial in stream {
    await MainActor.run {
        self.responseText = partial.text
    }
}
For devices below the Apple Intelligence tier, the app falls back to a quantized Core ML model. This ensures the app is fully functional across the supported device range — not just on the latest hardware.
The Fallback Architecture
Apple Foundation Models requires:
- iOS 18.1+
- Apple Intelligence enabled (user opt-in)
- Sufficient on-device storage (Apple Intelligence models are downloaded on demand)
Any of these conditions may not be met. The fallback chain:
- Apple Foundation Models (LanguageModelSession) — preferred, highest quality
- Quantized Core ML model — all devices, iOS 17+, no Apple Intelligence required
- Static response templates — lowest quality, never blank, always functional
enum InferenceBackend {
    case foundationModels
    case coreML(model: MLModel)
    case staticTemplates
}

func resolveBackend() async -> InferenceBackend {
    // Foundation Models availability is surfaced via SystemLanguageModel.
    if #available(iOS 18.1, *),
       SystemLanguageModel.default.isAvailable {
        return .foundationModels
    }
    if let model = try? MyQuantizedModel(configuration: MLModelConfiguration()).model {
        return .coreML(model: model)
    }
    return .staticTemplates
}
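Once resolved, the backend drives a single dispatch point for generation. A sketch of how that fan-out might look — generateWithCoreML and templateResponse are hypothetical helpers, not APIs from the shipping app:

```swift
func generate(prompt: String) async throws -> String {
    switch await resolveBackend() {
    case .foundationModels:
        // One-shot (non-streaming) request against the system model.
        let session = LanguageModelSession()
        return try await session.respond(to: prompt).content
    case .coreML(let model):
        // Hypothetical helper wrapping tokenization + MLModel prediction.
        return try await generateWithCoreML(model: model, prompt: prompt)
    case .staticTemplates:
        // Hypothetical helper returning a canned, never-blank response.
        return templateResponse(for: prompt)
    }
}
```

Keeping the switch in one place means the rest of the app never needs to know which tier it is running on.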
SwiftData Persistence
Conversation history, user preferences, and session state are stored locally using SwiftData. No iCloud sync — data stays on the device and never transits any cloud infrastructure.
@Model
class Conversation {
    var id: UUID
    var title: String
    var createdAt: Date
    var messages: [Message]

    init(title: String) {
        self.id = UUID()
        self.title = title
        self.createdAt = Date()
        self.messages = []
    }
}
The choice of SwiftData over Core Data reflects the iOS 17+ minimum deployment target. SwiftData's @Query property wrapper binds conversation history directly to the view lifecycle without manual fetch request configuration.
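That binding looks roughly like this in a SwiftUI view — a sketch assuming the Conversation model above:

```swift
import SwiftUI
import SwiftData

struct ConversationListView: View {
    // @Query keeps this array in sync with the store across the view
    // lifecycle — no manual fetch request configuration.
    @Query(sort: \Conversation.createdAt, order: .reverse)
    private var conversations: [Conversation]

    var body: some View {
        List(conversations) { conversation in
            Text(conversation.title)
        }
    }
}
```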
Battery-Aware Scheduling
Sustained LLM inference drains a battery. The inference scheduler monitors thermal state and enforces session limits:
func checkThermalState() -> InferenceThrottle {
    switch ProcessInfo.processInfo.thermalState {
    case .nominal, .fair:
        return .none
    case .serious:
        return .reduceContextWindow
    case .critical:
        return .pauseGeneration
    @unknown default:
        return .none
    }
}
The app presents an explicit warning when a session exceeds 15 minutes of sustained inference, prompting the user to take a break. This is both a battery protection measure and a UX decision — extended inference sessions on consumer hardware produce noticeable device warmth.
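The session limit itself reduces to a small piece of bookkeeping. A minimal sketch — SessionBudget is a hypothetical helper, not part of the app's actual code:

```swift
import Foundation

/// Accumulates sustained-inference time and reports when the
/// 15-minute break warning should be surfaced.
struct SessionBudget {
    let limit: TimeInterval = 15 * 60
    private(set) var accumulated: TimeInterval = 0

    // Call once per completed generation with its wall-clock duration.
    mutating func recordInference(duration: TimeInterval) {
        accumulated += duration
    }

    var shouldWarn: Bool { accumulated >= limit }

    // Reset after the user acknowledges the warning and takes a break.
    mutating func reset() { accumulated = 0 }
}

var budget = SessionBudget()
budget.recordInference(duration: 14 * 60)
print(budget.shouldWarn)   // false — 14 minutes is under the limit
budget.recordInference(duration: 90)
print(budget.shouldWarn)   // true — 15.5 minutes exceeds it
```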
Privacy Boundary
The privacy boundary is architectural, not policy-based. The inference layer has no network stack:
- No URLSession in the inference module
- No analytics SDK
- No telemetry path
- No crash reporting that includes user input
User conversations are stored in SwiftData on the device. On device deletion, the data is gone. There is no server-side backup, no synchronisation service, and no way for 3NSOFTS to access user conversations.
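The no-sync guarantee can also be stated in code at container setup. A sketch assuming the Conversation model above:

```swift
import SwiftData

// cloudKitDatabase: .none makes the local-only policy explicit:
// SwiftData will never attach this store to a CloudKit container.
let configuration = ModelConfiguration(cloudKitDatabase: .none)
let container = try ModelContainer(
    for: Conversation.self,
    configurations: configuration
)
```

The default (.automatic) would let SwiftData adopt CloudKit if the app's entitlements allowed it; opting out explicitly keeps the privacy boundary enforced even if entitlements change later.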
Results
Offline capability: 100% of features available with no network connection
Inference latency (Apple Foundation Models, iPhone 16 Pro): first token in under 1.5 seconds, full response in 4–8 seconds depending on length
Inference latency (Core ML fallback, iPhone 14): first token in 3–5 seconds, full response in 8–18 seconds
Battery impact: approximately 12% battery per hour of active inference on iPhone 16 Pro, 18% on iPhone 14
App binary size impact: Core ML model bundle adds 85MB to the app download. The Foundation Models path adds zero — the on-device model is part of iOS, not the app bundle.
App Store first-submission approval: passed on first submission with complete privacy manifest
Explore offgrid:AI
offgrid:AI is available on the App Store for iPhone. It runs offline by design — no account, no subscription required to use the core inference features.
View offgrid:AI on the App Store →