App Intents · Updated June 2026

App Intents 2.0 for AI Features: Streaming, Multi-Turn, and Keeping Inference On-Device

Author: Ehsan Azish · 3NSOFTS
Updated: June 2026
Read time: 15 min
Level: Intermediate to Senior
Platform: iOS 27+, Swift, App Intents, Foundation Models

Implementation Notes

~/ What broke: Streaming structured output flickers and jumps in SwiftUI.
~/ What to do: Render partial snapshots with stable identity and a final-state handoff.


This guide builds on the App Intents foundation in Making Your App Siri-Actionable in iOS 27. If your AI action depends on local model availability, pair it with the SystemLanguageModel availability gating guide before shipping.

What broke

The old App Intents model was fire-and-return: perform() runs, you hand back a result, Siri shows a card, done. That works for "set a timer." It does not work for an AI feature.

An on-device LLM generation takes seconds and produces tokens progressively. Forced into the old model, an AI intent either blocks while the whole response generates — so Siri sits silent and the user assumes it hung — or it truncates to whatever fit the single result. Either way the experience is wrong, and it is the reason most "Siri can talk to my AI app" attempts before iOS 27 felt broken.

App Intents 2.0, introduced at WWDC 2026, adds the three pieces that fix this: streaming responses for long-running actions, multi-turn conversational follow-ups, and richer entity types. There is also a new View Annotations API that lets users reference on-screen elements naturally ("summarize that"). Together they make App Intents a viable front end for an LLM feature instead of a fight against the framework.

What to do

Stream the generation instead of blocking on it

A long-running AI intent should report progress and stream partial output rather than returning once at the end. The shape: your perform() consumes your model's token stream and feeds Siri progressive updates, so the user sees "still working" state and text filling in rather than a frozen card.

import AppIntents

struct AskAssistantIntent: AppIntent {
    static let title: LocalizedStringResource = "Ask Assistant"

    @Parameter(title: "Question")
    var question: String

    // A long-running, streaming AI action.
    @MainActor
    func perform() async throws -> some IntentResult & ProvidesDialog {
        // LocalModel is your on-device runtime — Foundation Models,
        // Core AI, or your own llama.cpp/Echo bridge. The point is the
        // token stream is consumed here, inside perform(), and never
        // leaves the device.
        var assembled = ""
        for try await token in LocalModel.shared.stream(prompt: question) {
            assembled += token
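            // Surface progressive output to the system as it generates,
            // so Siri shows live text instead of a blocked card.
            // IntentProgress.report(partial:) is a placeholder name, not a
            // confirmed iOS 27 API; swap in the documented call.
            await IntentProgress.report(partial: assembled)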
        }
        return .result(dialog: IntentDialog(stringLiteral: assembled))
    }
}

Treat the streaming/progress surface above as illustrative of the pattern. Confirm the exact iOS 27 progress-reporting type names against Apple's beta documentation before you ship — the framework is still moving.
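The implementation note at the top of this guide says partial snapshots should render with stable identity and a final-state handoff. Below is a minimal sketch of that state in plain Swift. It assumes your runtime emits cumulative snapshots (each one the full text so far), and `StreamingAnswer` and `StreamingAnswerState` are hypothetical names, not framework types. It uses `String` for brevity; the same shape applies to a structured partial type.

```swift
// Hypothetical types: StreamingAnswer and StreamingAnswerState are not
// framework APIs. Snapshots are assumed to be cumulative (full text so far).
struct StreamingAnswer: Identifiable, Equatable {
    let id: Int        // Stable for the whole generation; never regenerated per snapshot.
    var text: String
    var isFinal: Bool
}

struct StreamingAnswerState {
    private(set) var answer: StreamingAnswer

    init(id: Int) {
        answer = StreamingAnswer(id: id, text: "", isFinal: false)
    }

    /// Applies a partial snapshot in place, keeping the same identity.
    /// Returns false (no re-render needed) once the answer is final or
    /// when the snapshot matches what is already shown.
    mutating func apply(partial snapshot: String) -> Bool {
        guard !answer.isFinal, snapshot != answer.text else { return false }
        answer.text = snapshot
        return true
    }

    /// Final-state handoff: the completed text replaces the last partial
    /// and freezes the answer so a late partial cannot overwrite it.
    mutating func finish(with finalText: String) {
        answer.text = finalText
        answer.isFinal = true
    }
}

// Usage: three partials, the final handoff, then a late partial that is ignored.
var state = StreamingAnswerState(id: 1)
for partial in ["Paris", "Paris is", "Paris is the capital"] {
    _ = state.apply(partial: partial)
}
state.finish(with: "Paris is the capital of France.")
let lateUpdateApplied = state.apply(partial: "Paris is the")
```

In SwiftUI you would render `answer.text` in one view keyed by the unchanging `answer.id` (for example with `.id(answer.id)`), so each snapshot updates the text in place instead of creating a new view, which is what produces the flicker and jump.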

Ask clarifying questions with multi-turn follow-ups

Multi-turn support means an intent can pause, ask the user something, and continue in the same Siri context instead of dumping them to a result page and ending. Use the @Parameter request-value flow to drive the follow-up:

struct StartGuidedSessionIntent: AppIntent {
    static let title: LocalizedStringResource = "Start Guided Session"

    @Parameter(title: "Topic")
    var topic: String?

    @MainActor
    func perform() async throws -> some IntentResult & ProvidesDialog {
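        // Use the topic Siri already captured; only if it is missing, ask
        // for one and continue the conversation rather than failing.
        var captured = topic
        if captured == nil {
            captured = try await $topic.requestValue(
                "What should the session focus on?"
            )
        }
        guard let resolvedTopic = captured else {
            throw $topic.needsValueError("What should the session focus on?")
        }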
        let plan = try await LocalModel.shared.plan(for: resolvedTopic)
        return .result(dialog: IntentDialog(stringLiteral: plan.summary))
    }
}

This is what turns a one-shot command into a conversation: Siri sustains the dialogue within your app's context, carrying the thread instead of restarting.
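On the app side, carrying the thread means the next model call sees the earlier turns. With Foundation Models, a single `LanguageModelSession` already keeps its transcript across `respond(to:)` calls, so reusing the session covers this. For a runtime that is stateless per call, here is a minimal sketch, assuming you keep the transcript yourself; `ConversationThread` and its `maxTurns` cap are hypothetical, chosen to keep the prompt inside a limited on-device context window.

```swift
// Hypothetical app-side transcript; not an App Intents or Foundation Models type.
struct ConversationThread {
    enum Speaker: String { case user, assistant }

    struct Turn: Equatable {
        let speaker: Speaker
        let text: String
    }

    private(set) var turns: [Turn] = []
    let maxTurns: Int

    init(maxTurns: Int = 6) {
        self.maxTurns = maxTurns
    }

    /// Records one turn, dropping the oldest ones so the prompt stays
    /// within a small on-device context window.
    mutating func record(_ speaker: Speaker, _ text: String) {
        turns.append(Turn(speaker: speaker, text: text))
        if turns.count > maxTurns {
            turns.removeFirst(turns.count - maxTurns)
        }
    }

    /// Builds the next prompt from the retained turns plus the follow-up.
    func prompt(for followUp: String) -> String {
        let history = turns.map { "\($0.speaker.rawValue): \($0.text)" }
        return (history + ["user: \(followUp)"]).joined(separator: "\n")
    }
}

// Usage: with maxTurns = 2, the opening request falls out of the window.
var thread = ConversationThread(maxTurns: 2)
thread.record(.user, "Start a guided session")
thread.record(.assistant, "What should the session focus on?")
thread.record(.user, "Breathing")
let nextPrompt = thread.prompt(for: "Make it five minutes")
```

Dropping whole turns from the front is the simplest policy; a real app might summarize old turns instead, at the cost of an extra model call.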

Model AI output as entities so Siri can chain it

The reason to return an AppEntity from an AI intent rather than a plain string: it lets the new Siri pass your result into the next action. A generated summary that is an entity can be saved, shared, or fed to another intent in a single chained request. A string is a dead end.

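import AppIntents

struct GeneratedSummary: AppEntity {
    static let typeDisplayRepresentation: TypeDisplayRepresentation = "Summary"
    static let defaultQuery = GeneratedSummaryQuery()

    let id: UUID
    @Property(title: "Text") var text: String

    init(id: UUID = UUID(), text: String) {
        self.id = id
        self.text = text
    }

    var displayRepresentation: DisplayRepresentation {
        DisplayRepresentation(title: "Summary", subtitle: "\(String(text.prefix(80)))")
    }
}

// Lets Siri resolve a summary by id when chaining it into another action.
struct GeneratedSummaryQuery: EntityQuery {
    func entities(for identifiers: [UUID]) async throws -> [GeneratedSummary] {
        [] // Stub: look the ids up in your app's own store.
    }
}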
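App Intents resolves entities from their identifiers through the query's `entities(for:)`, so when a summary is passed into the next action, `GeneratedSummaryQuery` has to turn ids back into values. Below is a minimal in-memory sketch of that lookup, assuming hypothetical `SummaryRecord` and `SummaryStore` types; inside `entities(for:)` you would map the returned records to `GeneratedSummary` values.

```swift
import Foundation

// Hypothetical store that GeneratedSummaryQuery could wrap. Not thread-safe:
// a production store would be an actor or a database.
struct SummaryRecord: Equatable {
    let id: UUID
    let text: String
}

final class SummaryStore {
    private var byID: [UUID: SummaryRecord] = [:]

    /// Saves generated text under a fresh id and returns the stored record.
    @discardableResult
    func save(_ text: String) -> SummaryRecord {
        let record = SummaryRecord(id: UUID(), text: text)
        byID[record.id] = record
        return record
    }

    /// Resolves ids back to records, keeping request order and skipping
    /// ids that no longer exist instead of failing the whole lookup.
    func records(for ids: [UUID]) -> [SummaryRecord] {
        ids.compactMap { byID[$0] }
    }
}

// Usage: an unknown id in the middle is skipped; order follows the request.
let store = SummaryStore()
let first = store.save("Meeting moved to Thursday.")
let second = store.save("Draft sent to legal for review.")
let resolved = store.records(for: [second.id, UUID(), first.id])
```

Skipping unknown ids rather than throwing is a choice this sketch makes so one deleted summary does not break a whole chained request.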

The part nobody else will warn you about: keep inference on-device

This is the section that matters most for a privacy-first app, and most generic App Intents writeups leave it out.

The new Siri routes complex queries to a server model — a large cloud model — when the on-device model is not enough. That is fine for a generic assistant. It is a problem if your entire product promise is that the model runs locally and nothing leaves the device. An AI intent wired naively can end up with Siri handing the user's query to the cloud instead of to your local model.

Two controls keep you honest:

Do the inference yourself inside perform(). The intent should call your runtime — Foundation Models on-device, Core AI, or your own llama.cpp/Echo bridge — assemble the answer, and return it. Siri delivers the result; Siri does not generate it. As long as the tokens are produced inside your perform() against a local model, they never reach a server.
Declare on-device-only routing per intent. iOS 27 adds per-intent privacy manifest declarations that let you state an interaction must stay on-device. For a regulated or privacy-first app, set this explicitly on every AI intent rather than trusting the default. Then say so in your App Store privacy copy — "Siri answers run entirely on your device" is a differentiation point, not just a compliance line.
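The first control can be made structural: write the answer path so it has no remote branch at all. Below is a minimal sketch, assuming a hypothetical `LocalModelAvailability` enum that you would populate from `SystemLanguageModel.default.availability` when using Foundation Models, and a `generate` closure standing in for your local runtime. An intent's `perform()` would call this and turn `LocalOnlyError` into a user-facing dialog.

```swift
// Hypothetical types: with Foundation Models you would map
// SystemLanguageModel.default.availability onto LocalModelAvailability.
enum LocalModelAvailability {
    case available
    case unavailable(reason: String)
}

enum LocalOnlyError: Error, Equatable {
    case modelUnavailable(reason: String)
}

/// Produces an answer strictly on-device. There is deliberately no remote
/// branch: an unavailable model becomes an error, never a silent fallback.
func answerLocally(
    _ question: String,
    availability: LocalModelAvailability,
    generate: (String) throws -> String
) throws -> String {
    switch availability {
    case .available:
        return try generate(question)
    case .unavailable(let reason):
        throw LocalOnlyError.modelUnavailable(reason: reason)
    }
}

// Usage: one available path, one unavailable path that must not generate.
let answer = (try? answerLocally("Summarize my notes", availability: .available) {
    "On-device answer to: \($0)"
}) ?? ""

var generatorCalled = false
var failure: LocalOnlyError?
do {
    _ = try answerLocally(
        "Summarize my notes",
        availability: .unavailable(reason: "Model assets not ready")
    ) { _ in
        generatorCalled = true
        return ""
    }
} catch let error as LocalOnlyError {
    failure = error
} catch {
    failure = nil
}
```

The only code path that produces text is the local one, so an unavailable model shows the user an error you chose rather than an answer generated somewhere else.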

For the lower-level routing tradeoffs, see Private Cloud Compute and the Language Model Protocol. If you only take one thing from this guide: an AI intent that secretly falls back to cloud Siri breaks the promise your privacy-first app is built on. Architect it so the local model answers, always, and the system only carries the result.

Validate it end to end

Use the iOS 27 App Intents Testing framework to exercise the streaming and multi-turn paths through real Siri/Shortcuts/Spotlight pathways. The classic AI-intent bug — runs clean, never throws, quietly returns the wrong thing or silently routes off-device — is exactly what real-pathway testing surfaces and what UI automation misses. If your AI feature goes through Foundation Models, the new FoundationModels template in Instruments also lets you trace which prompt and instruction were actually used; note that it only traces work that goes through the framework, not a raw LLM SDK call.

Sources and further reading

Apple Developer — App Intents framework: https://developer.apple.com/documentation/appintents
Apple Developer — WWDC26 Apple Intelligence guide: https://developer.apple.com/wwdc26/guides/apple-intelligence/
WWDC 2026 session: "Explore advanced App Intents features for Siri and Apple Intelligence"

API names and behaviors reflect WWDC 2026 beta material and may change as Apple ships beta updates. Verify the streaming and progress-reporting type names against Apple's developer documentation before relying on them.

Authoritative References

Foundation Models framework · Apple Intelligence · Private Cloud Compute · Core ML · Core ML documentation