Apple's iOS 27 Beta Ships the Multi-Model Extensions API

iOS 27 Beta 1 is live for developers today, shipping Apple's new Extensions framework that lets Gemini, Claude, and ChatGPT plug into Siri - plus the Nvidia B200 Confidential Computing architecture that keeps those cloud queries private.

iOS 27 Beta 1 landed for developers Monday afternoon, roughly two hours after Tim Cook walked off the Apple Park stage for the last time as CEO. The Extensions API is live in the beta, which means any developer with a $99 Apple Developer Program membership and an iPhone 15 Pro or newer can start building right now.

Key Specs

| What | Detail |
| --- | --- |
| Beta availability | June 8, 2026 - iOS 27 Beta 1 |
| Production launch | September 2026 with next iPhone |
| Minimum iOS 27 device | iPhone 12 |
| Minimum AI features device | iPhone 15 Pro |
| Default AI provider | Google Gemini (custom 1.2T param) |
| Available providers | Gemini, ChatGPT (early access), Claude (confirmed) |
| Cloud hardware | Nvidia B200 Blackwell via Google Cloud |
| On-device model | ~3B param, A17 Pro / M-series NPU |

The announcement wraps up what has been a long public story. Back in March, reporting confirmed that Apple asked Google to run Siri after Private Cloud Compute stalled at 10% utilization. In May, Bloomberg detailed the Extensions system that lets users pick their preferred AI provider. Today those pieces came together into shipping code.

What the Extensions Framework Actually Plugs Into

The Extensions framework is built on top of App Intents - the same API Apple uses for Shortcuts, Widgets, and Live Activities. A Siri Extension doesn't create a new app or panel. Instead, it registers with four integration points that Apple's system surfaces can call.

The Four Integration Points

Siri app. Apple shipped a standalone Siri app that looks like a chat interface, with iMessage-style conversation bubbles, persistent history synced via iCloud, and a paperclip for attaching images and PDFs. Your Extension becomes a selectable option in this app alongside the Gemini default.

Writing Tools. The system-level text rewriting layer that appears across all apps - the one that rewrites an email or summarizes a note - can now route through your model instead of Gemini.

Image Playground. Apple's text-to-image surface. Third-party providers can plug in here, though Apple hasn't confirmed whether generation models or only text models can register for this surface.

Search or Ask. The system-wide natural language query surface accessible from the Dynamic Island. When a user invokes it and picks your model, the query routes through your Extension.
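One way to picture the four surfaces is as a lookup from surface to the provider the user picked, falling back to the Gemini default. The sketch below is purely illustrative: `Surface` and `ProviderRegistry` are hypothetical names, not types from Apple's SDK, and it assumes the choice can differ per surface, which the beta documentation cited here does not spell out.

```swift
// Hypothetical model of the four surfaces described above.
// None of these type names come from Apple's SDK.
enum Surface: String, CaseIterable {
    case siriApp = "Siri app"
    case writingTools = "Writing Tools"
    case imagePlayground = "Image Playground"
    case searchOrAsk = "Search or Ask"
}

struct ProviderRegistry {
    // Provider used when the user has made no explicit choice.
    let defaultProvider: String
    // Per-surface user selections (assumed to be configurable per surface).
    private(set) var selections: [Surface: String] = [:]

    init(defaultProvider: String) {
        self.defaultProvider = defaultProvider
    }

    // Records the user's choice for one surface.
    mutating func select(_ provider: String, for surface: Surface) {
        selections[surface] = provider
    }

    // Returns the chosen provider, or the default if none was chosen.
    func provider(for surface: Surface) -> String {
        selections[surface] ?? defaultProvider
    }
}

var registry = ProviderRegistry(defaultProvider: "Gemini")
registry.select("MyAI", for: .writingTools)
for surface in Surface.allCases {
    print("\(surface.rawValue) -> \(registry.provider(for: surface))")
}
```

Here only Writing Tools routes to the hypothetical "MyAI" provider; the other three surfaces fall back to the default.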

Image: Apple's WWDC 2026 keynote at Apple Park. Tim Cook delivered his final WWDC keynote before handing leadership to hardware chief John Ternus on September 1, 2026. Source: macrumors.com

How to Build an Extension

The integration path uses App Intents with an async perform() method. A minimal starting point looks like this:

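import AppIntents

// An App Intent that forwards the user's query to a third-party model.
struct QueryMyAIModel: AppIntent {
    static var title: LocalizedStringResource = "Ask My AI"

    @Parameter(title: "Query")
    var query: String

    // A value-returning intent declares IntentResult & ReturnsValue<T>.
    func perform() async throws -> some IntentResult & ReturnsValue<String> {
        // MyAIService stands in for your own model client; it is not part of AppIntents.
        let response = try await MyAIService.query(query)
        return .result(value: response)
    }
}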
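The snippet leaves `MyAIService` undefined. A minimal sketch of what such a client might look like follows, under loud assumptions: the JSON shapes (`prompt`, `text`), the error cases, and the injected `send` transport are all hypothetical, not any real provider's API. The transport is passed in as a closure so the encoding and decoding logic can run without a network.

```swift
import Foundation

// Hypothetical stand-in for the MyAIService type the intent calls.
// The JSON shapes below are placeholders, not a real provider API.
enum MyAIServiceError: Error {
    case emptyQuery
    case badResponse
}

struct MyAIRequest: Codable, Equatable {
    let prompt: String
}

struct MyAIResponse: Codable {
    let text: String
}

enum MyAIService {
    // Builds the JSON body for a query, rejecting blank input early.
    static func makeRequestBody(for query: String) throws -> Data {
        let trimmed = query.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { throw MyAIServiceError.emptyQuery }
        return try JSONEncoder().encode(MyAIRequest(prompt: trimmed))
    }

    // Extracts the reply text from a JSON response payload.
    static func parseResponse(_ data: Data) throws -> String {
        guard let decoded = try? JSONDecoder().decode(MyAIResponse.self, from: data) else {
            throw MyAIServiceError.badResponse
        }
        return decoded.text
    }

    // Transport is injected so the sketch can run without a network.
    static func query(
        _ query: String,
        send: (Data) async throws -> Data
    ) async throws -> String {
        let body = try makeRequestBody(for: query)
        return try parseResponse(try await send(body))
    }
}
```

In a real Extension, the one-argument `MyAIService.query(_:)` that the intent calls would presumably wrap this with a `send` closure backed by the provider's own networking code.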

The model selection lives in Settings > Apple Intelligence & Siri > Extensions. Apple uses the same containerized, permission-gated sandboxing it has applied to share extensions and custom keyboards since iOS 8. Extensions see only the context they're handed - there's no ambient access to system data.

For AI providers, the registration process requires App Store distribution and an active Apple Developer Program membership. ChatGPT has early access already. Google's Gemini is the default. Anthropic's Claude is confirmed but hasn't shipped its Extension yet - Apple listed it as "coming soon" in the beta release notes.

Developer Requirements

| Requirement | Detail |
| --- | --- |
| Apple Developer Program | $99/year |
| Required framework | App Intents (Foundation Models optional) |
| Beta install | iOS 27 developer beta, Xcode 26.3 |
| Distribution | App Store required for user-facing Extensions |
| Early mover incentive | App Store featuring for Extensions shipping before September |

The Backend - What Siri Is Actually Running

The extension endpoints are straightforward. The infrastructure underneath them is more interesting.

Why Private Cloud Compute Didn't Work

Apple built Private Cloud Compute on modified M2 Ultra chips - the same processors that power high-end Mac Studios. Those chips are efficient for their size, but they lack the memory bandwidth and compute density needed to run trillion-parameter models at the throughput Siri requires. The system reached only about 10% average utilization, and some servers were never installed. Private Cloud Compute was built for Apple's own 150-billion-parameter cloud model. The custom Gemini model is 8x that size.

The solution was to stop trying to scale the wrong hardware and route to infrastructure that already exists: Google Cloud, which runs Nvidia's Blackwell B200 GPUs by default for its largest Gemini inference workloads.

Nvidia B200 and Confidential Computing

The B200 is Nvidia's current flagship data center GPU - 192 GB of HBM3e memory at 8 TB/s bandwidth, roughly 4x the inference throughput of the H100 it replaced. For Siri queries, the relevant spec is latency: Google's B200-backed Gemini inference returns in under 300ms for complex requests, which is fast enough that it doesn't feel like a round-trip to the cloud.
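The figures quoted so far support a quick back-of-envelope check on why a single accelerator is not enough. The sketch below uses only numbers from this article, plus one assumption that is not in the source: weights stored at 1 byte per parameter (for example, an 8-bit format). It ignores activation memory, KV cache, and any sparsity in the model, so treat the GPU count as a floor, not a deployment figure.

```swift
// Back-of-envelope sizing using figures quoted in the article.
// The 1-byte-per-parameter weight format is an assumption.
let geminiParams = 1.2e12        // custom Gemini model, per the spec table
let pccModelParams = 150e9       // Apple's own cloud model, per the article
let b200MemoryBytes = 192e9      // 192 GB of HBM3e per B200
let bytesPerParam = 1.0          // assumed, not stated in the article

// How much larger the Gemini model is than Apple's cloud model.
let sizeRatio = geminiParams / pccModelParams
// Memory needed for the weights alone under the assumed format.
let weightBytes = geminiParams * bytesPerParam
// Smallest whole number of B200s whose combined memory holds the weights.
let minGPUs = Int((weightBytes / b200MemoryBytes).rounded(.up))

print("Size ratio: \(sizeRatio)x")
print("Weights: \(weightBytes / 1e9) GB")
print("Minimum B200s for weights alone: \(minGPUs)")
```

Under that assumption the weights come to about 1,200 GB, which is 6.25 times one B200's memory, so at least seven GPUs are needed just to hold them. That is why the encrypted NVLink traffic between chips matters for the privacy design.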

Image: Nvidia H100 Tensor Core GPU, the architecture generation preceding the B200. Nvidia's Blackwell B200 - the generation following the H100 shown here - powers Siri's cloud inference through Google Cloud, with Confidential Computing enabled across all GPU memory and NVLink interconnects. Source: commons.wikimedia.org

The privacy piece is handled by Nvidia Confidential Computing. This encrypts data in GPU memory during processing - model weights, user input, and the inference result are all encrypted while they sit in and move through the B200. In multi-GPU setups, NVLink traffic between chips is also encrypted. The CPU uses a trusted-execution environment with cryptographic attestation that Apple's bridge layer verifies before transmitting any user data.
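The "verify, then transmit" ordering described above can be sketched as a small gate. Everything here is hypothetical: `AttestationReport`, `AttestationGate`, and the three checks are illustrative stand-ins, not Apple's bridge layer or Nvidia's attestation tooling, and real signature verification is reduced to a boolean.

```swift
// Hypothetical types illustrating "verify, then send".
// Not Apple's bridge layer or Nvidia's attestation SDK.
struct AttestationReport {
    let measurement: String   // digest of the software stack the enclave booted
    let nonce: String         // challenge value echoed back by the enclave
    let signatureValid: Bool  // outcome of checking the vendor signature chain
}

enum AttestationError: Error, Equatable {
    case badSignature
    case staleNonce
    case unknownMeasurement
}

struct AttestationGate {
    // Measurements the client is willing to trust.
    let trustedMeasurements: Set<String>

    // Throws unless the report is signed, fresh, and matches a trusted stack.
    func verify(_ report: AttestationReport, expectedNonce: String) throws {
        guard report.signatureValid else { throw AttestationError.badSignature }
        guard report.nonce == expectedNonce else { throw AttestationError.staleNonce }
        guard trustedMeasurements.contains(report.measurement) else {
            throw AttestationError.unknownMeasurement
        }
    }

    // Releases the payload to `send` only if the report verifies.
    func transmit(_ payload: String, report: AttestationReport,
                  expectedNonce: String, send: (String) -> Void) throws {
        try verify(report, expectedNonce: expectedNonce)
        send(payload)
    }
}

let gate = AttestationGate(trustedMeasurements: ["sha256:expected-stack"])
let report = AttestationReport(measurement: "sha256:expected-stack",
                               nonce: "nonce-1", signatureValid: true)
var outbox: [String] = []
try gate.transmit("What's the weather?", report: report,
                  expectedNonce: "nonce-1") { outbox.append($0) }
print("Payloads released: \(outbox.count)")
```

The point of the design is that a failed check throws before `send` is ever called, so user data never leaves the client when the remote environment cannot prove what it is running.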

Apple's bridge layer strips personally identifiable information before queries reach Gemini, and the compute is stateless - nothing is retained after a request resolves. The result is a privacy claim that's technically stronger than simply trusting Google's data practices, because the hardware enforces it.
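Apple has not said how the bridge layer detects personally identifiable information. As a rough illustration of the general idea only, the sketch below redacts two obvious patterns with regular expressions. The `Redactor` type and both patterns are hypothetical and far simpler than anything a production system would rely on.

```swift
import Foundation

// Illustrative only: the article does not describe how Apple's bridge layer
// detects PII. These two patterns are simplistic placeholders.
struct Redactor {
    private let rules: [(NSRegularExpression, String)]

    init() throws {
        rules = [
            // Email addresses.
            (try NSRegularExpression(pattern: "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}",
                                     options: [.caseInsensitive]), "[EMAIL]"),
            // Phone-like digit runs of at least nine characters.
            (try NSRegularExpression(pattern: "\\+?\\d[\\d\\s().-]{7,}\\d",
                                     options: []), "[PHONE]"),
        ]
    }

    // Applies each rule in order, replacing matches with a placeholder token.
    func redact(_ text: String) -> String {
        var result = text
        for (regex, token) in rules {
            let range = NSRange(result.startIndex..., in: result)
            result = regex.stringByReplacingMatches(in: result, options: [],
                                                    range: range, withTemplate: token)
        }
        return result
    }
}

let redactor = try Redactor()
print(redactor.redact("Email jane@example.com or call +1 415 555 0100."))
```

Pattern matching like this misses names, addresses, and anything context-dependent, which is one reason the hardware-level guarantees carry most of the weight in Apple's privacy claim.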

The on-device side handles roughly 70% of requests. Timers, smart home controls, basic reminders, and anything with a direct device API call never leaves the phone. The A17 Pro NPU runs Apple's roughly 3-billion-parameter on-device model, which handles these at 12 TOPS with no network round-trip.
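The split between on-device and cloud handling amounts to a routing decision made before any network call. A minimal sketch follows, assuming a hypothetical classification step has already labeled the request; `RequestKind`, `Route`, and `route(_:selectedProvider:)` are illustrative names, not system APIs.

```swift
// Hypothetical request categories. The article names timers, smart home
// controls, and basic reminders as examples that stay on the device.
enum RequestKind {
    case timer, smartHome, reminder   // map to a direct device API call
    case openEnded                    // needs a large cloud model
}

enum Route: Equatable {
    case onDevice
    case cloud(provider: String)
}

// Keeps device-API requests local and sends everything else to the
// user's selected provider (Gemini unless changed).
func route(_ kind: RequestKind, selectedProvider: String = "Gemini") -> Route {
    switch kind {
    case .timer, .smartHome, .reminder:
        return .onDevice
    case .openEnded:
        return .cloud(provider: selectedProvider)
    }
}

print(route(.timer))
print(route(.openEnded, selectedProvider: "MyAI"))
```

Only the `.cloud` branch would ever reach the attestation and PII-stripping steps; requests routed `.onDevice` never produce network traffic at all.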

Where It Falls Short

The Extensions system is well-designed for what Apple wants it to do: offer user choice among a small set of approved AI providers without opening arbitrary network access. That constraint is also the main limitation.

No open models. Llama, Mistral, Gemma, and every other downloadable model aren't eligible for Extensions. Apple's approval process and App Store distribution requirement effectively rule out local-model providers. You can run those models through third-party apps outside of Siri, but not as selectable options within Apple Intelligence.

Surfaces only, not apps. Third-party apps can't invoke your Extension directly. Only Apple's four surfaces - Siri, Writing Tools, Image Playground, and Search or Ask - can call registered Extensions. If you're building an app that uses AI, you still call your own models directly; you don't hook into the Extensions framework.

iPhone 15 Pro or newer for AI features. iOS 27 installs on iPhone 12 and newer, but Apple Intelligence and Extensions require an iPhone 15 Pro or newer. That's a meaningful restriction given the install base.

Three providers, no timeline for more. The beta ships with Gemini as default, ChatGPT in early access, and Claude listed as upcoming. There's no published process for a fourth provider to register. Apple said Extensions are open to "any App Store AI app," but the current approved list doesn't reflect that yet.

The Confidential Computing attestation chain is only as strong as the implementation. The hardware encryption is real. The stateless compute claim is plausible. But the bridge layer that enforces all of this is Apple code running on Google infrastructure - and independent verification of the attestation chain isn't part of the current beta documentation.

The production launch is September 2026, coinciding with the next iPhone generation. Developers have three months to build and submit Extensions before the feature goes live for consumers.


About the author

Sophie Zhang, AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.