At WWDC 25 Apple opened up the on-device large-language model that powers Apple Intelligence to every iOS, iPadOS, macOS and visionOS app via a new “Foundation Models” framework. The model is a compact ~3 billion-parameter LLM that has been quantized to just 2 bits per weight, so it runs fast, offline and entirely on the user’s device, keeping data private while still handling tasks such as summarization, extraction, short-form generation and structured reasoning. (developer.apple.com, machinelearning.apple.com) Below is a developer-focused English-language overview—based mainly on Apple’s own announcements, docs and WWDC sessions—followed by ready-to-paste Swift code samples.
Apple ships two sibling LLMs: a device-scale model (~3 B params) embedded in Apple silicon and a server-scale mixture-of-experts model that runs inside Private Cloud Compute when more heft is required. (machinelearning.apple.com) The on-device variant is the one developers can call directly through the Foundation Models framework introduced in iOS 26, iPadOS 26, macOS Tahoe 26 and the latest visionOS/watchOS builds. (apple.com, apple.com)
- Privacy-first – everything happens locally; nothing is sent to Apple unless you opt in to PCC. (apple.com)
- Free & offline – no token metering or cloud costs; works without network. (apple.com, techcrunch.com)
- Multilingual – the 2025 update added support for 15 languages with more coming. (machinelearning.apple.com)
- Hardware scope – iPhone 15 Pro/16+, M-series iPads & Macs, and Apple Vision Pro. (apple.com)
Capability | How it’s enabled | Typical use cases | Key API |
---|---|---|---|
Guided Generation – schema-constrained output | @Generable , @Guide macros |
JSON-safe summaries, feature lists, game content | respond() / streamResponse() (developer.apple.com) |
Snapshot Streaming | Async snapshots of partially-generated structs | Typing-ahead UIs, live document assembly | for await partial in stream (developer.apple.com) |
Tool Calling | Tool protocol + guided arguments |
Weather lookup, database fetch, device actions | Framework auto-invokes tools (developer.apple.com) |
Stateful Sessions | Contextful LanguageModelSession objects |
Multi-turn chat, threaded tasks | session.transcript & isResponding (developer.apple.com) |
Adapters | Low-rank fine-tuning (rank 32) | Domain tagging, sentiment, brand voice | Python toolkit + .useCase API (machinelearning.apple.com) |
Beyond general language tasks, Apple uses specialty adapters to power Live Translation, Genmoji, Visual Intelligence and Fitness features, demonstrating the model’s breadth. (apple.com)
- Swift-native design – add
import FoundationModels
, then create aLanguageModelSession
. Three lines of code give you an answer. (apple.com, apple.com) - Vertical integration – the Swift macro system, OS daemon and quantized weights coordinate to guarantee schema validity and low latency. (machinelearning.apple.com)
- Safety guard-rails – built-in content filters and prompt-/instruction-separation reduce prompt injection risk. (developer.apple.com)
Prerequisites: Xcode 26 beta, the iOS 26 SDK (or other latest platform SDK), and a device or simulator that supports Apple Intelligence. (apple.com)
Enable the “Foundation Models” entitlement in your project, import the module, and you are ready to prompt.
import FoundationModels
let session = LanguageModelSession()
let title = try await session.respond(
to: "Suggest a catchy title for a summer trip to Tokyo"
).content
print(title)
Runs entirely offline; average latency ≈ 200 ms on A18-class silicon. (developer.apple.com)
@Generable
struct SearchSuggestions {
@Guide(description: "Up to four related search terms", .count(4))
var searchTerms: [String]
}
let suggestions = try await session.respond(
to: "Give me travel-app search terms for Tokyo",
generating: SearchSuggestions.self
)
print(suggestions.searchTerms) // ["Tokyo itinerary", "Shibuya food", ...]
No JSON parsing required—the framework returns a fully typed Swift struct. (developer.apple.com)
import CoreLocation, WeatherKit, FoundationModels
struct GetWeather: Tool {
let name = "getWeather"
let description = "Return current temperature for a city"
@Generable
struct Args { @Guide var city: String }
func call(arguments: Args) async throws -> ToolOutput {
let place = try await CLGeocoder().geocodeAddressString(arguments.city).first!
let w = try await WeatherService.shared.weather(for: place.location!)
return ToolOutput("\(arguments.city): \(w.currentWeather.temperature.value)°C")
}
}
let session = LanguageModelSession(tools: [GetWeather()])
let reply = try await session.respond(to: "How warm is it in Kyoto?")
print(reply.content) // "Kyoto: 27°C"
The model autonomously decides when to invoke getWeather
; you simply write the tool once. (developer.apple.com)
- The 3 B model is 2-bit quantized and shares KV-cache across blocks, cutting memory by ≈ 38 %. (developer.apple.com, machinelearning.apple.com)
- Intended for summaries, extraction, reasoning over user context; not a general-knowledge chatbot—delegate heavy tasks to PCC or external APIs. (developer.apple.com)
- Check
.availability
at runtime; older devices will return.unavailable(reason)
. (developer.apple.com)
- WWDC 25 “Meet the Foundation Models framework” session video & code. (developer.apple.com)
- Apple Newsroom press releases – Foundation Models access & tool updates. (apple.com, apple.com)
- MachineLearning.apple.com technical overview – architecture, quantization, evaluations. (machinelearning.apple.com)
- Developer docs landing page (
/documentation/foundationmodels
). (developer.apple.com) - TechCrunch hands-on recap for independent context. (techcrunch.com)
These sources walk through advanced topics such as adapter training, safety evaluation, and Instruments profiling so you can ship robust on-device AI experiences.