koher/FoundationModels.md

## FoundationModels.md

      
    Raw
  

              FoundationModels.md
            
          
    Foundation Models

At WWDC 25 Apple opened up the on-device large-language model that powers Apple Intelligence to every iOS, iPadOS, macOS and visionOS app via a new “Foundation Models” framework. The model is a compact ~3 billion-parameter LLM that has been quantized to just 2 bits per weight, so it runs fast, offline and entirely on the user’s device, keeping data private while still handling tasks such as summarization, extraction, short-form generation and structured reasoning. (developer.apple.com, machinelearning.apple.com)
Below is a developer-focused English-language overview—based mainly on Apple’s own announcements, docs and WWDC sessions—followed by ready-to-paste Swift code samples.
1. What Are Apple’s On-Device Foundation Models?

Apple ships two sibling LLMs: a device-scale model (~3 B params) embedded in Apple silicon and a server-scale mixture-of-experts model that runs inside Private Cloud Compute when more heft is required. (machinelearning.apple.com) The on-device variant is the one developers can call directly through the Foundation Models framework introduced in iOS 26, iPadOS 26, macOS Tahoe 26 and the latest visionOS/watchOS builds. (apple.com, apple.com)

Privacy-first – everything happens locally; nothing is sent to Apple unless you opt in to PCC. (apple.com)
Free & offline – no token metering or cloud costs; works without network. (apple.com, techcrunch.com)
Multilingual – the 2025 update added support for 15 languages with more coming. (machinelearning.apple.com)
Hardware scope – iPhone 15 Pro/16+, M-series iPads & Macs, and Apple Vision Pro. (apple.com)

2. Core Capabilities Developers Can Leverage


Capability
How it’s enabled
Typical use cases
Key API


Guided Generation – schema-constrained output
@Generable, @Guide macros
JSON-safe summaries, feature lists, game content
respond() / streamResponse() (developer.apple.com)


Snapshot Streaming
Async snapshots of partially-generated structs
Typing-ahead UIs, live document assembly
for await partial in stream (developer.apple.com)


Tool Calling
Tool protocol + guided arguments
Weather lookup, database fetch, device actions
Framework auto-invokes tools (developer.apple.com)


Stateful Sessions
Contextful LanguageModelSession objects
Multi-turn chat, threaded tasks
session.transcript & isResponding (developer.apple.com)


Adapters
Low-rank fine-tuning (rank 32)
Domain tagging, sentiment, brand voice
Python toolkit + .useCase API (machinelearning.apple.com)


Beyond general language tasks, Apple uses specialty adapters to power Live Translation, Genmoji, Visual Intelligence and Fitness features, demonstrating the model’s breadth. (apple.com)
3. Framework Architecture in a Nutshell


Swift-native design – add import FoundationModels, then create a LanguageModelSession. Three lines of code give you an answer. (apple.com, apple.com)
Vertical integration – the Swift macro system, OS daemon and quantized weights coordinate to guarantee schema validity and low latency. (machinelearning.apple.com)
Safety guard-rails – built-in content filters and prompt-/instruction-separation reduce prompt injection risk. (developer.apple.com)

4. Getting Started


Prerequisites: Xcode 26 beta, the iOS 26 SDK (or other latest platform SDK), and a device or simulator that supports Apple Intelligence. (apple.com)

Enable the “Foundation Models” entitlement in your project, import the module, and you are ready to prompt.
5. Hands-On Code Examples

5.1 Minimal Prompt/Response

import FoundationModels

let session = LanguageModelSession()
let title = try await session.respond(
    to: "Suggest a catchy title for a summer trip to Tokyo"
).content
print(title)
Runs entirely offline; average latency ≈ 200 ms on A18-class silicon. (developer.apple.com)
5.2 Guided Generation (Structured Output)

@Generable
struct SearchSuggestions {
    @Guide(description: "Up to four related search terms", .count(4))
    var searchTerms: [String]
}

let suggestions = try await session.respond(
    to: "Give me travel-app search terms for Tokyo",
    generating: SearchSuggestions.self
)
print(suggestions.searchTerms)   // ["Tokyo itinerary", "Shibuya food", ...]
No JSON parsing required—the framework returns a fully typed Swift struct. (developer.apple.com)
5.3 Tool Calling (Weather “Hello World”)

import CoreLocation, WeatherKit, FoundationModels

struct GetWeather: Tool {
    let name = "getWeather"
    let description = "Return current temperature for a city"

    @Generable
    struct Args { @Guide var city: String }

    func call(arguments: Args) async throws -> ToolOutput {
        let place = try await CLGeocoder().geocodeAddressString(arguments.city).first!
        let w = try await WeatherService.shared.weather(for: place.location!)
        return ToolOutput("\(arguments.city): \(w.currentWeather.temperature.value)°C")
    }
}

let session = LanguageModelSession(tools: [GetWeather()])
let reply = try await session.respond(to: "How warm is it in Kyoto?")
print(reply.content)             // "Kyoto: 27°C"
The model autonomously decides when to invoke getWeather; you simply write the tool once. (developer.apple.com)
6. Performance, Privacy & Limitations


The 3 B model is 2-bit quantized and shares KV-cache across blocks, cutting memory by ≈ 38 %. (developer.apple.com, machinelearning.apple.com)
Intended for summaries, extraction, reasoning over user context; not a general-knowledge chatbot—delegate heavy tasks to PCC or external APIs. (developer.apple.com)
Check .availability at runtime; older devices will return .unavailable(reason). (developer.apple.com)

7. Further Official Resources


WWDC 25 “Meet the Foundation Models framework” session video & code. (developer.apple.com)
Apple Newsroom press releases – Foundation Models access & tool updates. (apple.com, apple.com)
MachineLearning.apple.com technical overview – architecture, quantization, evaluations. (machinelearning.apple.com)
Developer docs landing page (/documentation/foundationmodels). (developer.apple.com)
TechCrunch hands-on recap for independent context. (techcrunch.com)

These sources walk through advanced topics such as adapter training, safety evaluation, and Instruments profiling so you can ship robust on-device AI experiences.
Capability	How it’s enabled	Typical use cases	Key API
Guided Generation – schema-constrained output	`@Generable`, `@Guide` macros	JSON-safe summaries, feature lists, game content	`respond()` / `streamResponse()` (developer.apple.com)
Snapshot Streaming	Async snapshots of partially-generated structs	Typing-ahead UIs, live document assembly	`for await partial in stream` (developer.apple.com)
Tool Calling	`Tool` protocol + guided arguments	Weather lookup, database fetch, device actions	Framework auto-invokes tools (developer.apple.com)
Stateful Sessions	Contextful `LanguageModelSession` objects	Multi-turn chat, threaded tasks	`session.transcript` & `isResponding` (developer.apple.com)
Adapters	Low-rank fine-tuning (rank 32)	Domain tagging, sentiment, brand voice	Python toolkit + `.useCase` API (machinelearning.apple.com)