Skip to content

Instantly share code, notes, and snippets.

@koher
Created June 24, 2025 12:38
Show Gist options
  • Save koher/214301df47eeeb5c426cbcfd72700a8e to your computer and use it in GitHub Desktop.
Save koher/214301df47eeeb5c426cbcfd72700a8e to your computer and use it in GitHub Desktop.

Foundation Models

At WWDC 25 Apple opened up the on-device large-language model that powers Apple Intelligence to every iOS, iPadOS, macOS and visionOS app via a new “Foundation Models” framework. The model is a compact ~3 billion-parameter LLM that has been quantized to just 2 bits per weight, so it runs fast, offline and entirely on the user’s device, keeping data private while still handling tasks such as summarization, extraction, short-form generation and structured reasoning. (developer.apple.com, machinelearning.apple.com) Below is a developer-focused English-language overview—based mainly on Apple’s own announcements, docs and WWDC sessions—followed by ready-to-paste Swift code samples.

1. What Are Apple’s On-Device Foundation Models?

Apple ships two sibling LLMs: a device-scale model (~3 B params) embedded in Apple silicon and a server-scale mixture-of-experts model that runs inside Private Cloud Compute when more heft is required. (machinelearning.apple.com) The on-device variant is the one developers can call directly through the Foundation Models framework introduced in iOS 26, iPadOS 26, macOS Tahoe 26 and the latest visionOS/watchOS builds. (apple.com, apple.com)

  • Privacy-first – everything happens locally; nothing is sent to Apple unless you opt in to PCC. (apple.com)
  • Free & offline – no token metering or cloud costs; works without network. (apple.com, techcrunch.com)
  • Multilingual – the 2025 update added support for 15 languages with more coming. (machinelearning.apple.com)
  • Hardware scope – iPhone 15 Pro/16+, M-series iPads & Macs, and Apple Vision Pro. (apple.com)

2. Core Capabilities Developers Can Leverage

Capability How it’s enabled Typical use cases Key API
Guided Generation – schema-constrained output @Generable, @Guide macros JSON-safe summaries, feature lists, game content respond() / streamResponse() (developer.apple.com)
Snapshot Streaming Async snapshots of partially-generated structs Typing-ahead UIs, live document assembly for await partial in stream (developer.apple.com)
Tool Calling Tool protocol + guided arguments Weather lookup, database fetch, device actions Framework auto-invokes tools (developer.apple.com)
Stateful Sessions Contextful LanguageModelSession objects Multi-turn chat, threaded tasks session.transcript & isResponding (developer.apple.com)
Adapters Low-rank fine-tuning (rank 32) Domain tagging, sentiment, brand voice Python toolkit + .useCase API (machinelearning.apple.com)

Beyond general language tasks, Apple uses specialty adapters to power Live Translation, Genmoji, Visual Intelligence and Fitness features, demonstrating the model’s breadth. (apple.com)

3. Framework Architecture in a Nutshell

  1. Swift-native design – add import FoundationModels, then create a LanguageModelSession. Three lines of code give you an answer. (apple.com, apple.com)
  2. Vertical integration – the Swift macro system, OS daemon and quantized weights coordinate to guarantee schema validity and low latency. (machinelearning.apple.com)
  3. Safety guard-rails – built-in content filters and prompt-/instruction-separation reduce prompt injection risk. (developer.apple.com)

4. Getting Started

Prerequisites: Xcode 26 beta, the iOS 26 SDK (or other latest platform SDK), and a device or simulator that supports Apple Intelligence. (apple.com)

Enable the “Foundation Models” entitlement in your project, import the module, and you are ready to prompt.

5. Hands-On Code Examples

5.1 Minimal Prompt/Response

import FoundationModels

let session = LanguageModelSession()
let title = try await session.respond(
    to: "Suggest a catchy title for a summer trip to Tokyo"
).content
print(title)

Runs entirely offline; average latency ≈ 200 ms on A18-class silicon. (developer.apple.com)

5.2 Guided Generation (Structured Output)

@Generable
struct SearchSuggestions {
    @Guide(description: "Up to four related search terms", .count(4))
    var searchTerms: [String]
}

let suggestions = try await session.respond(
    to: "Give me travel-app search terms for Tokyo",
    generating: SearchSuggestions.self
)
print(suggestions.searchTerms)   // ["Tokyo itinerary", "Shibuya food", ...]

No JSON parsing required—the framework returns a fully typed Swift struct. (developer.apple.com)

5.3 Tool Calling (Weather “Hello World”)

import CoreLocation, WeatherKit, FoundationModels

struct GetWeather: Tool {
    let name = "getWeather"
    let description = "Return current temperature for a city"

    @Generable
    struct Args { @Guide var city: String }

    func call(arguments: Args) async throws -> ToolOutput {
        let place = try await CLGeocoder().geocodeAddressString(arguments.city).first!
        let w = try await WeatherService.shared.weather(for: place.location!)
        return ToolOutput("\(arguments.city): \(w.currentWeather.temperature.value)°C")
    }
}

let session = LanguageModelSession(tools: [GetWeather()])
let reply = try await session.respond(to: "How warm is it in Kyoto?")
print(reply.content)             // "Kyoto: 27°C"

The model autonomously decides when to invoke getWeather; you simply write the tool once. (developer.apple.com)

6. Performance, Privacy & Limitations

  • The 3 B model is 2-bit quantized and shares KV-cache across blocks, cutting memory by ≈ 38 %. (developer.apple.com, machinelearning.apple.com)
  • Intended for summaries, extraction, reasoning over user context; not a general-knowledge chatbot—delegate heavy tasks to PCC or external APIs. (developer.apple.com)
  • Check .availability at runtime; older devices will return .unavailable(reason). (developer.apple.com)

7. Further Official Resources

These sources walk through advanced topics such as adapter training, safety evaluation, and Instruments profiling so you can ship robust on-device AI experiences.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment