How Apple’s Foundation Models Framework Changes the Economics of On-Device AI
Apple’s Foundation Models framework removes per-token API costs for a large class of AI features by giving developers free, offline, privacy-preserving access to on-device inference. Instead of paying a cloud provider for every request, at rates that range from well under a dollar to $15 or more per million tokens, developers can run summarization, text generation, and similar tasks directly on the user’s device at no marginal cost to the developer.
That shifts AI spending from a recurring, usage-based bill to a one-time engineering investment.
This matters more now than it did at launch. Apple announced the framework at WWDC in June 2025, then significantly upgraded the underlying models a year later. Here’s what actually changed, what it costs to replicate with cloud APIs today, and where on-device AI still falls short.
What Apple’s Foundation Models Framework Provides Developers
The framework, introduced in June 2025, gave any app developer direct Swift access to Apple’s on-device language model, the same model powering Apple Intelligence features across iOS, iPadOS, and macOS. No API key, no billing account, no rate limits tied to usage volume.
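A minimal sketch of what that access looks like, assuming the `FoundationModels` API as Apple documented it at launch (`SystemLanguageModel.default.availability`, `LanguageModelSession(instructions:)`, `respond(to:)`); details may differ with later model generations. `summarize(_:)` and `summaryPrompt(for:)` are hypothetical helpers, and the `#if canImport` guard exists only so the file also compiles where the framework is absent, in which case it returns `nil`.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Hypothetical helper: builds the prompt sent to the model.
func summaryPrompt(for document: String) -> String {
    "Summarize the following text in two sentences:\n\n\(document)"
}

/// Hypothetical helper: returns an on-device summary, or nil when the system
/// model is unavailable (ineligible device, Apple Intelligence turned off,
/// model not yet downloaded) or the framework is not present at all.
func summarize(_ document: String) async throws -> String? {
    #if canImport(FoundationModels)
    if #available(iOS 26.0, macOS 26.0, visionOS 26.0, *) {
        // Check availability before creating a session.
        guard case .available = SystemLanguageModel.default.availability else {
            return nil
        }
        // No API key or billing account: the session runs against the local model.
        let session = LanguageModelSession(
            instructions: "You are a concise summarization assistant."
        )
        let response = try await session.respond(to: summaryPrompt(for: document))
        return response.content
    }
    #endif
    return nil
}
```

A caller would typically write `if let summary = try await summarize(text) { /* show it */ } else { /* cloud fallback */ }`, checking availability on every call rather than assuming the model is present, since eligibility depends on the device, user settings, and download state.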
In June 2026, Apple shipped a third generation of these models with a meaningfully different architecture. The on-device lineup now includes a 3-billion-parameter model (AFM 3 Core) and a more capable 20-billion-parameter model (AFM 3 Core Advanced) that activates only 1 to 4 billion parameters per request using a sparse routing design. This lets a larger, more capable model run within the memory limits of an iPhone, without loading the full parameter set into active memory for every query.
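The sparse routing idea can be illustrated with a generic top-k mixture-of-experts gate, the family of techniques the term usually refers to. This is a toy sketch of that general technique, not Apple’s implementation: the expert count, the gate logits, and `k` are arbitrary assumptions, and real routers compute learned logits per token inside each layer.

```swift
import Foundation

/// Toy top-k router: picks the k experts with the highest gate logits and
/// softmax-normalizes their weights. Only the chosen experts would run.
func topKRoute(gateLogits: [Double], k: Int) -> [(expert: Int, weight: Double)] {
    precondition(k > 0 && k <= gateLogits.count, "k must be between 1 and the expert count")
    // Rank experts by gate logit (highest first) and keep the top k.
    let chosen = Array(gateLogits.enumerated()
        .sorted { $0.element > $1.element }
        .prefix(k))
    // Softmax over the selected logits only; subtracting the max avoids overflow.
    let maxLogit = chosen[0].element
    let exps = chosen.map { exp($0.element - maxLogit) }
    let total = exps.reduce(0, +)
    return zip(chosen, exps).map { pair, e in (expert: pair.offset, weight: e / total) }
}

// Illustrative only: 8 experts with 2 active, so a quarter of the expert weights do work.
let logits = [0.3, -1.2, 2.1, 0.0, 1.7, -0.4, 0.9, -2.0]
for route in topKRoute(gateLogits: logits, k: 2) {
    print("expert \(route.expert): weight \(route.weight)")
}
```

The unselected experts contribute nothing to that step, which is why per-request compute tracks the active parameter count (1 to 4 billion here) rather than the full 20 billion.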
Server-side, Apple also introduced Private Cloud Compute models for tasks too demanding for on-device hardware, built in collaboration with Google and, notably, running partly on NVIDIA GPUs hosted in Google Cloud while maintaining the same privacy guarantees as on-device processing.
Current AI API Pricing Compared (Mid-2026)
To understand what “free” is actually worth, it helps to see what the alternative costs. AI API pricing has dropped sharply since 2025, but it is still a per-request, per-token expense that scales with usage.
| Provider / Model | Input ($/1M tokens) | Output ($/1M tokens) |
| OpenAI GPT-5 | $1.25 | $10.00 |
| OpenAI GPT-4o mini | $0.15 | $0.60 |
| Google Gemini 2.5 Flash-Lite | $0.075 | $0.30 |
| Anthropic Claude Sonnet 4.6 | $3.00 | $15.00 |
| DeepSeek V4 Flash | $0.14 | $0.28 |
These are mid-2026 rates, and they keep falling due to competitive pressure between providers. Even so, every one of these requests carries a marginal cost that scales directly with how many users open the feature and how often they use it. Apple’s on-device inference carries no equivalent per-request charge.
Cloud AI vs On-Device AI Costs at Different Usage Levels
Take a simple example: an app feature that summarizes a document, using roughly 500 input tokens and 150 output tokens per request.
| Scenario | Monthly Requests | Monthly Cloud Cost (GPT-5) | On-Device Cost |
| Small App | 100,000 | Around $212 | $0 |
| Mid-Size App | 5,000,000 | Around $10,625 | $0 |
| Large-Scale App | 100,000,000 | Around $212,500 | $0 |
The cloud cost figures apply GPT-5’s listed rates ($1.25 per million input tokens and $10.00 per million output tokens, or about $0.0021 per request) as a mid-range example; cheaper models like Gemini 2.5 Flash-Lite or DeepSeek V4 Flash would lower these figures substantially, while premium models like Claude Sonnet 4.6 would raise them. The pattern holds regardless of which provider is used: cloud cost scales linearly with usage, while on-device inference stays flat because the computation runs on the user’s own hardware with a model Apple ships as part of the operating system, so the developer pays nothing per request.
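The arithmetic behind the table fits in a few lines. This sketch assumes the 500-input / 150-output token workload and the list prices from the pricing table, and ignores any volume, caching, or batch discounts a provider may offer; `monthlyCloudCost` is a hypothetical helper name.

```swift
import Foundation

/// Monthly cloud spend in USD for a fixed per-request token workload,
/// given per-million-token list prices.
func monthlyCloudCost(requests: Int,
                      inputTokens: Int, outputTokens: Int,
                      inputPerMillion: Double, outputPerMillion: Double) -> Double {
    let perRequest = (Double(inputTokens) * inputPerMillion
                      + Double(outputTokens) * outputPerMillion) / 1_000_000
    return Double(requests) * perRequest
}

// GPT-5 list prices from the table: $1.25 input, $10.00 output per 1M tokens.
for requests in [100_000, 5_000_000, 100_000_000] {
    let cost = monthlyCloudCost(requests: requests, inputTokens: 500, outputTokens: 150,
                                inputPerMillion: 1.25, outputPerMillion: 10.00)
    print("\(requests) requests/month: $\(String(format: "%.2f", cost)) cloud vs $0 on-device")
}
```

Swapping in other rows of the pricing table shows the spread: at 5,000,000 requests a month, Gemini 2.5 Flash-Lite ($0.075 / $0.30) comes to about $412.50, while Claude Sonnet 4.6 ($3.00 / $15.00) comes to about $18,750. Either way the cost is linear in `requests`.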
Why On-Device Isn’t Always Cheaper in Practice
The zero marginal cost is real, but it comes with trade-offs that a pure cost comparison misses.
Apple’s on-device models are smaller than flagship cloud models by design, since they need to run within a phone’s memory and battery limits. AFM 3 Core Advanced activates at most 4 billion parameters per request, compared to frontier cloud models with far larger effective capacity. For tasks needing complex reasoning, long-context understanding, or highly specialized knowledge, on-device models can fall short of what a cloud API delivers, meaning some features still need a cloud fallback regardless of cost.
The framework is also Apple-only. An app with meaningful Android usage still needs a cloud-based or cross-platform AI solution for those users, so the on-device savings apply only to the Apple share of a product’s install base. Hardware requirements narrow this further: the more capable AFM 3 Core Advanced model is limited to Apple’s newest silicon, so older but still active devices fall back to smaller models or need a cloud path entirely.
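Because only part of an audience can use the on-device path, the realistic saving is a fraction of the full cloud bill. A rough sketch, assuming two hypothetical inputs (the share of requests coming from Apple devices, and the share of those devices that meet the hardware requirements) and that every other request goes to a cloud fallback at the same per-request price:

```swift
import Foundation

/// Hypothetical estimate of the cloud bill that remains when only eligible
/// Apple devices run the feature on-device and everyone else uses a cloud fallback.
func remainingCloudCost(fullCloudCost: Double,
                        appleShare: Double,
                        eligibleShareOfApple: Double) -> Double {
    precondition((0...1).contains(appleShare) && (0...1).contains(eligibleShareOfApple),
                 "shares must be fractions between 0 and 1")
    // Fraction of requests served locally at zero marginal cost to the developer.
    let onDeviceShare = appleShare * eligibleShareOfApple
    return fullCloudCost * (1 - onDeviceShare)
}

// Illustrative numbers only: a $10,000/month all-cloud bill, 60% of requests
// from Apple devices, half of those on hardware that supports the on-device model.
let remaining = remainingCloudCost(fullCloudCost: 10_000, appleShare: 0.6, eligibleShareOfApple: 0.5)
print("Remaining cloud spend: $\(String(format: "%.2f", remaining))")
```

Under these assumptions only 30% of requests move on-device, so roughly $7,000 of the $10,000 bill remains, which is why install-base mix matters as much as per-token prices.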
How Developers Are Already Using On-Device AI
The economics only matter if the underlying feature is genuinely useful, and this is where on-device AI has started showing up in ordinary daily tasks rather than just developer demos. Quick text rewrites, on-the-fly summaries, and offline-capable assistants are becoming standard in apps that previously would have needed a constant data connection and a cloud subscription to justify the AI feature at all. This shift is covered in more detail in this breakdown of how on-device AI is changing everyday iPhone productivity.
How the Framework Fits Into Apple Intelligence
The Foundation Models framework doesn’t exist in isolation. It’s the developer-facing layer of Apple Intelligence, the same system powering writing tools, notification summaries, and visual search across Apple’s own apps. Readers who want to track how these Apple Intelligence developments evolve across iOS releases can follow the rollout as new capabilities, including the third-generation models covered here, reach supported devices.
How to Follow Apple’s iPhone, iPad, and Mac AI Updates
Apple’s approach to AI is still evolving quickly, with new model generations, framework updates, and hardware requirements shifting roughly every year. For readers who want ongoing coverage of these changes as they roll out across iPhone, iPad, and Mac, Apple news coverage from Apfelpatient includes daily bilingual updates on Apple’s hardware and software, including how new AI capabilities reach different devices over time.
Which App Features Are Best Suited for On-Device AI
Apple’s Foundation Models framework delivers the greatest value for features that users access frequently but don’t require frontier-level reasoning. Writing assistance, note summaries, content classification, offline search, and lightweight chat experiences are strong candidates because they eliminate recurring API costs while preserving user privacy. More complex workflows involving extensive reasoning, large context windows, or external knowledge retrieval are still better suited to cloud-hosted models, making a hybrid architecture the most practical approach for many modern apps.
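One way to encode that hybrid split is a simple routing policy. Everything here is hypothetical: the task categories mirror the examples above, and `maxOnDeviceContextTokens` is an assumed budget to be tuned to whichever on-device model an app targets, not a published limit.

```swift
/// Hypothetical task descriptor for routing decisions.
struct AITask {
    enum Kind { case rewrite, summarize, classify, search, chat, deepReasoning, knowledgeRetrieval }
    let kind: Kind
    let estimatedContextTokens: Int
}

enum Backend: Equatable { case onDevice, cloud }

/// Sends frequent, lightweight tasks to the on-device model and anything that
/// needs heavy reasoning, long context, or external knowledge to the cloud.
func route(_ task: AITask, onDeviceAvailable: Bool, maxOnDeviceContextTokens: Int) -> Backend {
    // No local model (Android, older hardware, feature disabled): use the cloud.
    guard onDeviceAvailable else { return .cloud }
    // Inputs larger than the assumed local context budget go to the cloud.
    guard task.estimatedContextTokens <= maxOnDeviceContextTokens else { return .cloud }
    switch task.kind {
    case .rewrite, .summarize, .classify, .search, .chat:
        return .onDevice
    case .deepReasoning, .knowledgeRetrieval:
        return .cloud
    }
}

let noteSummary = AITask(kind: .summarize, estimatedContextTokens: 800)
print(route(noteSummary, onDeviceAvailable: true, maxOnDeviceContextTokens: 4_000))
```

Keeping the decision in one function makes it easy to shift the boundary later, for example when a newer on-device model handles longer inputs.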
Final Thoughts
Apple’s Foundation Models framework gives developers a genuine cost advantage for AI features that fit within an on-device model’s capabilities, because it eliminates per-token billing entirely for those use cases. It is not a full replacement for cloud AI: capability limits, platform exclusivity, and hardware requirements mean many apps will still run a hybrid approach. The clearest use case is high-volume, relatively simple AI features, where cloud API costs would otherwise climb into thousands of dollars a month without buying any capability the on-device model lacks.
Apfelpatient, a bilingual German and English Apple news site, is one of the sources tracking how Apple’s on-device AI models and developer tools evolve as new iOS versions ship.
