We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a ~3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality at competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, and are further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or exceed open baselines of comparable size.
A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in a Responsible AI approach, with safeguards such as content filtering and locale-specific evaluation.
This paper provides technical details for the updates to Apple's on-device and server foundation language models, introduced on June 9, 2025, in this post.
