We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a ∼3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global–local attention to deliver high quality at competitive cost on Apple’s Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines.
A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users’ privacy with innovations like Private Cloud Compute.
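As a rough illustration of the guided-generation workflow described above, the sketch below shows how a developer might request typed output from the on-device model via the Foundation Models framework. The `TripSuggestion` type and the prompt text are hypothetical; exact macro and method names should be checked against the framework's documentation.

```swift
import FoundationModels

// Hypothetical schema for guided generation: the @Generable macro
// constrains the model's decoding so the output conforms to this type.
@Generable
struct TripSuggestion {
    @Guide(description: "A short, catchy trip title")
    var title: String
    var activities: [String]
}

// A session wraps the on-device model; asking it to generate a
// Generable type returns structured data rather than free-form text.
let session = LanguageModelSession()
let response = try await session.respond(
    to: "Suggest a weekend trip near Cupertino",
    generating: TripSuggestion.self
)
print(response.content.title)
```

Because the output is decoded directly into a Swift value, developers avoid hand-written parsing of model text, which is the "few lines of code" integration the abstract refers to.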
This report provides technical details for the updates to Apple’s on-device and server foundation language models introduced on June 9, 2025, in this post.