The Phi-4 family is Microsoft's latest development in small language models (SLMs), designed to excel at complex reasoning tasks while maintaining efficiency. The Phi-4 series includes three key models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. The newly released models are built with a clear focus: deliver advanced reasoning performance without the infrastructure demands of trillion-parameter models. They strike a balance between size and performance using techniques such as distillation, reinforcement learning, and carefully curated data.
Phi-4-reasoning is a 14-billion-parameter model with a 32k-token context window, trained on high-quality web data and OpenAI o3-mini prompts. It excels at tasks requiring detailed, multi-step reasoning such as mathematics, coding, and algorithmic problem solving.
Phi-4-reasoning-plus builds on this with further fine-tuning, using 1.5x more tokens and reinforcement learning to deliver even higher accuracy and inference-time performance.
Phi-4-mini-reasoning, with just 3.8 billion parameters, was trained on one million synthetic math problems generated by DeepSeek-R1. It targets use cases such as educational tools and mobile apps, proving capable of step-by-step problem solving in resource-constrained environments.
What sets Phi-4 apart isn't just efficiency, but sheer capability. On benchmarks like HumanEval+ and MATH-500:
- Phi-4-reasoning-plus outperforms DeepSeek-R1 (671B parameters) on some tasks, demonstrating that smarter training can beat brute force.
- It also rivals OpenAI's o3-mini and exceeds DeepSeek-R1-Distill-Llama-70B on complex reasoning and planning tasks.
- Phi-4-mini-reasoning performs competitively with much larger models and even tops some of them on math-specific benchmarks.
True to Microsoft's Responsible AI framework, all Phi-4 models are trained with robust safety protocols. Post-training involves supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF). Microsoft uses public datasets focused on safety, helpfulness, and fairness, ensuring broad usability while minimizing risks.
All three models are freely available via Hugging Face and Azure AI Foundry, allowing researchers, startups, and educators to integrate high-performance reasoning into their own applications.
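For readers who want to try the models directly, here is a minimal sketch of loading Phi-4-reasoning with the Hugging Face transformers library. The model ID `microsoft/Phi-4-reasoning`, the chat-style prompt, and the generation settings are assumptions for illustration; check the model card on Hugging Face for the recommended usage.

```python
# Minimal sketch: load Phi-4-reasoning from Hugging Face and run one prompt.
# The model ID and prompt format below are assumptions; see the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "user",
     "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern should apply to Phi-4-reasoning-plus and Phi-4-mini-reasoning by swapping the model ID; the mini model's 3.8B parameters make it the most practical choice for local or edge experimentation.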