Whereas scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics such as pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct framework to model the scaling of benchmark performance from the training budget. We find that, for a fixed token-to-parameter ratio, a simple power law can accurately describe the scaling behavior of log accuracy on several widely used downstream tasks. Our results show that this direct approach extrapolates better than the previously proposed two-stage procedure, which is prone to compounding errors. Furthermore, we introduce functional forms that predict accuracy across token-to-parameter ratios and account for inference compute under repeated sampling. We validate our findings on models with up to 17B parameters trained on up to 350B tokens across two dataset mixtures. To support reproducibility and encourage future research, we release the complete set of pretraining losses and downstream evaluation results.
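
As a rough illustration of the direct approach described above (a hedged sketch; the symbols and parameterization below are assumptions for exposition, not the paper's exact fit), the per-task scaling at a fixed token-to-parameter ratio can be written as a power law in the training compute budget:

\[
-\log \mathrm{Acc}(C) \;\approx\; a \, C^{-b},
\]

where $C$ denotes the training compute budget and $a, b > 0$ are task-specific constants obtained by fitting to observed benchmark accuracies.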
** Work done while at Apple

