This paper was accepted at the Workshop on Memory for LLM-Based Agentic Systems at ICLR.
Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. In particular, the capacity of Small Language Models (SLMs) is limited, leading to factually incorrect generations. This problem is often mitigated by giving the SLM access to an external source: the ability to query a larger model, documents, or a database. Under this setting, we study the fundamental question of which tokens an SLM can and should learn during pretraining, versus which it should delegate via a special token. We find that this is not merely a question of loss: although the loss is predictive of whether a predicted token mismatches the ground truth, some tokens are acceptable in that they are truthful alternative continuations of a pretraining document, and should not trigger delegation even when their loss is high. We find that a spaCy grammar parser can help augment the loss signal to decide which tokens the SLM should learn to delegate to prevent factual errors, and which are safe to learn and predict even under high losses. We propose LaCy, a novel pretraining method based on this token-selection philosophy. Our experiments demonstrate that LaCy models successfully learn which tokens to predict and where to delegate for help. This results in higher FactScores when generating in a cascade with a larger model, and outperforms Rho- or LLM-judge-trained SLMs while being simpler and cheaper.
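To make the token-selection idea concrete, here is a minimal sketch of one plausible decision rule, not the paper's exact method: a token is marked for delegation only when its pretraining loss is high *and* its part-of-speech tag (as a spaCy pipeline would assign it) suggests fact-bearing content; high-loss grammatical tokens with many truthful alternatives remain safe to learn. The tag set `FACTUAL_TAGS`, the threshold, and the function names are all illustrative assumptions.

```python
# Hypothetical illustration of combining a loss signal with a grammar
# parse to choose which tokens an SLM should delegate. The POS tags
# would come from spaCy, e.g. [t.pos_ for t in nlp(text)].
FACTUAL_TAGS = {"PROPN", "NUM"}  # assumed fact-bearing tags: proper nouns, numbers

def select_tokens(tokens, pos_tags, losses, loss_threshold=2.0):
    """Label each token 'delegate' or 'learn'.

    tokens, pos_tags, losses are parallel lists; a token is delegated
    only if it is both hard (high loss) and fact-bearing (POS tag),
    so high-loss but grammatically interchangeable tokens stay learnable.
    """
    labels = []
    for tok, pos, loss in zip(tokens, pos_tags, losses):
        if loss >= loss_threshold and pos in FACTUAL_TAGS:
            labels.append("delegate")  # likely factual error: hand off
        else:
            labels.append("learn")     # a truthful alternative continuation is acceptable
    return labels
```

Under this rule, a high-loss proper noun (e.g. a birth year) is delegated, while a high-loss but semantically flexible verb is still learned, matching the observation that loss alone over-triggers delegation.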
- † University of Cambridge
- ** Work done while at Apple