This paper was accepted to the ACL 2025 main conference as an oral presentation.
This paper was accepted at the Scalable Continual Learning for Lifelong Foundation Models (SCLLFM) Workshop at NeurIPS 2024.
Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) – orders of magnitude larger than previous continual language modeling benchmarks. We also design time-stratified evaluations across both general CC data and specific domains (Wikipedia, StackExchange, and code documentation) to assess how well various continual learning methods adapt to new data while retaining past knowledge. Our findings demonstrate that, on general CC data, autoregressive meta-schedules combined with a fixed-ratio replay of older data can achieve comparable held-out loss to re-training from scratch, while requiring significantly less computation (2.6x). However, the optimal balance between incorporating new data and replaying old data differs, as replay is critical to avoid forgetting on generic web data but less so on specific domains.
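To make the fixed-ratio replay idea concrete, the following minimal Python sketch mixes documents from the newest CC dump with a fixed fraction of documents replayed from earlier dumps. The function name, the document-level granularity, and the 50% default ratio are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch (not the paper's code): fixed-ratio replay for continual
# pretraining on a new Common Crawl dump. The replay ratio and the
# document-level mixing granularity are assumptions for illustration.
import random


def build_training_mixture(new_dump_docs, older_dump_docs, replay_ratio=0.5, seed=0):
    """Return a shuffled mixture in which roughly `replay_ratio` of the
    documents are replayed from earlier dumps and the rest come from the
    newest dump."""
    rng = random.Random(seed)
    n_new = len(new_dump_docs)
    # Number of replayed documents needed so that replayed / total ~= replay_ratio.
    n_replay = int(n_new * replay_ratio / (1.0 - replay_ratio))
    replayed = rng.sample(older_dump_docs, min(n_replay, len(older_dump_docs)))
    mixture = list(new_dump_docs) + replayed
    rng.shuffle(mixture)
    return mixture
```

Under this scheme, each continual update sees a constant proportion of older data regardless of how many dumps have accumulated, which is what allows the compute cost to stay well below full re-training from scratch.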
- * Work performed during an internship at Apple
- ° Work performed while at Apple
- † Equal contribution
- ‡ Project lead
- § University of Washington