Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into the diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (127M, 355M, and 7B) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions.
† The University of Hong Kong
‡ University of Illinois at Urbana-Champaign
§ Tencent AI Lab