Autoregressive language models (ARMs) deliver strong likelihoods, but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. Diffusion Language Models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth for iterative breadth. We introduce FS-DFM, Few-Step Discrete Flow-Matching, a discrete flow-matching model designed for speed without sacrificing quality. The core idea is simple: make the number of sampling steps an explicit parameter and train the model to be consistent across step budgets, so one big move lands where many small moves would. We pair this with a reliable update rule that moves probability in the right direction without overshooting, and with strong teacher guidance distilled from long-run trajectories. Together, these choices make few-step sampling stable, accurate, and easy to control. On language-modeling benchmarks, FS-DFM with 8 sampling steps achieves perplexity parity with a 1,024-step discrete-flow baseline for generating 1,024 tokens using a similar-size model, delivering up to 128 times faster sampling and corresponding latency and throughput gains.
- † The Ohio State University
- ‡ Work done while at Apple
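To make the core idea concrete, here is a minimal sketch of a few-step discrete-flow sampler in which the step budget is an explicit input. It is not the paper's implementation: the `model(x, t, step_budget=...)` interface, the mask-token prior, and the simple Euler-style unmasking rule are all assumptions for illustration; the paper's calibrated update rule is specifically designed to avoid the overshoot this naive step can exhibit.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def few_step_sample(model, seq_len, vocab_size, mask_id, num_steps=8, device="cpu"):
    """Hypothetical few-step sampler: the same model runs with 8 large
    moves or 1,024 small ones, depending on `num_steps`."""
    # Start from the reference distribution: a fully masked sequence
    # (a common prior in discrete flow matching; the paper's may differ).
    x = torch.full((1, seq_len), mask_id, dtype=torch.long, device=device)
    ts = torch.linspace(0.0, 1.0, num_steps + 1, device=device)
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        # Condition on the current time AND the step budget, so the model
        # can learn to land one big move where many small moves would.
        logits = model(x, t.expand(1), step_budget=num_steps)  # (1, L, V)
        probs = F.softmax(logits, dim=-1)  # predicted clean-token distribution
        # Illustrative Euler-style rule: unmask each position with
        # probability proportional to the remaining time, filling the
        # unmasked slots with samples from the model's prediction.
        unmask_prob = (t_next - t) / (1.0 - t)
        draw = torch.multinomial(probs.view(-1, vocab_size), 1).view(1, seq_len)
        unmask = (x == mask_id) & (torch.rand(1, seq_len, device=device) < unmask_prob)
        x = torch.where(unmask, draw, x)
    return x
```

With this framing, an 8-step budget simply partitions the same unit time interval into 8 coarse increments; the consistency training described above is what keeps those coarse moves aligned with the long 1,024-step trajectory.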