This paper was accepted at the Workshop on Latent & Implicit Thinking – Going Beyond CoT Reasoning 2026 at ICLR.
Autoregressive language models trained with next-token prediction generate text by sampling one discrete token at a time. Although highly scalable, this objective forces the model to commit at every step, preventing it from exploring or reflecting upon multiple plausible continuations. Moreover, compute allocation across tokens is uniform: every token is produced by a single forward pass, potentially limiting the model’s expressiveness in cases where difficult tokens inherently require more compute. Towards addressing these limitations, we introduce latent lookahead, a training method that enables models to “think” before generating: at selected positions in the sequence, before committing to the next token, the model performs a multi-step lookahead in latent space. More precisely, instead of sampling future tokens, we leverage the network’s latent space by recursively feeding its hidden states back into the context for τ steps, investing additional compute in predicting that token. This produces τ latent predictions that are supervised against the next τ ground-truth tokens, encouraging the model to “look ahead” and refine its prediction. We show that latent lookahead significantly outperforms both autoregressive and non-autoregressive baselines on planning tasks such as maze solving, Sudoku, and ProsQA, where foresight is essential.
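To make the objective concrete, the following is a minimal toy sketch of the latent-lookahead loss described above: from the current context, the model takes τ latent steps by feeding its hidden state back into itself (no token sampling in between), and each latent step is supervised against the corresponding ground-truth token. All names, the single recurrent layer, and the mean-pooled context embedding are illustrative assumptions; the paper's actual architecture is a transformer whose hidden states are fed back into the context.

```python
import numpy as np

# Toy stand-in for the network: embedding, one recurrent map, and a logit head.
# These matrices and the pooling choice are hypothetical, for illustration only.
rng = np.random.default_rng(0)
d, vocab = 8, 5
W_emb = rng.normal(size=(vocab, d))          # token embedding table
W_rec = rng.normal(size=(d, d)) * 0.1        # stand-in for the network's recurrence
W_out = rng.normal(size=(d, vocab))          # unembedding / logit head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def latent_lookahead_loss(context_ids, targets, tau):
    """Take tau latent steps from the current context, supervising each
    latent prediction against the next tau ground-truth tokens."""
    # Initial hidden state from the embedded context (mean-pooled; toy choice).
    h = W_emb[context_ids].mean(axis=0)
    loss = 0.0
    for k in range(tau):
        h = np.tanh(h @ W_rec)               # feed the latent state back in
        p = softmax(h @ W_out)               # latent prediction for token t+k
        loss += -np.log(p[targets[k]])       # cross-entropy vs. ground truth
    return loss / tau

# Example: tau = 3 latent steps before committing to the next token.
loss = latent_lookahead_loss(context_ids=[1, 3], targets=[2, 4, 0], tau=3)
print(f"mean lookahead loss: {loss:.3f}")
```

At inference, the same τ latent steps are taken before the next token is emitted, which is how the method allocates extra compute to positions that need it.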
** Work done while at Apple

