State Space Models (SSMs) have become the leading alternative to Transformers for sequence modeling. Their primary advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. We begin this work by showing a simple theoretical result stating that SSMs cannot accurately solve any “truly long-form” generation problem (in a sense we formally define), undermining their main competitive advantage. However, we show that this limitation can be mitigated by allowing SSMs interactive access to external tools. In fact, we show that given the right choice of tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length/complexity (i.e., achieve length generalization). Following our theoretical finding, we demonstrate that tool-augmented SSMs achieve remarkable length generalization on a variety of arithmetic, reasoning, and coding tasks. These findings highlight SSMs as a potential efficient alternative to Transformers in interactive tool-based and agentic settings.

