Glossary term
Architecture
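Neural sequence model built on a linear state-transition system (defined in continuous time, then discretized for computation) rather than attention. It processes a sequence in linear time with a fixed-size recurrent state, avoiding attention's quadratic compute and a KV cache that grows with sequence length.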
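The recurrence behind this definition can be sketched in a few lines. This is a minimal illustration, not the parametrization used by S4 or Mamba: it assumes a diagonal state matrix, a scalar input channel, and zero-order-hold discretization, and the names `discretize`, `ssm_scan`, `a_diag`, `b`, `c`, and `dt` are hypothetical.

```python
import math


def discretize(a, b, dt):
    """Zero-order-hold discretization of the scalar ODE x'(t) = a*x(t) + b*u(t).

    Assumes a != 0 (illustrative only; real models handle this more carefully).
    """
    a_bar = math.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(u, a_diag, b, c, dt=1.0):
    """Run a diagonal linear state space model over the input sequence u.

    One pass over u, so time is linear in len(u). The state x has one entry
    per diagonal mode, so memory does not grow with sequence length.
    """
    params = [discretize(a_i, b_i, dt) for a_i, b_i in zip(a_diag, b)]
    x = [0.0] * len(a_diag)  # fixed-size recurrent state
    ys = []
    for u_t in u:
        # State update: x_t = a_bar * x_{t-1} + b_bar * u_t (per mode)
        x = [a_bar * x_i + b_bar * u_t for (a_bar, b_bar), x_i in zip(params, x)]
        # Readout: y_t = sum_i c_i * x_t[i]
        ys.append(sum(c_i * x_i for c_i, x_i in zip(c, x)))
    return ys


# Impulse response of a two-mode model: each mode decays geometrically.
outputs = ssm_scan([1.0, 0.0, 0.0, 0.0], a_diag=[-1.0, -0.1], b=[1.0, 1.0], c=[0.5, 0.5])
```

Each step touches only the current state, which is why inference cost per token stays constant, in contrast to attention, where each new token attends over all cached keys and values.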
S4 (Gu et al. 2021, Stanford) was the first model to solve the Long Range Arena Path-X task, which requires reasoning over sequences of length 16,384. It achieved 80.48% average accuracy across the Long Range Arena benchmark, versus under 60% for all Transformer baselines, establishing SSMs as a viable alternative architecture.
AI21 Labs' Jamba 1.5 Large uses a hybrid SSM-attention architecture (43% Mamba, 7% attention, 50% MLP layers). It scored 65.4 on Arena Hard, outperforming Llama-3.1-70B and Llama-3.1-405B at reduced inference cost.
IBM Research collaborated with Mamba's authors to build Bamba and Bamba V2, whose hybrid SSM-attention architecture informed IBM Granite 4.0, an example of enterprise adoption of SSM architectures in production foundation models.