
State Space Models offer linear-time sequence modeling with content-aware selective filtering, challenging Transformers for long-context inference.

Why This Matters

State Space Models (SSMs) provide a principled alternative to Transformers for long-sequence modeling. In production systems handling long contexts (e.g., code generation, genomic analysis), Transformer attention's quadratic cost becomes a bottleneck. Mamba achieves linear-time inference with constant-memory state, making it viable for million-token contexts where attention-based models are prohibitively expensive.

Core Idea

SSMs originate from continuous-time control theory: a latent state evolves over time driven by the input, and observations are linear projections of that state. Mamba's key innovation is making the SSM parameters input-dependent (selective): the model learns to gate which information enters and exits the state, approximating attention's ability to focus on relevant tokens without its quadratic cost.

Technical Details

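The continuous-time SSM is defined as:

$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)$$

where $h(t) \in \mathbb{R}^{N}$ is the latent state, $x(t) \in \mathbb{R}$ is the input, and $A \in \mathbb{R}^{N \times N}$, $B \in \mathbb{R}^{N \times 1}$, $C \in \mathbb{R}^{1 \times N}$. Using zero-order hold discretization with step $\Delta$:

$$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B$$

The recurrent update becomes:

$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t$$

Mamba's selective mechanism makes $B$, $C$, and $\Delta$ input-dependent:

$$B_t = \mathrm{Linear}_B(x_t), \qquad C_t = \mathrm{Linear}_C(x_t), \qquad \Delta_t = \mathrm{softplus}\left(\mathrm{Linear}_\Delta(x_t)\right)$$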
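The definitions above can be sketched in NumPy as a single recurrent step. This is a minimal illustration, not Mamba's actual kernel: it assumes a diagonal $A$ stored as a `(D, N)` array with negative entries, and the names `selective_params`, `discretize_zoh`, `ssm_step`, and all weight matrices are hypothetical. It uses the exact zero-order hold formula for both $\bar{A}$ and $\bar{B}$.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)

def selective_params(x_t, W_B, W_C, w_dt, b_dt):
    """Input-dependent B_t (N,), C_t (N,), and Delta_t (D,) from x_t (D,)."""
    B_t = W_B @ x_t
    C_t = W_C @ x_t
    # One scalar projection of x_t, spread across channels by the bias b_dt.
    dt = softplus(w_dt @ x_t + b_dt)
    return B_t, C_t, dt

def discretize_zoh(A, B_t, dt):
    """Zero-order hold for a diagonal A stored as (D, N) with negative entries."""
    A_bar = np.exp(dt[:, None] * A)            # exp(Delta * A), elementwise
    B_bar = (A_bar - 1.0) / A * B_t[None, :]   # (Delta A)^-1 (exp(Delta A) - I) Delta B
    return A_bar, B_bar

def ssm_step(h, x_t, A_bar, B_bar, C_t):
    """One update: h_t = A_bar * h_{t-1} + B_bar * x_t, then y_t = C_t h_t."""
    h = A_bar * h + B_bar * x_t[:, None]
    return h, h @ C_t

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, N = 4, 8
A = -np.tile(np.arange(1.0, N + 1), (D, 1))
W_B, W_C = rng.normal(size=(N, D)), rng.normal(size=(N, D))
w_dt, b_dt = rng.normal(size=D), np.zeros(D)

h = np.zeros((D, N))
for x_t in rng.normal(size=(5, D)):
    B_t, C_t, dt = selective_params(x_t, W_B, W_C, w_dt, b_dt)
    A_bar, B_bar = discretize_zoh(A, B_t, dt)
    h, y_t = ssm_step(h, x_t, A_bar, B_bar, C_t)
```

Because $A$ is diagonal, the matrix exponential and inverse reduce to elementwise operations, which is what keeps each step cheap. For very small steps, `B_bar` approaches `dt * B_t`, the simpler Euler-style approximation.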

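The parallel scan algorithm computes this recurrence in $O(L)$ work and $O(\log L)$ parallel depth for sequence length $L$ during training. Inference is $O(1)$ per token with respect to sequence length, because the state has a fixed size $N$, yielding constant-memory decoding regardless of sequence length.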
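To see why a scan applies here, note that the recurrence $h_t = a_t h_{t-1} + b_t$ can be written as a composition of pairs $(a, b)$ under an associative operator, $(a_1, b_1) \circ (a_2, b_2) = (a_1 a_2,\ a_2 b_1 + b_2)$. The sketch below checks this against the sequential loop. It is an assumption-laden illustration: it uses a simple Hillis-Steele style scan, which takes about $\log_2 L$ rounds but does $O(L \log L)$ total work, rather than a work-efficient scan or Mamba's fused CUDA kernel. The function names are hypothetical.

```python
import numpy as np

def scan_sequential(a, b):
    """h_t = a_t * h_{t-1} + b_t with h_{-1} = 0, one step at a time."""
    h = np.zeros_like(b[0])
    out = np.empty_like(b)
    for t in range(len(b)):
        h = a[t] * h + b[t]
        out[t] = h
    return out

def scan_parallel(a, b):
    """Same result in about log2(L) rounds of elementwise array operations."""
    a, b = a.copy(), b.copy()
    shift = 1
    while shift < len(b):
        # Combine the partial result ending at t - shift with the one ending at t:
        # (a_left, b_left) o (a_right, b_right) = (a_left * a_right, a_right * b_left + b_right)
        new_b = a[shift:] * b[:-shift] + b[shift:]
        new_a = a[shift:] * a[:-shift]
        a[shift:], b[shift:] = new_a, new_b
        shift *= 2
    return b  # with a zero initial state, the b component is h_t

# Hypothetical data: L = 37 steps, 3 independent state entries.
rng = np.random.default_rng(0)
a = rng.uniform(0.1, 0.99, size=(37, 3))
b = rng.normal(size=(37, 3))
same = np.allclose(scan_parallel(a, b), scan_sequential(a, b))
```

Each round operates on whole arrays at once, so on parallel hardware the number of dependent steps grows with $\log L$ instead of $L$.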

How It Works

Project input: Map token $x_t$ to expanded dimension $D$.
Generate selective parameters: Compute input-dependent $B_t$, $C_t$, $\Delta_t$ from $x_t$.
Discretize: Convert continuous $(A, B_t)$ to discrete $(\bar{A}_t, \bar{B}_t)$ using $\Delta_t$.
Recurrent scan: Apply parallel scan (training) or sequential update (inference) to compute hidden states $h_t$.
Output projection: Compute $y_t = C_t h_t$, then project through gating (SiLU) to output dimension.
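The five steps above can be strung together as a small end-to-end sketch. It is deliberately simplified and should not be read as the reference implementation: it omits the causal convolution and skip connection that the real Mamba block includes, runs the scan sequentially, and uses hypothetical names (`init_params`, `mamba_like_block`) and random weights.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def softplus(z):
    return np.logaddexp(0.0, z)

def init_params(d_model, d_inner, n_state, rng):
    # Hypothetical random initialization, for illustration only.
    s = 0.1
    return {
        "W_in": rng.normal(0, s, (2 * d_inner, d_model)),  # SSM branch + gate branch
        "W_B": rng.normal(0, s, (n_state, d_inner)),
        "W_C": rng.normal(0, s, (n_state, d_inner)),
        "w_dt": rng.normal(0, s, d_inner),
        "b_dt": np.zeros(d_inner),
        "A": -np.tile(np.arange(1.0, n_state + 1), (d_inner, 1)),  # diagonal, negative
        "W_out": rng.normal(0, s, (d_model, d_inner)),
    }

def mamba_like_block(X, p):
    """X: (L, d_model) -> (L, d_model), processed left to right."""
    L = X.shape[0]
    d_inner, n_state = p["A"].shape
    # Step 1: project to the expanded dimension; split into SSM input and gate.
    xz = X @ p["W_in"].T
    U, Z = silu(xz[:, :d_inner]), xz[:, d_inner:]
    h = np.zeros((d_inner, n_state))
    Y = np.empty((L, d_inner))
    for t in range(L):
        u = U[t]
        # Step 2: selective parameters computed from the current token.
        B_t, C_t = p["W_B"] @ u, p["W_C"] @ u
        dt = softplus(p["w_dt"] @ u + p["b_dt"])
        # Step 3: zero-order hold discretization for diagonal A.
        A_bar = np.exp(dt[:, None] * p["A"])
        B_bar = (A_bar - 1.0) / p["A"] * B_t[None, :]
        # Step 4: recurrent update (the sequential form used at inference).
        h = A_bar * h + B_bar * u[:, None]
        # Step 5a: read out y_t = C_t h_t for every channel.
        Y[t] = h @ C_t
    # Step 5b: gate with SiLU and project back to the model dimension.
    return (Y * silu(Z)) @ p["W_out"].T

rng = np.random.default_rng(0)
params = init_params(d_model=8, d_inner=16, n_state=4, rng=rng)
X = rng.normal(size=(10, 8))
out = mamba_like_block(X, params)
```

The only thing carried between iterations is `h`, a `(d_inner, n_state)` array, which is why decoding memory does not grow with the number of tokens processed.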

Key Insights

Selectivity is essential: Non-selective SSMs such as S4 apply the same dynamics to every token, so they struggle with content-based retrieval from context; making $B$, $C$, and $\Delta$ input-dependent enables content-aware filtering.
Diagonal + low-rank structure on $A$ is what makes S4 efficient to compute; Mamba uses diagonal $A$ matrices exclusively.
Hardware-aware design: The scan is IO-bound (limited by memory bandwidth), not compute-bound, so Mamba's CUDA kernel fuses discretization, scan, and output projection to minimize memory reads.
Linear decoding cost: Unlike a Transformer KV cache, which grows linearly with sequence length, the SSM state has a fixed size determined by $N$, so generation memory stays constant and total decoding time grows only linearly.
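The memory contrast in the last point can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes standard multi-head attention that caches one key and one value vector of width `d_model` per layer per token, and an SSM that stores a `d_inner x n_state` state per layer. The model sizes are hypothetical, and smaller buffers (such as any convolution state) are ignored.

```python
def kv_cache_bytes(seq_len, n_layers, d_model, bytes_per_elem=2):
    # Keys and values: 2 tensors of shape (seq_len, d_model) per layer.
    return 2 * n_layers * seq_len * d_model * bytes_per_elem

def ssm_state_bytes(n_layers, d_inner, n_state, bytes_per_elem=2):
    # One (d_inner, n_state) state per layer, independent of sequence length.
    return n_layers * d_inner * n_state * bytes_per_elem

# Hypothetical configuration, for illustration only.
cfg = dict(n_layers=24, d_model=1024)
state = ssm_state_bytes(n_layers=24, d_inner=2048, n_state=16)
for seq_len in (1_000, 100_000, 1_000_000):
    kv = kv_cache_bytes(seq_len, **cfg)
    print(f"L={seq_len:>9,}  KV cache: {kv / 2**20:10.1f} MiB   SSM state: {state / 2**20:6.2f} MiB")
```

The KV cache figure scales in direct proportion to `seq_len`, while the SSM figure is the same on every row.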

Sources

Gu, A. & Dao, T. "Mamba: Linear-Time Sequence Modeling with Selective State Spaces." arXiv:2312.00752 (2023). https://arxiv.org/abs/2312.00752
Gu, A. & Dao, T. "Mamba-3: Improved Sequence Modeling using State Space Principles." arXiv:2603.15569 (2026). https://arxiv.org/abs/2603.15569

URL: https://dev.to/sirajuddin-shaik/mambassm-basics-ndh

Mamba/SSM Basics - DEV Community