
AI21 Labs Releases Jamba 2, a Hybrid SSM-Transformer That Matches GPT-5 at One-Fifth the Cost

Michael Ouroumis · 2 min read

AI21 Labs has released Jamba 2, a 398-billion-parameter model that takes a fundamentally different approach to architecture by interleaving Mamba-style state space model (SSM) layers with traditional transformer attention layers. The result matches GPT-5 and Claude Sonnet 4.5 on major reasoning benchmarks while running inference at roughly one-fifth the cost.

How the Hybrid Architecture Works

Pure transformer models compute attention across all tokens in a sequence, creating quadratic scaling costs as context windows grow. Jamba 2 replaces a significant portion of these attention layers with SSM layers based on the Mamba architecture, which process sequences in linear time by maintaining a compressed state representation instead of attending to every previous token.

The attention layers that remain handle tasks where precise token-to-token relationships matter — retrieval, exact matching, and fine-grained reasoning. The SSM layers handle long-range dependency tracking, summarization, and general language modeling. AI21 reports that this division of labor is what makes the cost reduction possible without sacrificing quality.
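AI21 has not published Jamba 2's layer code, so as a rough illustration of why the SSM layers scale linearly, here is a toy sketch in Python/NumPy (all function names and shapes are illustrative assumptions, not the actual architecture): a state-space pass updates one fixed-size hidden state per token, so its cost grows with sequence length, while full attention builds a pairwise score matrix that grows with the square of sequence length.

```python
import numpy as np

def toy_ssm_scan(x, A, B, C):
    """Toy linear-time SSM pass: one fixed-size state update per token.

    x: (seq_len, d_in) input sequence
    A: (d_state, d_state) state transition matrix
    B: (d_state, d_in) input projection
    C: (d_out, d_state) output projection
    """
    h = np.zeros(A.shape[0])       # compressed state; size independent of seq_len
    outputs = []
    for x_t in x:                  # one fixed-cost update per token -> O(seq_len) total
        h = A @ h + B @ x_t
        outputs.append(C @ h)
    return np.stack(outputs)

def toy_attention(x):
    """Toy full attention: every token attends to every other -> O(seq_len^2)."""
    scores = x @ x.T               # (seq_len, seq_len) pairwise score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x
```

In a hybrid stack like the one the article describes, most layers would use the linear-time scan and only a minority would pay the quadratic attention cost.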

Benchmark Results

On MMLU-Pro, HumanEval+, and MATH-500, the 398B Jamba 2 scores within striking distance of both GPT-5 and Claude Sonnet 4.5. Where the model pulls ahead is on long-document tasks. With a 256K context window and linear-time SSM layers handling the bulk of long-range processing, Jamba 2 outperforms all competitors on multi-document QA, long-form summarization, and needle-in-a-haystack retrieval at extreme context lengths.

AI21 claims the cost advantage compounds at longer contexts. At 256K tokens, Jamba 2 inference is roughly 8x cheaper than a comparable pure-transformer model because the SSM layers avoid the quadratic attention blowup entirely.
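AI21 has not broken down the 8x figure publicly, but a back-of-the-envelope FLOP comparison shows how the quadratic term comes to dominate at 256K tokens. Every number below is an assumption chosen for illustration (layer count, model width, state size, and a hypothetical 1:7 attention-to-SSM layer ratio), not Jamba 2's actual configuration:

```python
def attention_flops(seq_len, d_model):
    # Pairwise scores plus value mixing: ~2 * seq_len^2 * d_model per layer
    return 2 * seq_len**2 * d_model

def ssm_flops(seq_len, d_model, d_state=16):
    # One fixed-size state update per token: linear in seq_len
    return 2 * seq_len * d_model * d_state

# Illustrative assumptions only -- not Jamba 2's published configuration
seq_len, d_model, n_layers = 256_000, 8192, 80

pure_transformer = n_layers * attention_flops(seq_len, d_model)
# Hypothetical hybrid: 1 in 8 layers keeps attention, the rest use SSM layers
hybrid = (n_layers // 8) * attention_flops(seq_len, d_model) \
       + (n_layers - n_layers // 8) * ssm_flops(seq_len, d_model)

print(f"pure / hybrid sequence-mixing FLOPs: {pure_transformer / hybrid:.1f}x")
```

Under these assumptions the ratio is driven almost entirely by how many attention layers survive: the SSM layers' linear cost is negligible next to the quadratic term at 256K tokens, so the savings roughly track the fraction of attention layers removed.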

Three Model Sizes

The Jamba 2 family includes three tiers. The open-weight Mini release gives developers and researchers access to the hybrid architecture for experimentation and fine-tuning, following the trend set by DeepSeek R2 and other recent open-weight releases.

Why It Matters

Jamba 2 is the strongest evidence yet that pure transformer architectures may not be the final answer. The hybrid SSM-transformer approach addresses the two biggest pain points in LLM deployment — inference cost and long-context performance — without requiring the kind of hardware breakthroughs that GPU manufacturers are racing to deliver. If these efficiency gains hold at scale, other labs will face pressure to adopt similar hybrid designs or explain why they are paying five times more for equivalent results.

