
Meta's Muse Spark Narrows Frontier Gap With Novel Thought Compression Technique

Michael Ouroumis · 2 min read

Meta on Wednesday unveiled Muse Spark, the first frontier AI model from its reorganized Superintelligence Labs, marking a dramatic strategic shift away from the open-weight approach that defined its Llama era. The model — code-named Avocado and built over nine months by a team led by chief AI officer Alexandr Wang — introduces a technique called "thought compression" that lets it rival top competitors while consuming significantly less compute.

Benchmark Performance

Muse Spark scores 52 on the Artificial Analysis Intelligence Index v4.0, a composite benchmark spanning reasoning, knowledge, mathematics, and coding. That places it behind GPT-5.4 and Gemini 3.1 Pro (both at 57) and Claude Opus 4.6 (53) in overall rankings, but the model excels in specific domains.

On CharXiv Reasoning, which tests visual figure understanding, Muse Spark achieved 86.4, significantly outperforming Claude Opus 4.6 at 65.3 and GPT-5.4 at 82.8. On HealthBench Hard, it topped all rivals with a score of 42.8%. Its GPQA Diamond score of 89.5 for PhD-level reasoning surpassed Grok 4.2 but trailed Opus 4.6 and Gemini 3.1 Pro.

Thought Compression: Doing More With Less

The standout technical innovation is thought compression. During reinforcement learning, the model is penalized for excessive "thinking time," forcing it to solve complex problems with fewer reasoning tokens without sacrificing accuracy. The results are striking: Muse Spark used just 58 million output tokens to complete the Intelligence Index evaluation, compared to 157 million for Claude Opus 4.6 and 120 million for GPT-5.4.
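Meta has not published the training objective, but the idea of penalizing excessive thinking during reinforcement learning can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name, the token budget, and the penalty rate are hypothetical, not Meta's actual implementation.

```python
# Hypothetical sketch of a length-penalized RL reward. The model earns
# full reward for a correct answer, but every reasoning token beyond a
# budget subtracts a small penalty, nudging it toward shorter chains of
# thought without rewarding wrong answers. All constants are made up.

def compressed_reward(correct: bool,
                      num_reasoning_tokens: int,
                      token_budget: int = 2048,
                      penalty_per_token: float = 0.0005) -> float:
    """Task reward minus a linear penalty on excess reasoning tokens."""
    task_reward = 1.0 if correct else 0.0
    excess = max(0, num_reasoning_tokens - token_budget)
    return task_reward - penalty_per_token * excess

# A correct answer within budget keeps the full reward...
print(compressed_reward(True, 1500))   # 1.0
# ...while a correct but verbose trace is penalized down toward zero.
print(compressed_reward(True, 4048))   # 1.0 - 0.0005 * 2000 = 0.0
```

Under an objective shaped like this, the policy is pushed to find the shortest reasoning trace that still lands on the right answer, which is consistent with the reported token counts: 58 million output tokens for Muse Spark versus 157 million for Claude Opus 4.6, roughly a 63% reduction.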

According to Meta, Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick, a claim that could reshape how the industry thinks about scaling efficiency.

A Proprietary Pivot

Unlike Meta's previous Llama models, which anyone could download and modify under open-weight licenses, Muse Spark is proprietary. The model is rolling out immediately in the Meta AI app and Meta.ai website, with plans to expand across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban Meta AI glasses in the coming weeks. A limited API "private preview" will be offered to select partners.

The model accepts voice, text, and image inputs but produces text-only output. It features a fast mode for casual queries, multiple reasoning modes, and a new "shopping mode" that leverages Meta's creator ecosystem for commerce recommendations.

What It Means for the Industry

Muse Spark's thought compression technique could prove more influential than the model's raw benchmark scores. If confirmed at scale, the ability to achieve frontier-level reasoning at a fraction of the compute cost would pressure competitors to rethink their own scaling strategies.

The proprietary shift also raises questions about Meta's open-source commitments. With Muse Spark locked behind Meta's walled garden, the company that once positioned itself as AI's open-weight champion is now competing on the same closed terms as OpenAI and Google.
