
Meta's Muse Spark Narrows Frontier Gap With Novel Thought Compression Technique

Michael Ouroumis · 2 min read

Meta on Wednesday unveiled Muse Spark, the first frontier AI model from its reorganized Superintelligence Labs, marking a dramatic strategic shift away from the open-weight approach that defined its Llama era. The model — code-named Avocado and built over nine months by a team led by chief AI officer Alexandr Wang — introduces a technique called "thought compression" that lets it rival top competitors while consuming significantly less compute.

Benchmark Performance

Muse Spark scores 52 on the Artificial Analysis Intelligence Index v4.0, a composite benchmark spanning reasoning, knowledge, mathematics, and coding. That places it behind GPT-5.4 and Gemini 3.1 Pro (both at 57) and Claude Opus 4.6 (53) in overall rankings, but the model excels in specific domains.

On CharXiv Reasoning, which tests visual figure understanding, Muse Spark achieved 86.4, significantly outperforming Claude Opus 4.6 at 65.3 and GPT-5.4 at 82.8. On HealthBench Hard, it topped all rivals with a score of 42.8%. Its GPQA Diamond score of 89.5 for PhD-level reasoning surpassed Grok 4.2 but trailed Opus 4.6 and Gemini 3.1 Pro.

Thought Compression: Doing More With Less

The standout technical innovation is thought compression. During reinforcement learning, the model is penalized for excessive "thinking time," forcing it to solve complex problems with fewer reasoning tokens without sacrificing accuracy. The results are striking: Muse Spark used just 58 million output tokens to complete the Intelligence Index evaluation, compared to 157 million for Claude Opus 4.6 and 120 million for GPT-5.4.
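Meta has not published the training objective, but the idea of penalizing excessive thinking during reinforcement learning can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name, the token budget, and the penalty rate are hypothetical, not Meta's actual implementation.

```python
# Hypothetical sketch of a length-penalized RL reward. The model earns
# full reward for a correct answer, but every reasoning token beyond a
# budget subtracts a small penalty, nudging it toward shorter chains of
# thought without rewarding wrong answers. All constants are made up.

def compressed_reward(correct: bool,
                      num_reasoning_tokens: int,
                      token_budget: int = 2048,
                      penalty_per_token: float = 0.0005) -> float:
    """Task reward minus a linear penalty on excess reasoning tokens."""
    task_reward = 1.0 if correct else 0.0
    excess = max(0, num_reasoning_tokens - token_budget)
    return task_reward - penalty_per_token * excess

# A correct answer within budget keeps the full reward...
print(compressed_reward(True, 1500))   # 1.0
# ...while a correct but verbose trace is penalized down toward zero.
print(compressed_reward(True, 4048))   # 1.0 - 0.0005 * 2000 = 0.0
```

Under an objective shaped like this, the policy is pushed to find the shortest reasoning trace that still lands on the right answer, which is consistent with the reported token counts: 58 million output tokens for Muse Spark versus 157 million for Claude Opus 4.6, roughly a 63% reduction.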

According to Meta, Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick, a claim that could reshape how the industry thinks about scaling efficiency.

A Proprietary Pivot

Unlike Meta's previous Llama models, which anyone could download and modify under open-weight licenses, Muse Spark is proprietary. The model is rolling out immediately in the Meta AI app and Meta.ai website, with plans to expand across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban Meta AI glasses in the coming weeks. A limited API "private preview" will be offered to select partners.

The model accepts voice, text, and image inputs but produces text-only output. It features a fast mode for casual queries, multiple reasoning modes, and a new "shopping mode" that leverages Meta's creator ecosystem for commerce recommendations.

What It Means for the Industry

Muse Spark's thought compression technique could prove more influential than the model's raw benchmark scores. If confirmed at scale, the ability to achieve frontier-level reasoning at a fraction of the compute cost would pressure competitors to rethink their own scaling strategies.

The proprietary shift also raises questions about Meta's open-source commitments. With Muse Spark locked behind Meta's walled garden, the company that once positioned itself as AI's open-weight champion is now competing on the same closed terms as OpenAI and Google.
