Mistral's New Open-Source TTS Model Beats ElevenLabs — and Fits on a Smartwatch

Michael Ouroumis · 3 min read

Mistral has released Voxtral, an open-source text-to-speech model that beats ElevenLabs in native-speaker blind tests and is compact enough to run on a smartwatch. The release landed on the same day Cohere launched Transcribe, an open-source speech-to-text model that hit the top of HuggingFace's leaderboard.

In a single day, the open-source community produced credible challengers to the leading proprietary models on both ends of the voice AI stack.

Voxtral's Performance

The headline number: in blind tests with native speakers, Voxtral was preferred over ElevenLabs 63% of the time on standard voices and approximately 70% of the time on custom voices. Those are wide margins for a head-to-head comparison, where an even split would be 50%. ElevenLabs has been the benchmark for commercial-quality AI voice generation, and Mistral's model is beating it.

The size achievement is equally notable. Running a high-quality TTS model on a smartwatch would have seemed implausible a year ago — voice generation is typically compute-intensive. Mistral's compression and efficiency work has pushed Voxtral into genuinely edge-deployable territory.

That combination — better than the leading commercial product, runs locally on constrained hardware — describes exactly the kind of open-source capability jump that disrupts market dynamics. Companies and developers building voice applications can now deploy a model that sounds better than the dominant commercial alternative, for free, with no API costs and no data leaving the device.

Cohere Transcribe: The Other Direction

Cohere's Transcribe took the top spot on HuggingFace's speech-to-text leaderboard on release day. While Mistral addressed voice generation (text-to-speech), Cohere addressed voice recognition (speech-to-text) — together, the two releases cover the full voice interface stack.

HuggingFace leaderboard position on launch day doesn't always reflect sustained performance once the community does more thorough testing. Still, Transcribe's first-day #1 ranking, arriving on the same day as Voxtral's blind-test win over ElevenLabs, is a meaningful signal about where open-source voice capabilities have arrived.

The Voice Layer Heats Up

These releases are part of a broader pattern accelerating this week. Sanas crossed $60 million in annual recurring revenue with its real-time translation product across 13 languages. Google launched Gemini 3.1 Flash Live, its highest-quality voice model, powering a global rollout of Search Live. Apple is opening Siri to rival AI assistants via a new Extensions framework in iOS 27.

Voice is no longer a secondary feature of AI platforms. It's becoming the primary interface for a significant portion of AI interactions — in cars, on wearables, through smart speakers, and increasingly through the phone's native assistant layer.

The open-source advancement matters because voice AI has historically been more proprietary than text generation. Commercial providers have dominated voice with products like ElevenLabs' text-to-speech and speech-to-speech tools and OpenAI's voice modes. Voxtral and Transcribe represent the moment when open-source voice caught up with the best proprietary offerings or, in Voxtral's case, appears to have surpassed them.

What This Means for Developers

For anyone building a voice-enabled application, today's releases are a straightforward upgrade path. Voxtral delivers ElevenLabs-beating quality without per-character API costs. Transcribe provides top-of-leaderboard speech recognition without cloud dependency.

The edge deployment story — Voxtral fitting on a smartwatch — opens markets that were previously inaccessible. Offline voice applications, privacy-first voice interfaces, embedded hardware with no cloud connectivity: all of these become significantly more viable with a TTS model that matches commercial quality while running locally.

The year of voice AI started months ago. Today it got a lot more open.
