
OpenAI Launches GPT-5 Turbo — 3x Faster, Half the Cost

Michael Ouroumis · 2 min read

OpenAI has released GPT-5 Turbo, a leaner version of its flagship model that trades a small amount of peak capability for dramatically better speed and pricing. The model is aimed squarely at production developers who need frontier-quality responses without the latency and cost of the full GPT-5.

Speed and Pricing

The numbers are straightforward. GPT-5 Turbo delivers responses three times faster than GPT-5, with a time-to-first-token of under 200 milliseconds. API pricing drops to $2.50 per million input tokens and $10 per million output tokens — half the cost of standard GPT-5.
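To make the savings concrete, here is a minimal cost sketch at the quoted rates. The Turbo rates ($2.50/$10 per million tokens) come from the announcement; the full GPT-5 rates ($5/$20) are inferred from the "half the cost" claim, and the workload figures are purely illustrative.

```python
def monthly_cost(input_tokens_m, output_tokens_m, in_rate, out_rate):
    """Dollar cost for a month of traffic; token counts in millions."""
    return input_tokens_m * in_rate + output_tokens_m * out_rate

# Illustrative workload: 1,000M input tokens, 200M output tokens per month.
gpt5 = monthly_cost(1000, 200, 5.00, 20.00)    # 9000.0
turbo = monthly_cost(1000, 200, 2.50, 10.00)   # 4500.0
print(gpt5, turbo, turbo / gpt5)               # bill is halved, as advertised
```

At any traffic mix, the bill halves, since both the input and output rates are cut by the same factor.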

OpenAI achieved this through model distillation, a technique where a smaller model is trained to replicate the behavior of the larger one. The company says GPT-5 Turbo uses roughly 40% fewer parameters than GPT-5 while retaining most of its capabilities.
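The core idea of distillation is that the student is trained against the teacher's full output distribution rather than hard labels alone. OpenAI has not published its training recipe, so the following is only a generic sketch of the standard temperature-softened KL objective, with toy logits:

```python
import math

def softmax(logits, temperature=2.0):
    # Soften logits with a temperature so the student sees the teacher's
    # full probability distribution, not just its top prediction.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's and student's soft distributions,
    # the standard objective used in knowledge distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits -> zero loss; diverging logits -> positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))      # 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # True
```

Minimizing this loss pushes the smaller model's outputs toward the larger model's, which is how a 40%-smaller network can retain most of the flagship's behavior.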

"The full GPT-5 is our research flagship. GPT-5 Turbo is what you ship to production," said Sam Altman during the announcement livestream.

Benchmark Performance

On standard benchmarks, GPT-5 Turbo scores within 2-3% of the full GPT-5 across reasoning, coding, math, and general knowledge tasks. On MMLU-Pro, it scores 89.1% compared to GPT-5's 91.4%. On HumanEval coding benchmarks, it achieves 93.2% versus 95.1%.

The gap widens on the most demanding multi-step reasoning problems. On complex mathematical proofs and extended agentic coding tasks, the full GPT-5 maintains a more noticeable edge. For the vast majority of production use cases — customer support, content generation, data extraction, code completion — the difference is negligible.

Context Window

GPT-5 Turbo ships with a 256,000-token context window, matching the full GPT-5. OpenAI says there is no degradation in long-context retrieval accuracy, which was a common complaint with earlier Turbo variants.

Developer Reaction

The developer community has responded positively. Many teams had been using GPT-5 in development but switching to cheaper models for production due to cost constraints. GPT-5 Turbo eliminates that trade-off.

"We were spending $40K a month on GPT-5 API calls," said a startup CTO on X. "GPT-5 Turbo cuts that in half with no visible quality drop. This is what we were waiting for."

Competitive Pressure

The release puts pressure on Anthropic and Google, both of which charge premium rates for their flagship models. Anthropic's Claude Opus is priced at $15/$75 per million tokens, while Google's Gemini 3.1 Ultra sits at $12/$60. GPT-5 Turbo undercuts both significantly while claiming comparable performance.

OpenAI also announced that GPT-5 Turbo will replace GPT-4o as the default model in ChatGPT Free within the next two weeks, giving hundreds of millions of users access to near-frontier performance at no cost.

