
Google Launches Gemini 3.1 Pro With Double the Reasoning Performance

Michael Ouroumis · 2 min read

Google has announced Gemini 3.1 Pro, a major update to its flagship AI model that more than doubles reasoning performance compared to the previous generation. The model scored 77.1% on ARC-AGI-2, a benchmark specifically designed to test abstract reasoning capabilities.

The Numbers

The ARC-AGI-2 benchmark measures an AI model's ability to solve novel reasoning tasks that require genuine abstraction rather than pattern matching from training data. Gemini 3.1 Pro's 77.1% score represents a significant jump from Gemini 3 Pro's results, indicating real progress in the model's ability to reason about unfamiliar problems.
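To make the headline number concrete: a score like 77.1% on a pass/fail benchmark is simply the share of held-out tasks the model solves exactly. A minimal sketch of that arithmetic (illustrative only, not the official ARC-AGI-2 grading harness; the task counts below are hypothetical):

```python
# Illustrative only: ARC-style benchmarks grade each task pass/fail,
# and the headline score is the percentage of tasks solved.

def benchmark_score(results: list[bool]) -> float:
    """Return the percentage of tasks solved, given pass/fail results."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)

# Hypothetical run: 771 of 1,000 tasks solved
solved = [True] * 771 + [False] * 229
print(f"{benchmark_score(solved):.1f}%")  # 77.1%
```

The key point is that such benchmarks reward exact solutions to unseen tasks, so partial pattern matching earns no credit on a given task.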

This performance places Gemini 3.1 Pro among the top-performing models on what many researchers consider the most challenging reasoning benchmark available today.

Where It's Available

Google is rolling out Gemini 3.1 Pro across its entire AI platform ecosystem.

What's Improved

Beyond the headline reasoning benchmark, Gemini 3.1 Pro shows improvements across several areas:

Mathematical Reasoning

The model handles multi-step mathematical proofs and calculations with greater reliability, reducing the error rate on complex derivations.

Code Understanding

Gemini 3.1 Pro demonstrates stronger ability to reason about code behavior, identify bugs through logical analysis, and suggest fixes that address root causes rather than symptoms.

Long-Context Reasoning

The model maintains coherent reasoning across longer contexts, making it more effective for tasks that require synthesizing information from large documents or codebases.

Implications for the Model Race

Google's announcement intensifies the competition among frontier AI labs. The focus on reasoning performance reflects a broader industry trend — raw language fluency is largely solved, and the differentiator is now how well models can think through complex, novel problems.

With OpenAI's GPT-5, Anthropic's Claude (dominant in legal reasoning), and now Gemini 3.1 Pro all pushing hard on reasoning capabilities, the pace of improvement shows no signs of slowing down. For a detailed comparison of all three, see this ChatGPT vs Claude vs Gemini guide.

