
Frontier AI Models Solve an Open Math Problem That Stumped Humans for Years

Michael Ouroumis · 3 min read

AI systems have solved an open mathematics problem that had stumped human researchers since 2019.

Epoch AI reported that GPT-5.4 Pro became the first model to clear FrontierMath's open-problem track, solving a conjecture on Ramsey hypergraphs that the original authors had been unable to resolve. Gemini 3.1 Pro and Claude Opus 4.6 subsequently also solved it.

The distinction matters: these are not problems where the solution exists and AI found it faster. These are problems that were genuinely unsolved — by the humans who created them — when the models encountered them.

What Ramsey Hypergraphs Are

Ramsey theory is a branch of combinatorics concerned with conditions under which order must appear in structures that seem chaotic. Hypergraph problems in this space involve understanding how colors, connections, or patterns must emerge across high-dimensional graph structures once certain size or density thresholds are crossed.
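The flavor of Ramsey theory is easiest to see in its classic finite case: color the edges of a complete graph on six vertices with two colors however you like, and a single-color triangle must appear, while on five vertices it can be avoided (the Ramsey number R(3,3) = 6). The brute-force check below is purely illustrative; the 2019 hypergraph conjecture operates at a scale and level of abstraction far beyond anything this kind of enumeration can touch.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """True if some triangle has all three edges the same color."""
    for tri in combinations(range(n), 3):
        colors = [coloring[frozenset(pair)] for pair in combinations(tri, 2)]
        if colors[0] == colors[1] == colors[2]:
            return True
    return False

def all_colorings_have_mono_triangle(n):
    """Check every 2-coloring of the complete graph on n vertices."""
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    for assignment in product([0, 1], repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, assignment))):
            return False  # found a coloring with no monochromatic triangle
    return True

# On 6 vertices, order is forced; on 5 it is not.
print(all_colorings_have_mono_triangle(6))  # True
print(all_colorings_have_mono_triangle(5))  # False
```

The 6-vertex case enumerates all 2^15 = 32,768 edge colorings in well under a second; for hypergraphs the analogous search space explodes, which is why such problems resist both computers and humans.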

The specific 2019 conjecture that AI solved involved predicting the existence or properties of certain Ramsey configurations in hypergraphs. The original authors could not find a proof. Neither could subsequent researchers. FrontierMath — a benchmark specifically designed to contain problems beyond current human solving capacity — had listed it as an open problem.

GPT-5.4 Pro produced a valid solution.

The IQ Trajectory That Makes This Less Surprising

Epoch AI's announcement landed in the same week that researcher Charbel-Raphael Ségerie published a striking data point: in March 2023, Claude had an estimated IQ equivalent of approximately 64 on standardized reasoning tests. Today, Claude Opus 4.6 scores 133 on the Mensa Norway test. GPT-5.2 Thinking scores 141. Gemini 3 Pro reaches 142.

That's a jump from cognitively impaired to gifted in approximately three years. No human population in recorded history has ever improved that fast on standardized cognitive assessments.

The Ramsey hypergraph result fits this trajectory. Models aren't just getting better at producing fluent text — they're getting better at mathematical reasoning, at decomposing novel problems into tractable subproblems, and at generating and verifying proofs. The same week that Claude proved it can do original theoretical physics research, another cluster of frontier models proved they can extend human mathematics.

What FrontierMath Is

FrontierMath is a benchmark developed specifically to stay ahead of AI capability. Standard math benchmarks like MATH and GSM8K were saturated — models were scoring at or near 100% — and stopped measuring meaningful differences between frontier systems.

FrontierMath collects problems from working mathematicians, many of which involve research-level difficulty or genuinely open questions. The open-problem track is its most extreme tier: problems listed there have no known human solution at the time they're added.

The fact that frontier models have now cleared this track doesn't mean AI has solved mathematics. It means the tier of problems that can serve as a meaningful test of frontier AI capability has moved again, further into territory that was previously considered uniquely human.

Implications

The practical significance of AI solving open math problems is still being worked out. Mathematical research doesn't produce immediate products, but it underpins fields ranging from cryptography and materials science to fundamental physics. A system that can advance mathematics could, in principle, accelerate progress across all of those areas.

More immediately, the result updates the timeline for when AI might be considered a genuine research collaborator in formal domains. The cautious view — that AI systems are good at pattern matching but can't do real mathematical reasoning — has become harder to hold. What happened with the Ramsey hypergraph conjecture is closer to genuine mathematical discovery than anything AI systems had previously demonstrated.

