Industry

Deloitte Warns AI's Inference Era Will Drive Even Higher Computing Demand

Michael Ouroumis · 2 min read

The artificial intelligence industry's shift from model training to real-world deployment will not ease pressure on computing infrastructure — it will intensify it, according to a major new report from Deloitte published on March 18.

The consulting firm's 2026 Technology, Media & Telecommunications Predictions report challenges a widely held assumption: that the move toward AI inference would reduce the industry's appetite for computing power and data center capacity.

The Inference Paradox

Deloitte projects that inference — the process of running trained AI models to generate outputs — will account for roughly two-thirds of all AI computing by 2026, up from about one-third in 2023. But rather than easing infrastructure demands, this shift is creating what the firm describes as an "inference paradox."

Two emerging techniques are driving the surge. Post-training scaling methods, which refine models after initial training, can consume approximately 30 times the computing resources needed to train the original model. Test-time scaling, where models perform additional computation during each query to improve response quality, can require more than 100 times the computing power of a basic inference task.
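The report's multipliers can be put in rough perspective with a back-of-envelope sketch. The compute units below are arbitrary placeholders chosen for illustration, not figures from the report; only the 30x and 100x multipliers come from the source.

```python
# Back-of-envelope illustration of Deloitte's reported scaling multipliers.
# The base unit values are invented for illustration; the 30x and 100x
# factors are the figures cited in the report.
BASE_TRAINING_COMPUTE = 1_000_000  # assumed units for one original training run
BASE_INFERENCE_COMPUTE = 10        # assumed units for one basic inference query

# Post-training scaling: ~30x the compute of the original training run.
post_training = 30 * BASE_TRAINING_COMPUTE

# Test-time scaling: >100x the compute of a basic inference task, per query.
test_time_query = 100 * BASE_INFERENCE_COMPUTE

print(f"Post-training scaling run: {post_training:,} units "
      f"({post_training // BASE_TRAINING_COMPUTE}x base training)")
print(f"One test-time-scaled query: {test_time_query:,} units "
      f"({test_time_query // BASE_INFERENCE_COMPUTE}x a basic query)")
```

The point of the arithmetic is that test-time scaling multiplies cost on every query served, so its aggregate effect grows with usage rather than being a one-off expense like a training run.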

The Numbers

The financial implications are staggering. Deloitte estimates global spending on AI data centers will reach approximately $400 billion in 2026, with that figure potentially climbing to $1 trillion annually by 2028.

High-performance AI chips, which can cost more than $30,000 each, are expected to account for around $200 billion in spending this year alone, with the overall AI chip market projected to surpass $400 billion by 2028.

The market for inference-optimized chips — a category that includes products from companies like Groq and custom silicon from cloud providers — will grow to over $50 billion in 2026. However, Deloitte stresses this will supplement rather than replace demand for high-end GPUs.

Where Inference Actually Happens

Contrary to expectations that AI inference would migrate to edge devices and consumer hardware, most inference workloads will continue running in data centers and on-premises enterprise servers. The power, memory, and latency requirements of advanced AI models make edge deployment impractical for the majority of enterprise use cases.

"The world likely needs all the data centres and enterprise on-premises AI factories that are currently being planned and all the electricity that these facilities will need," the report states.

What It Means

For investors and infrastructure planners, the message is clear: the AI computing buildout is far from peaking. Despite efficiency improvements in chip design and model architecture, demand for AI computing is growing four to five times each year and is expected to maintain that pace through 2030, with significant implications for global energy consumption and capital allocation.
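To see what sustained four-to-five-fold annual growth implies, compounding the report's range from a normalized 2026 baseline gives the following sketch. The baseline of 1.0 and the year endpoints are assumptions for illustration; only the 4x–5x annual rate and the 2030 horizon come from the source.

```python
# Compound-growth sketch of the report's "four to five times each year" figure.
# Demand is normalized to 1.0 in 2026; the multiplier range is from the report.
def projected_demand(start_year: int, end_year: int,
                     annual_multiplier: float, base: float = 1.0) -> float:
    """Demand after compounding the annual multiplier over the interval."""
    return base * annual_multiplier ** (end_year - start_year)

low = projected_demand(2026, 2030, 4)   # 4x per year over 4 years
high = projected_demand(2026, 2030, 5)  # 5x per year over 4 years

print(f"2030 demand vs 2026 baseline: {low:.0f}x to {high:.0f}x")
```

Even at the low end, four years of 4x growth compounds to a multi-hundred-fold increase, which is why the report treats energy supply and capital allocation as first-order constraints.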

