Industry

Google Splits Eighth-Gen TPUs Into Training and Inference Chips for Agentic Era

Michael Ouroumis · 3 min read

Google used the opening keynote at Cloud Next '26 on April 22 to declare the arrival of what it called the 'agentic era' of AI — and to unveil the silicon it believes will power it. The company announced its eighth-generation Tensor Processing Units, splitting the line into two purpose-built chips: TPU 8t for training frontier models and TPU 8i for serving them at scale. The message to Nvidia was unsubtle: Google now intends to compete head-on at both ends of the AI workload.

Two chips instead of one

For seven generations, Google's TPU roadmap produced a single accelerator that tried to balance training and inference. With TPU 8, the company abandoned that compromise. The training-focused TPU 8t scales to 9,600 chips in a single superpod with two petabytes of shared high-bandwidth memory and 121 ExaFlops of compute, delivering close to three times the per-pod performance of the previous Ironwood generation. Google also claims 97% 'goodput' — the share of time the chips spend doing useful work rather than stalling on failures or communication overhead.
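Goodput is simply the fraction of scheduled chip time spent doing useful work rather than stalling. A minimal sketch of the arithmetic behind a figure like 97%, using entirely hypothetical stall numbers (Google has not published the underlying chip-hours):

```python
def goodput(useful_chip_hours: float, total_chip_hours: float) -> float:
    """Fraction of scheduled chip time spent on useful training work,
    excluding stalls from failures, restarts, and communication overhead."""
    return useful_chip_hours / total_chip_hours

# Hypothetical example: a 9,600-chip pod is scheduled for 24 hours;
# assume failures and restarts cost roughly 6,900 chip-hours of stall time.
total = 9_600 * 24          # 230,400 chip-hours scheduled
useful = total - 6_900      # 223,500 chip-hours of real work
print(f"{goodput(useful, total):.1%}")  # → 97.0%
```

At superpod scale even a few percent of stall time is thousands of chip-hours, which is why Google highlights the metric at all.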

The TPU 8i, tuned for inference, takes a different shape. Each chip carries 288 GB of high-bandwidth memory, 384 MB of on-chip SRAM (roughly triple the prior generation), and 19.2 Tb/s of interconnect bandwidth. Google positions it as offering about 80% better performance-per-dollar for inference workloads than Ironwood, and says the two chips together deliver up to 2x better performance-per-watt versus the previous generation.

Virgo Network and the fabric behind the chips

Alongside the accelerators, Google introduced a new data-center networking architecture it calls Virgo Network. According to reporting on the keynote, Virgo provides roughly a 4x increase in bandwidth per accelerator versus the previous generation and can link up to 134,000 TPU 8t chips through a non-blocking fabric with up to 47 petabits per second of bisection bandwidth. That scale matters because agentic workloads — long-running reasoning chains, multi-tool pipelines, continuous background tasks — stress interconnect and memory bandwidth far more than classic chatbot traffic.

Implications: Nvidia, Anthropic, and the cost curve

The split architecture is Google's sharpest attempt yet to undercut Nvidia on the economics of inference, which is increasingly where AI dollars are actually spent. Google's cloud business has been riding a wave of TPU demand from Anthropic, which the company confirmed earlier this year would have access to up to one million TPU chips and more than a gigawatt of capacity in 2026. A cheaper, denser inference chip makes that commitment more defensible — and gives Google a clearer story to tell enterprise customers weighing Nvidia GPUs against custom silicon.

For developers, the takeaway is narrower but still meaningful: the chips that will run the next wave of autonomous agents, long-context assistants, and multi-step workflows are starting to look different from the ones that trained them. Google is betting that bifurcation — and the 'later in 2026' rollout window it set today — will be enough to keep pace as rivals push their own custom accelerators. Nvidia will not concede the inference market without a fight, but after today, it has a visibly sharper challenger.

