
Nvidia Is Building a Secret Inference Chip With Groq Tech — And OpenAI Is the First Customer

Michael Ouroumis · 2 min read

Nvidia has dominated AI training for years. Now it wants to own inference too — and it's using $20 billion worth of acquired technology to do it.

The Secret Chip

According to a report from SiliconANGLE, Nvidia is preparing to unveil a new inference-focused processor at its annual GTC developer conference in San Jose later this month. The chip integrates the Language Processing Unit (LPU) architecture that Nvidia licensed from Groq Inc. in December for $20 billion, a deal that also brought over Groq's founding CEO Jonathan Ross and President Sunny Madra.

Groq's LPU architecture takes a fundamentally different approach to inference. Instead of repurposing GPUs designed for training, LPUs are built from the ground up to serve language model outputs with dramatically lower latency and energy consumption.

OpenAI Signs On First

The biggest signal of the chip's potential: OpenAI has already committed as the lead customer. The deal includes a massive purchase of dedicated inference capacity, backed by a $30 billion investment from Nvidia into OpenAI's infrastructure. That's not a research partnership — it's a production-scale commitment.

For OpenAI, which runs ChatGPT for over 900 million users, inference costs dwarf training costs. A chip purpose-built for fast, efficient model serving could meaningfully change the economics of running frontier models at consumer scale.

Why Inference Matters Now

The AI industry has reached an inflection point. Training the biggest models still requires enormous GPU clusters, but the real cost center has shifted. Every ChatGPT response, every Copilot suggestion, every Claude conversation is an inference workload. Companies are spending more on running models than building them.

Nvidia currently controls over 90% of the GPU market for AI training, but inference is more competitive. AMD, Intel, AWS custom silicon, and startups like Cerebras are all targeting the inference market. The Groq acquisition gives Nvidia a purpose-built architecture rather than just optimizing existing GPUs.

What to Watch at GTC

GTC 2026 runs later this month and is expected to be Nvidia's biggest product launch since the Blackwell architecture. Beyond the inference chip, CEO Jensen Huang is expected to detail the full Rubin platform roadmap and new software tools for agentic AI workloads.

The inference chip could reshape how AI companies budget their compute. If it delivers on the efficiency promises of Groq's LPU architecture, running frontier models could get a lot cheaper.

