Google's Four-Chip Plan to Own AI Inference at Scale

Google splits its next TPU generation across Broadcom, MediaTek, Marvell, and Intel to win inference economics, revealed ahead of Cloud Next 2026.

Google's opening keynote at Cloud Next 2026 in Las Vegas starts today, but its chip strategy was already out the door before Thomas Kurian takes the stage. Reports from The Information and The Next Web confirmed what chip watchers had been piecing together for weeks: Google isn't just building custom silicon to cut costs. It's building a four-partner supply chain where no single vendor can raise prices unilaterally, slow delivery, or hold its roadmap hostage.

Key Specs

| Component | Partner | Purpose | TSMC Node | Status |
| --- | --- | --- | --- | --- |
| Ironwood (TPU v7) | Broadcom | Training + inference | 3nm | Shipping, 4.3M units in 2026 |
| Sunfish (TPU v8ax) | Broadcom | Training | 2nm | Late 2027 |
| Zebrafish (TPU v8x) | MediaTek | Inference, 20-30% cheaper | 2nm | Late 2027 |
| MPU + inference TPU | Marvell | Memory-path optimization | TBD | In talks, ~2M-unit pilot 2027 |
| Xeon 6 + IPU | Intel | Networking, general CPU | N/A | Active |

The split is deliberate. Broadcom takes the high-performance training chip (Sunfish). MediaTek takes the cost-optimized inference variant (Zebrafish). Marvell is being brought in for a memory processing unit that shortens the data path between the TPU and its HBM3E memory, plus an additional inference-focused TPU. Intel handles networking and general-purpose compute. TSMC fabricates everything.

Ironwood (TPU v7)       192 GB HBM3E · 7.2 TB/s bandwidth
                        9,216-chip superpod · 42.5 FP8 exaflops
                        10x peak performance vs TPU v5p

TPU v8ax  "Sunfish"     Broadcom-designed · TSMC 2nm · training
TPU v8x   "Zebrafish"   MediaTek-designed · TSMC 2nm · inference
                        Target: 20-30% cheaper per query than Sunfish

Marvell MPU             Shortens TPU-to-memory data path
                        Reduces power and latency
                        ~2M units targeted for 2027 pilot (no contract yet)

Inside the TPU v8 Split

Sunfish: Broadcom Keeps Training

Broadcom handles the highest-value part of the next generation - Sunfish, the training-side chip targeting TSMC's 2nm process node for late 2027. The relationship is the most established on the list. Broadcom signed a long-term agreement on April 6 to supply TPUs and networking components through 2031. Mizuho estimates Broadcom's AI revenue from its Google and Anthropic relationships will reach $21 billion in 2026, rising to $42 billion in 2027.

For context on how intertwined those relationships have become, our earlier coverage of Broadcom's TPU deal with Anthropic covers the scale of compute Anthropic is committing to Google's silicon. That arrangement makes Broadcom both a design partner and an indirect beneficiary every time Claude runs on Ironwood.

Zebrafish: MediaTek Enters the Data Center

MediaTek is the unexpected addition. Known for mobile SoCs, the company is designing Zebrafish - an inference-optimized TPU v8 variant that's 20-30% cheaper to operate than its training-focused counterpart. Google runs billions of inference queries daily across Gemini, Search AI overviews, NotebookLM, and Cloud products. At that scale, 20% cost reduction per query changes the economics of the entire product line.
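The fleet-scale arithmetic behind that claim is easy to sketch. The query volume and baseline cost below are illustrative assumptions, not figures from Google or the reporting:

```python
# Illustrative only: what a 20% per-query cost cut means at fleet scale.
# Assumed inputs (NOT from the article): 5B inference queries per day
# at a baseline serving cost of $0.001 per query.

QUERIES_PER_DAY = 5e9
BASELINE_COST_PER_QUERY = 0.001   # dollars, assumed
SAVINGS_FRACTION = 0.20           # low end of Zebrafish's 20-30% target

daily_savings = QUERIES_PER_DAY * BASELINE_COST_PER_QUERY * SAVINGS_FRACTION
annual_savings = daily_savings * 365

print(f"~${daily_savings / 1e6:.1f}M/day, ~${annual_savings / 1e6:.0f}M/year")
```

Even with these placeholder inputs, the savings land in the hundreds of millions per year, which is why a cheaper inference die justifies a second design partner.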

The price advantage comes from design choices, not a different manufacturing process. Zebrafish runs on the same TSMC 2nm node as Sunfish, but with fewer high-bandwidth memory stacks, lower power targets, and a die optimized for sustained inference throughput rather than peak training flops. Google assembles the final system itself, capturing the integration margin that would otherwise go to a full-system vendor.

Marvell: Attacking the Memory Bottleneck

Marvell's assignment is the most technically interesting. Raw compute isn't usually the binding constraint for large-model inference. Memory bandwidth is. The time it takes to move model weights from HBM into compute units sets the floor on token generation speed, and HBM3E's 7.2 terabytes per second is already near the practical limit for current architectures.
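The bandwidth floor follows from back-of-envelope arithmetic. Assuming a hypothetical 70B-parameter model served in FP8 (one byte per weight, a model size chosen for illustration, not taken from the article) and that each decode step streams the full weight set from HBM once, Ironwood's 7.2 TB/s caps single-chip token generation:

```python
# Roofline-style sketch: memory bandwidth, not flops, bounds how fast
# one chip can generate tokens when decoding is weight-streaming bound.
# Assumption: hypothetical 70B-parameter model in FP8 (1 byte/weight),
# every decode step reading all weights from HBM once.

HBM_BANDWIDTH_BYTES_PER_S = 7.2e12   # Ironwood HBM3E: 7.2 TB/s
PARAMS = 70e9                        # illustrative model size (assumed)
BYTES_PER_PARAM = 1                  # FP8

weight_bytes = PARAMS * BYTES_PER_PARAM
max_tokens_per_s = HBM_BANDWIDTH_BYTES_PER_S / weight_bytes

print(f"Decode ceiling: ~{max_tokens_per_s:.0f} tokens/s per chip")
```

Batching and sharding across a pod raise the effective number, but the per-chip ceiling is why the Marvell work targets the memory path rather than more compute.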

The Marvell MPU shortens the data path between the TPU and its memory stack, cutting latency and reducing power per memory access. This isn't a new chip architecture - it's a precision fix for the bottleneck that limits what even a 42.5-exaflop chip can do in practice.

Marvell's data center revenue reached a record $6.1 billion in its fiscal year ending February 2026, supplying ASICs for Amazon's Trainium, Microsoft's Maia, and Meta's data processing unit. The Google discussions would add a fourth major ASIC assignment. Pilot production of roughly 2 million units is planned for 2027. No contract has been signed yet.

Google's current Ironwood TPU v7, displayed at SC25. The chip delivers 10x peak performance versus its predecessor and serves as the baseline the TPU v8 Sunfish and Zebrafish are designed to succeed. Source: servethehome.com

Intel: Networking and CPU

Intel's role is narrower but distinct. Google is deploying Intel Xeon 6 processors alongside custom Infrastructure Processing Units across its data centers, handling networking, storage offload, and general-purpose tasks that TPUs aren't designed for. Intel isn't competing with the custom ASIC partners - it fills a different layer of the stack, and Google's engagement here is largely about supply chain hygiene and workload routing flexibility.

Why Inference Became the Target

Training gets the headlines, but inference is where the compute actually runs. According to New Street Research, inference now accounts for roughly two-thirds of all AI compute cycles. Training happens once per model version. Inference happens billions of times a day.

Google's infrastructure messaging at Cloud Next 2026 centers on continuous agent execution - workloads running 24/7 that demand inference-optimized silicon. Source: siliconangle.com

Bloomberg Intelligence projects custom ASIC shipments from cloud providers growing at 44.6% CAGR through 2033, versus 16.1% for general-purpose GPU shipments. Google is projecting 4.3 million TPU shipments in 2026 alone, scaling to over 35 million by 2028. The custom ASIC market is expected to reach $118 billion by 2033.
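A quick compounding check shows what that growth-rate gap implies. The sketch assumes 2026 as the base year for both projections, which the source figures don't state explicitly:

```python
# Compound the projected shipment growth rates (Bloomberg Intelligence)
# from an assumed 2026 base year through 2033.

YEARS = 2033 - 2026            # 7 years of compounding (base year assumed)
asic_growth = 1.446 ** YEARS   # 44.6% CAGR, custom cloud-provider ASICs
gpu_growth = 1.161 ** YEARS    # 16.1% CAGR, general-purpose GPUs

print(f"Custom ASIC shipments: ~{asic_growth:.1f}x over {YEARS} years")
print(f"GPU shipments:         ~{gpu_growth:.1f}x over {YEARS} years")
```

Under those assumptions custom ASIC volume grows roughly 13x while GPU volume grows under 3x, which is the shift Google's 35-million-unit 2028 target is riding.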

Nvidia's GPU dominance holds strongest where CUDA compatibility and software ecosystem matter most - research clusters, mixed workloads, small organizations that can't justify custom silicon. For a hyperscaler running one workload at planetary scale, the calculus is different. Purpose-built inference silicon attacks Nvidia's economics on the specific use case Google knows best.

Meta reached a similar conclusion earlier this year, signing a multibillion-dollar deal to rent Ironwood TPUs through Google Cloud - one of the cleaner external validation signals for the current generation.

The Supply Chain Is Also a Negotiation Strategy

The four-partner structure isn't accidental redundancy. It's a deliberate architecture for pricing power. Google doesn't want Broadcom building up the same kind of pricing power over its chip budget that Nvidia has accumulated over the broader market. Splitting Sunfish to Broadcom and Zebrafish to MediaTek creates two competing price anchors for the next generation.

The scale of the Broadcom relationship shows why this matters. At $21 billion in estimated 2026 AI revenue from Google and Anthropic combined, Broadcom's pricing behavior on the next contract is a line item in Google's AI infrastructure P&L. Adding MediaTek as a production-scale alternative, and Marvell as a systems-level design partner, distributes that influence across vendors who have reason to compete.

Marvell's stock jumped 6.3% in pre-market trading on April 20, the day The Information reported the discussions. That's the market pricing in how valuable a Google ASIC assignment has become.

Where It Falls Short

The strategy looks clean on paper. The execution risk is real and concentrated in a few places.

Zebrafish depends on TSMC 2nm being available at volume in late 2027. The node is on track, but semiconductor schedules slip, and "on track" at TSMC assumes ideal conditions. Any delay pushes the inference cost advantage out by at least a year. MediaTek also hasn't shipped a volume data center ASIC before. That's a meaningful first-order risk for a chip Google plans to deploy in the millions.

The Marvell talks are still talks. Google's interest doesn't guarantee production. The MPU design is technically demanding, the pilot scale of 2 million units is small compared to Google's 35-million-unit 2028 projection, and no contract exists yet. If the Marvell track doesn't close, Google's memory bandwidth advantage stays at what Ironwood's HBM3E delivers today - sizable, but not the optimization the MPU aims to add.

Finally, coordinating four design partners across different chips, nodes, and packaging strategies adds integration overhead that Ironwood didn't have. That's a discipline Google's hardware team is building as they go. Whether it's built fast enough for a coherent 2027 system deployment is the question Cloud Next 2026 won't answer today.


About the author

AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.