Amazon Eyes Third-Party Trainium Sales to Rival Nvidia

Amazon CEO Andy Jassy hints the company will sell Trainium3 racks directly to outside data centers, citing a potential $50B revenue run rate and sold-out chip supply.

Amazon's Trainium3 chip is sold out. So is Trainium2. The fourth-generation chip, still roughly 18 months from release, already has substantial pre-orders committed. CEO Andy Jassy is now openly considering a move that would change how Amazon competes in the AI hardware market: selling racks of Trainium directly to outside data centers, bypassing AWS cloud entirely.

Peter DeSantis, Amazon's AI chief, confirmed he's in active talks with potential buyers. "There's so much underconsumption in AI," DeSantis said, framing third-party sales as additive rather than a threat to AWS cloud revenue. The company argues it won't cannibalize its own cloud business - customers who purchase Trainium racks still need networking, storage, orchestration, and everything else Amazon prices separately.

This would be the first time Amazon has seriously considered selling its custom silicon on the open market. For a decade, Trainium, Graviton, and Nitro chips existed purely as competitive advantages baked into EC2 pricing.

TL;DR

  • Amazon is in talks to sell Trainium3 racks directly to third-party data centers for the first time
  • CEO Jassy estimates a $50B+ annual revenue run rate if chips were sold as a standalone business, vs. ~$20B now
  • Trainium3 UltraServer: 144 chips, 362 FP8 PFLOPs (MXFP8) - vs Nvidia GB200 NVL72's 720 FP8 PFLOPs at $2-3M per rack
  • Neuron SDK ecosystem gaps and TSMC capacity limits are concrete constraints
  • Existing customers include Anthropic (Project Rainier: 1M+ chips), OpenAI, and Uber

What Trainium3 Actually Delivers

The Silicon

Trainium3 is Amazon's first 3nm AI accelerator. Each chip delivers 2.52 FP8 petaflops, 144GB of HBM3e memory, and 4.9TB/s of memory bandwidth. Compared to Trainium2, Amazon cites 4.4x more compute and four times the energy efficiency. The 40% reduction in energy consumption per compute unit matters a great deal when buying racks at data center scale.

Amazon announced general availability of EC2 Trn3 UltraServers in December 2025. You can read the full AWS Trainium3 spec sheet on our hardware page. Both Trainium2 and Trainium3 are now fully absorbed by AWS demand - which is the precise reason third-party sales are under discussion.

The UltraServer Configuration

The Trn3 UltraServer combines 144 Trainium3 chips into a single unit delivering 362 FP8 PFLOPs total. For large training runs, EC2 UltraClusters 3.0 can connect thousands of these servers, scaling up to one million Trainium3 chips in a single logical cluster. Amazon positions this for "next-generation agentic, reasoning, and video generation" workloads - exactly the categories driving the current demand crunch.
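The UltraServer total follows directly from the per-chip figures quoted above. A minimal sketch of that arithmetic, assuming simple linear scaling across chips (an assumption for illustration that yields peak aggregates, not measured throughput):

```python
# Per-chip Trainium3 figures as quoted in the article.
CHIPS_PER_ULTRASERVER = 144
FP8_PFLOPS_PER_CHIP = 2.52
HBM_GB_PER_CHIP = 144
HBM_BANDWIDTH_TBPS_PER_CHIP = 4.9


def ultraserver_totals(chips: int = CHIPS_PER_ULTRASERVER) -> dict:
    """Scale the per-chip figures linearly to a full UltraServer.

    Linear scaling is an assumption: it gives peak aggregates and ignores
    interconnect overhead and real-world utilization.
    """
    return {
        "fp8_pflops": chips * FP8_PFLOPS_PER_CHIP,
        "hbm_tb": chips * HBM_GB_PER_CHIP / 1000,  # decimal terabytes
        "hbm_bandwidth_tbps": chips * HBM_BANDWIDTH_TBPS_PER_CHIP,
    }


totals = ultraserver_totals()
for name, value in totals.items():
    print(f"{name}: {value:.2f}")
```

The compute total comes out at about 362.9 PFLOPs, in line with the 362 figure Amazon quotes, alongside roughly 20.7TB of pooled HBM3e per UltraServer.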

The Neuron SDK is what runs models on Trainium. For teams coming from CUDA, the main friction is the compilation step. You trace a PyTorch model once to produce an optimized binary, then deploy from that artifact:

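import torch
import torch_neuronx

# MyModel is a placeholder for your own torch.nn.Module subclass
model = MyModel()
model.eval()  # inference mode: disables dropout and batch-norm updates

# Compile once for Trainium - this takes 10-30 min for large models
# The example input fixes the shape and dtype the compiled model will accept
example_input = torch.zeros([1, 512], dtype=torch.int64)
neuron_model = torch_neuronx.trace(model, example_input)
neuron_model.save("model_neuron.pt")  # reusable compiled artifact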

The compile step isn't trivial. Not every PyTorch operator has a Neuron equivalent, though coverage for standard transformer architectures is solid. Exotic attention variants and custom kernels are where teams run into walls.

[Image: A server rack at NERSC, the National Energy Research Scientific Computing Center. Source: commons.wikimedia.org]
Trainium3 UltraClusters link thousands of servers into a single logical cluster - scale that only makes commercial sense if Amazon can sell outside its own data centers.

The Case for Going External

The $50B Math

Jassy laid out the economics in his annual shareholder letter in April 2026. Amazon's chip business - Trainium, Graviton, and Nitro combined - runs at over $20 billion in annualized revenue with triple-digit year-over-year growth. The quote that landed: "If we were a standalone chip company, our chips would be generating over $50 billion in annual revenue."

That 2.5x gap is the market Amazon isn't capturing. AWS monetizes Trainium indirectly through EC2 instance pricing rather than chip sales. Selling racks directly would let Amazon book chip revenue and margin that today show up only indirectly, inside cloud pricing.

Sovereign AI Demand Unlocks the Market

DeSantis flagged one driver that makes external sales easier to justify internally: sovereign AI. Governments and enterprises outside the US want compute they control locally, on hardware they own, in data centers they operate. AWS can't serve that demand at all - it requires selling physical silicon, not cloud credits.

France, Germany, Japan, and several Gulf states have active sovereign AI infrastructure programs with committed budgets. These aren't exploratory pilots; they're procurement cycles.

How It Compares to Nvidia

The rack-level comparison shows where Trainium3 wins and where it doesn't. Against the Nvidia GB200 NVL72, the current Blackwell rack-scale system, the raw throughput gap is real:

| Spec | Trainium3 UltraServer | Nvidia GB200 NVL72 |
|---|---|---|
| AI chips per rack | 144 Trainium3 | 72 B200 GPUs |
| FP8 throughput | 362 PFLOPs (MXFP8) | 720 PFLOPs |
| Process node | 3nm (TSMC) | 4nm (TSMC) |
| Memory per chip | 144GB HBM3e | 192GB HBM3e |
| Memory bandwidth | 4.9TB/s per chip | 8TB/s per chip (B200) |
| Rack price | Undisclosed | ~$2-3M |
| Ecosystem | Neuron SDK | CUDA (15+ years) |
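Because Amazon has not disclosed a rack price, one way to read the table is to ask what a Trn3 UltraServer would have to cost to match the GB200 NVL72 on dollars per peak FP8 PFLOP. The sketch below does that arithmetic using only the figures in the table. The function name is hypothetical, and the comparison is deliberately narrow: it ignores power, networking, software porting, and real utilization, all of which feed into actual TCO.

```python
def break_even_rack_price(rival_price_usd: float,
                          rival_pflops: float,
                          own_pflops: float) -> float:
    """Price at which a rack matches a rival's dollars per peak PFLOP."""
    usd_per_pflop = rival_price_usd / rival_pflops
    return usd_per_pflop * own_pflops


# Rack-level FP8 figures from the comparison table.
NVL72_FP8_PFLOPS = 720
TRN3_FP8_PFLOPS = 362

# The table gives the NVL72 price as a range of roughly $2-3M per rack.
low = break_even_rack_price(2_000_000, NVL72_FP8_PFLOPS, TRN3_FP8_PFLOPS)
high = break_even_rack_price(3_000_000, NVL72_FP8_PFLOPS, TRN3_FP8_PFLOPS)
print(f"Break-even UltraServer price: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the break-even lands at roughly $1.0M to $1.5M per UltraServer. Any price below that range would undercut Nvidia on peak FP8 per dollar before other costs are counted.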

Nvidia's raw FP8 throughput advantage is roughly 2x at the rack level. Amazon's counterargument is total cost of ownership, not raw compute parity. In Uber's benchmarking, Trainium3 showed a 30-50% cost advantage over Nvidia H100 and H200 hardware at scale for training workloads. Against Blackwell specifically, Amazon hasn't published comparable TCO figures. For memory-intensive inference workloads, Trainium3's lower per-chip memory (144GB vs 192GB per chip for B200) means larger models have to be split across more chips, which requires more aggressive tensor parallelism.
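The memory point can be made concrete with a rough estimate of how many chips are needed just to hold a model's weights. This is a sketch under stated assumptions: the 405-billion-parameter model, the 16-bit weights, and the 80% usable-memory fraction are all hypothetical values chosen for illustration, and the estimate ignores KV cache and activations.

```python
import math


def min_chips_for_weights(params_billions: float,
                          bytes_per_param: float,
                          hbm_gb_per_chip: float,
                          usable_fraction: float = 0.8) -> int:
    """Smallest chip count whose usable HBM can hold the model weights.

    usable_fraction is an assumed allowance for runtime overhead; real
    deployments also need room for KV cache and activations.
    """
    weights_gb = params_billions * bytes_per_param  # 1e9 params x bytes = GB
    usable_gb = hbm_gb_per_chip * usable_fraction
    return math.ceil(weights_gb / usable_gb)


# Hypothetical 405B-parameter model stored in 16-bit (2-byte) weights.
trn3_chips = min_chips_for_weights(405, 2, hbm_gb_per_chip=144)
b200_chips = min_chips_for_weights(405, 2, hbm_gb_per_chip=192)
print(f"Trainium3: {trn3_chips} chips, B200: {b200_chips} chips")
```

With these inputs the weights span eight Trainium3 chips versus six B200s. More chips per model means more cross-chip communication on every forward pass, which is where the tensor parallelism overhead comes from.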

Who's Already Running It

Amazon's existing Trainium customer list includes the most heavily funded AI labs in the world. Anthropic's Project Rainier committed to deploying more than one million Trainium accelerators inside AWS - the largest single compute commitment any AI lab has publicly announced. OpenAI expanded its AWS relationship as part of a $100B+ cloud commitment.

Uber joined the roster in April 2026, beginning a pilot to train AI models on Trainium3 using its accumulated history of 13.567 billion trips. Kamran Zargahi, Uber's VP of Engineering, confirmed the deal, with Trainium3 running roughly 30-50% cheaper than comparable Nvidia H100 or H200 hardware at scale. That comparison is against the previous Nvidia generation, not Blackwell, which limits how far the savings can be extrapolated.

[Image: A technician working on server rack infrastructure at NERSC. Source: commons.wikimedia.org]
Running AI training workloads at Uber's scale - 40+ million daily trips - requires infrastructure where compute cost differences compound fast.

Where It Falls Short

Trainium's strongest argument is TCO. Its weakest is ecosystem depth.

CUDA has 15 years of libraries, profiling tools, community knowledge, and third-party software support. The Neuron SDK is improving fast, but it isn't CUDA. Many inference frameworks - including several popular open-source projects - either don't support Neuron natively or require porting work. For teams already invested in CUDA workflows, the switching cost is real even when the TCO math points toward Trainium. This is the same problem AMD has faced with ROCm for years, just with a better-funded vendor behind it.

The manufacturing constraint is also concrete. Expanding Trainium production for third-party sales means competing with Nvidia for 3nm wafer allocation at TSMC. Advanced nodes are heavily committed. DeSantis framed third-party sales as "quite possible" rather than confirmed - the TSMC bottleneck is one reason it stays speculative.

Trainium4, due roughly 18 months out, promises 6x the performance of Trainium3, native FP4 support, approximately 288GB of memory per chip, and 4x memory bandwidth. Pre-orders are already committed. If Amazon does open third-party sales before Trainium4 ships, customers get access to hardware that's already one generation behind the internal roadmap - a negotiating reality for any serious buyer.


Amazon's chip business grew at triple-digit year-over-year rates while staying completely invisible to the broader hardware market. The decision to sell externally isn't a question of whether the demand exists - Jassy's shareholder letter made that clear. The open question is whether TSMC can allocate enough wafer capacity to make it commercially viable at the scale Amazon would need to move the revenue needle.

About the author
Sophie Zhang, AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.