OpenAI and Broadcom unveiled Jalapeño on June 24, 2026 - OpenAI's first custom silicon, co-designed from scratch around the inference math of large language models. It went from initial design to manufacturing tape-out in nine months, a timeline OpenAI claims is the fastest ever publicly reported for a high-performance custom ASIC.

TL;DR

First custom AI chip from OpenAI, co-designed with Broadcom and manufactured at TSMC on the 3nm node
Reticle-sized ASIC (~840mm²) with systolic array architecture and eight HBM stacks, optimized purely for LLM decode throughput
Targets roughly 50% lower inference cost per token vs GPU baselines - unverified, self-reported pre-production claim
Inference only - training stays on NVIDIA; small deployments target end of 2026, full production ramp in 2027-2028

Overview

Jalapeño is not a general-purpose AI accelerator adapted from existing designs. OpenAI built it for one job: running production inference on its own models as cheaply as possible. The chip pairs a systolic array compute architecture - similar in philosophy to Google's early TPUs - with eight HBM memory stacks in a reticle-limited package. The design focuses on memory bandwidth over raw TFLOPS, addressing the decode phase bottleneck where data movement limits throughput more than compute capacity.

The development was led by Richard Ho, OpenAI's Head of Hardware, who contributed to Google's original TPU program before joining OpenAI. Broadcom handled silicon engineering and networking; TSMC manufactured on its 3nm node. Celestica will handle board and rack integration for deployment. The Ethernet-based interconnect uses Broadcom's own networking stack rather than NVIDIA's NVLink, keeping the supply chain completely outside NVIDIA's ecosystem.

What Jalapeño isn't: a product for sale. OpenAI has no announced plans to offer cloud access to the chip or license it to third parties. Jalapeño exists to cut OpenAI's own infrastructure costs. Whether that changes over a multi-generation roadmap remains unstated.

Image: OpenAI CEO Sam Altman and Broadcom CEO Hock Tan hold the first Jalapeño wafer at the June 24, 2026 announcement. Tan delivered the 300mm wafer, which holds roughly 50-60 Jalapeño ASICs, to Altman and President Greg Brockman. Source: the-decoder.com

Key Specifications

OpenAI and Broadcom disclosed architecture details but withheld most performance metrics. The table below uses confirmed figures where available and flags everything else.

Specification	Details
Manufacturer	OpenAI (design) / Broadcom (silicon) / TSMC (fab)
Product Family	Jalapeño Gen 1
Chip Type	ASIC
Process Node	TSMC 3nm
Die Size	~840mm² (at EUV reticle limit, ~858mm²)
Memory	8x HBM3 or HBM4 stacks
Memory Capacity	Not disclosed
Memory Bandwidth	Not disclosed
FP8 Performance	Not disclosed
FP16 Performance	Not disclosed
TDP	Not disclosed
Interconnect	Broadcom Ethernet
Systems Integration	Celestica
Target Workload	LLM Inference only
Release Date	Q4 2026 (prototype); 2027-2028 (production)

The die itself measures around 25.46mm × 33mm (about 840mm²), filling the reticle almost completely. The package layout places one large compute chiplet at center, surrounded by the eight HBM memory stacks, with a separate I/O chiplet flanked by two structural dummy dies for mechanical balance - a layout analyzed from the 300mm production wafer shown at announcement.
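The die and wafer figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the 26mm × 33mm reticle field is the standard size consistent with the ~858mm² limit quoted in the table, and the dies-per-wafer formula is a common textbook approximation that ignores scribe lines, edge exclusion, and yield, not a figure from OpenAI, Broadcom, or TSMC.

```python
import math

# Figures quoted in the article.
DIE_W_MM, DIE_H_MM = 25.46, 33.0
WAFER_DIAMETER_MM = 300.0
# Assumption: standard 26 x 33 mm reticle field (matches the ~858 mm^2 limit).
RETICLE_W_MM, RETICLE_H_MM = 26.0, 33.0


def gross_dies_per_wafer(wafer_d_mm, die_area_mm2):
    """Textbook estimate: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_d_mm / 2) ** 2
    edge_loss = math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2)
    return wafer_area / die_area_mm2 - edge_loss


die_area = DIE_W_MM * DIE_H_MM              # mm^2
reticle_area = RETICLE_W_MM * RETICLE_H_MM  # mm^2
dies = gross_dies_per_wafer(WAFER_DIAMETER_MM, die_area)

print(f"die area:       {die_area:.2f} mm^2")
print(f"reticle fill:   {die_area / reticle_area:.1%}")
print(f"gross dies/300mm wafer (estimate): {dies:.0f}")
```

The estimate lands at the top of the "roughly 50-60" range reported for the announcement wafer, which is what one would expect before edge exclusion is taken into account.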

Performance Benchmarks

No independent benchmarks exist yet. Jalapeño was in the engineering-sample phase at announcement, with OpenAI promising a full technical report in the months ahead. The only performance figure in circulation is the self-reported estimate of roughly 50% lower inference cost per token compared to current GPU-based alternatives - a claim Bloomberg reported from sources familiar with the chip, not official OpenAI documentation.

Metric	Jalapeño	NVIDIA H100 SXM	NVIDIA B200
Inference Cost/Token	~50% lower (claimed)	Baseline	~30% lower vs H100
Memory Capacity	Not disclosed	80GB HBM3	192GB HBM3e
Memory Bandwidth	Not disclosed	3.35 TB/s	8.0 TB/s
FP8 TFLOPS (with sparsity)	Not disclosed	3,958	9,000
TDP	Not disclosed	700W	1,000W
Process Node	TSMC 3nm	TSMC 4N	TSMC 4NP

The comparison is deliberately limited: without real numbers from OpenAI, any fuller benchmark table would be fabrication. The NVIDIA H100 and NVIDIA B200 hardware figures are NVIDIA's published specifications. The Jalapeño column shows only what has been disclosed or claimed so far.

One meaningful comparison is with the Cerebras WSE-3, another inference-focused design that avoids DRAM completely by fitting 44GB of SRAM on a single wafer-scale chip. OpenAI's Codex-class models currently run on Cerebras hardware for low-latency inference; Jalapeño is intended as the long-term replacement for that dependency.

Key Capabilities

Systolic Array for Decode Throughput

The systolic array architecture passes data cell to cell in a fixed pipeline, which suits the regular matrix multiplications that dominate the decode phase of autoregressive inference. Unlike a GPU's SIMT execution model - which is flexible but carries significant control overhead - a systolic array can sustain near-peak compute utilization on inference workloads. Google's TPU line has proven this at scale. Jalapeño applies the same principle but is built specifically around OpenAI's model architectures rather than a generic training and inference target.

The trade-off is inflexibility. A systolic array optimized for inference is poorly suited for training's irregular, variable compute patterns. OpenAI made an explicit architectural choice: inference efficiency above all else, with training staying on NVIDIA hardware for the foreseeable future.
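To make the cell-to-cell data flow concrete, here is a minimal cycle-by-cycle simulation of a generic output-stationary systolic array in plain Python. It is a teaching sketch, not Jalapeño's actual microarchitecture, which has not been disclosed; the function name and the output-stationary dataflow are assumptions chosen for illustration.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    PE(i, j) keeps one accumulator. Values of A flow left-to-right,
    values of B flow top-to-bottom, one hop per cycle.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # operand moving right
    b_reg = [[0] * m for _ in range(n)]  # operand moving down
    cycles = n + m + k - 2               # time for the last operands to meet

    for t in range(cycles):
        # Shift operands one cell right / down (far edge first).
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject skewed inputs at the edges: row i and column j are
        # delayed by i and j cycles so matching operands meet in PE(i, j).
        for i in range(n):
            s = t - i
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        for j in range(m):
            s = t - j
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Every cell does one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)  # [[19, 22], [43, 50]] 4
```

There is no instruction fetch, scheduling, or branching inside the loop: each cell only multiplies, accumulates, and forwards. That fixed pattern is the source of both the high utilization on regular matrix math and the inflexibility on irregular workloads.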

Memory Bandwidth as the Real Constraint

The eight HBM stacks around the compute chiplet reflect a specific thesis about what limits inference speed. During the decode phase - producing each new output token - the bottleneck is not how fast the chip can multiply matrices, but how fast it can load model weights from memory into compute. A chip that moves data faster across more HBM stacks can decode faster, largely independent of raw TFLOPS. OpenAI's design explicitly targets this by maximizing bandwidth at the cost of compute density.
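The bandwidth thesis can be illustrated with a back-of-the-envelope roofline calculation. The sketch below uses the H100 and B200 figures from the comparison table; the 70-billion-parameter FP8 model is a hypothetical chosen for illustration, and the bound ignores KV-cache traffic, batching, and multi-chip sharding. Jalapeño is omitted because its bandwidth has not been disclosed.

```python
# Hypothetical model: 70B parameters stored at 1 byte each (FP8).
WEIGHT_BYTES = 70e9 * 1

# Published figures from the comparison table (bandwidth in bytes/s,
# FP8 throughput in FLOP/s, with sparsity).
CHIPS = {
    "H100 SXM": {"bandwidth": 3.35e12, "fp8_flops": 3958e12},
    "B200": {"bandwidth": 8.0e12, "fp8_flops": 9000e12},
}


def decode_tokens_per_sec_bound(bandwidth, weight_bytes):
    """Batch-1 upper bound: every token requires one full pass over the weights."""
    return bandwidth / weight_bytes


def ridge_point(peak_flops, bandwidth):
    """FLOPs per byte needed before compute, not memory, becomes the limit."""
    return peak_flops / bandwidth


# Batch-1 decode does about 2 FLOPs (multiply + add) per 1-byte weight.
DECODE_INTENSITY = 2.0

bounds = {}
for name, spec in CHIPS.items():
    bounds[name] = decode_tokens_per_sec_bound(spec["bandwidth"], WEIGHT_BYTES)
    ridge = ridge_point(spec["fp8_flops"], spec["bandwidth"])
    print(f"{name}: <= {bounds[name]:.0f} tokens/s per sequence, "
          f"ridge point {ridge:.0f} FLOP/byte vs decode ~{DECODE_INTENSITY:.0f}")
```

Under these assumptions the decode workload sits hundreds of times below the ridge point on both GPUs, so the bound scales with bandwidth rather than TFLOPS. Batching raises arithmetic intensity and narrows the gap, which is why real deployments land somewhere between the two limits.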

AI-Assisted Chip Design

OpenAI used its own language models during the nine-month development cycle to optimize circuit placements and timing paths. The self-referential loop - model designs chip, chip runs model, cheaper inference funds a better model - is not incidental to Jalapeño. It was the design methodology. Whether AI-assisted EDA meaningfully compressed the timeline, or whether the compressed timeline reflects OpenAI's willingness to accept more risk at tape-out, is an open question the technical report should address.

Full-Stack Independence from NVIDIA

The Broadcom Ethernet interconnect isn't just a spec detail - it's a strategic statement. NVIDIA's NVLink provides high-bandwidth scale-up networking between GPUs but locks buyers into NVIDIA's ecosystem. Jalapeño with Broadcom's networking stack scales across racks without requiring any NVIDIA component. For a company spending north of $10 billion annually on GPU infrastructure, reducing supplier leverage has direct bottom-line value.

Pricing and Availability

Jalapeño isn't available to buy or rent. OpenAI designed it exclusively for internal use and has no announced plans to commercialize access. The chip's commercial impact will show up in OpenAI's inference pricing - if the 50% cost reduction claim holds in production, it enables lower API prices or higher margins on the same output.
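The "lower prices or higher margins" trade-off is simple arithmetic. The sketch below works through it with hypothetical numbers: the $10 price and $6 serving cost per million tokens are placeholders, not OpenAI figures, and the 50% reduction is the unverified claim discussed above.

```python
def gross_margin(price, cost):
    """Fraction of revenue left after serving cost."""
    return (price - cost) / price


def price_for_margin(cost, margin):
    """Price that yields the given gross margin at the given cost."""
    return cost / (1.0 - margin)


# Hypothetical figures, USD per million output tokens.
PRICE = 10.0
COST_ON_GPU = 6.0
CLAIMED_COST_REDUCTION = 0.50  # unverified pre-production claim

cost_on_asic = COST_ON_GPU * (1.0 - CLAIMED_COST_REDUCTION)

margin_before = gross_margin(PRICE, COST_ON_GPU)
margin_after = gross_margin(PRICE, cost_on_asic)
price_same_margin = price_for_margin(cost_on_asic, margin_before)

print(f"margin at today's price: {margin_before:.0%} -> {margin_after:.0%}")
print(f"price holding margin at {margin_before:.0%}: ${price_same_margin:.2f}")
```

With these placeholder inputs, holding the price lifts gross margin from 40% to 70%, while holding the margin allows the price to be cut in half. Any real outcome depends on the actual cost structure and on how much traffic moves to the new chip.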

The rollout timeline has three phases: small prototype deployments in late 2026, mass production in 2027, and full operational scale in the first half of 2028. Those 2028 data centers will be built in partnership with Microsoft and other infrastructure partners under the Stargate program. The 10-gigawatt program OpenAI and Broadcom announced spans both 3nm and future 2nm chips, suggesting Jalapeño is the first in a planned annual or biennial cadence.

For teams assessing AI accelerators today, Jalapeño isn't an option - prototype volumes are going to OpenAI's own infrastructure, not third parties.

Strengths and Weaknesses

Strengths

Purpose-built inference architecture avoids the overhead of GPU general-purpose design, targeting higher hardware utilization
Eight HBM stacks directly address the memory bandwidth bottleneck that limits LLM decode throughput
Reticle-limited 3nm die maximizes usable silicon area within current EUV limits
AI-assisted 9-month design cycle shows a new model for custom silicon development
Full Broadcom Ethernet stack removes NVIDIA NVLink dependency for scale-up networking
Multi-generation roadmap signals long-term commitment to custom silicon

Weaknesses

Zero independent benchmark data; all performance claims are self-reported and unverified
No commercial availability - teams can't assess or buy access, making competitive comparison academic until production deploys
Inference-only scope means OpenAI's training workloads remain on NVIDIA hardware, limiting leverage in negotiations
Full production deployment is 18+ months out, during which NVIDIA Blackwell and next-generation AMD chips will continue to improve
Architecture optimized for OpenAI's specific model shapes may not generalize well even if commercialized later

OpenAI Ships Jalapeño - Its First Custom AI Chip - initial news coverage of the June 24 announcement
NVIDIA H100 - the primary GPU Jalapeño targets for inference cost reduction
NVIDIA B200 - current-generation Blackwell flagship that Jalapeño will compete with in production
Cerebras WSE-3 - current inference platform running OpenAI Codex-class models; Jalapeño's intended long-term replacement

Sources: