OpenAI Ships Jalapeño - Its First Custom AI Chip

Broadcom CEO Hock Tan walked into OpenAI's offices on June 24 and handed Sam Altman a wafer. Not metaphorically. He carried a 300mm silicon wafer holding roughly 50 to 60 ASICs and placed it in the hands of Altman and OpenAI President Greg Brockman. That wafer is Jalapeño - OpenAI's first custom inference chip, built on TSMC's 3nm process and designed from scratch around the specific math that makes large language models run.

TL;DR

Jalapeño is OpenAI's first custom ASIC, co-developed with Broadcom and manufactured at TSMC on the 3nm node
Systolic array architecture with eight HBM stacks; targets ~50% lower cost per inference token vs current GPU alternatives
Nine months from design to tape-out - fast for a high-performance ASIC
Engineering samples already running GPT-5.3 Codex workloads in-house
Microsoft is expected to buy 40% of initial production; prototype deployments planned for late 2026

The announcement caps a partnership with Broadcom that began in October 2025 and puts OpenAI in the same vertical-integration game as Google (TPUs), Amazon (Trainium), and Apple (Apple silicon). But unlike those moves, which grew from existing infrastructure businesses, Jalapeño is a first-generation chip from a company that has never shipped silicon before.

What Jalapeño Actually Is

Systolic Array Architecture

Jalapeño isn't a modified GPU. The design uses a systolic array - a grid of processing elements that pass data from cell to cell in rhythmic lockstep, well-suited to the dense matrix multiplications that dominate LLM inference. The architecture is similar in concept to Google's TPU family, which has used systolic designs since the first generation in 2016.

The key difference from GPU-based inference is data movement. GPUs are designed for general parallelism, which means memory reads and writes happen at high frequency and in patterns that don't always match LLM workload shapes. A purpose-built systolic array reduces those unnecessary data transfers and keeps compute units at higher utilization - which is where the efficiency gains come from.
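To make the data flow concrete, here is a minimal Python sketch of an output-stationary systolic array, one common textbook variant. It is an illustration only: Jalapeño's actual dataflow, array dimensions, and number formats have not been published, and the function name `systolic_matmul` is made up for this example.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A x B.

    A is n x k, B is k x m (lists of lists). One processing element (PE)
    per output cell; each PE keeps its own running sum.
    """
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"

    acc = [[0.0] * m for _ in range(n)]    # per-PE accumulators
    a_reg = [[0.0] * m for _ in range(n)]  # A operands, moving left to right
    b_reg = [[0.0] * m for _ in range(n)]  # B operands, moving top to bottom

    for t in range(n + m + k - 2):         # cycles until the last operand drains
        # Every PE takes A from its left neighbour and B from the PE above.
        a_reg = [[0.0] + row[:-1] for row in a_reg]
        b_reg = [[0.0] * m] + b_reg[:-1]

        # Feed the array edges with a skew of one cycle per row / column.
        for i in range(n):
            s = t - i
            if 0 <= s < k:
                a_reg[i][0] = A[i][s]
        for j in range(m):
            s = t - j
            if 0 <= s < k:
                b_reg[0][j] = B[s][j]

        # All PEs multiply-accumulate in lockstep.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

Each processing element only ever exchanges data with its neighbours, and every operand fetched at the edge of the grid is reused across a whole row or column. That reuse is where the reduction in data movement comes from.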

OpenAI claims "substantially better performance per watt" than current alternatives and roughly 50% cost savings per inference token compared to today's GPU-based clusters. These are self-reported numbers from pre-production samples, and Broadcom says a detailed technical report with verified benchmarks will come later this year.

Eight HBM Stacks

Look closely at the packaged chip in the announcement photos and you'll see the ASIC die surrounded by eight stacks of high-bandwidth memory. This is the same class of HBM that Anthropic locked up in a multi-year deal with Micron earlier this week, and for the same reason: inference at scale is a memory bandwidth problem as much as a compute problem.

Stacking eight HBM modules directly on the package - rather than routing through system memory - cuts latency notably and keeps the processing elements fed. The tradeoff is cost and complexity at the packaging stage, which is Broadcom's domain.
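A back-of-the-envelope calculation shows why bandwidth matters so much. During token-by-token generation, the model weights have to be streamed from memory once per decoding step, so memory bandwidth sets a ceiling on throughput. The sketch below is a simplification with placeholder inputs: the model size, weight precision, and per-stack bandwidth are hypothetical values, not published Jalapeño or OpenAI figures, and the estimate ignores KV-cache traffic and compute limits.

```python
def decode_tokens_per_second(param_count, bytes_per_param,
                             bandwidth_bytes_per_s, batch_size=1):
    """Upper bound on decode throughput when weight reads dominate traffic."""
    # Weights are read once per decoding step, regardless of batch size.
    bytes_per_step = param_count * bytes_per_param
    steps_per_second = bandwidth_bytes_per_s / bytes_per_step
    # Each step produces one token for every sequence in the batch.
    return steps_per_second * batch_size


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
PARAMS = 70e9                  # assumed model size
BYTES_PER_PARAM = 1            # assumed 8-bit weights
BANDWIDTH = 8 * 1e12           # assumed 8 stacks at 1 TB/s each

single = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
batched = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, BANDWIDTH, batch_size=16)
print(f"batch 1:  {single:.0f} tokens/s")   # about 114 under these assumptions
print(f"batch 16: {batched:.0f} tokens/s")
```

Under this model, adding memory stacks raises the ceiling directly, which is consistent with the decision to surround the die with eight of them.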

Jalapeño Package (estimated layout):
+-----+-----+-----+-----+
| HBM | HBM | HBM | HBM |
+-----+-----+-----+-----+
|       ASIC  Die        |  <- systolic array core
+-----+-----+-----+-----+
| HBM | HBM | HBM | HBM |
+-----+-----+-----+-----+

Manufacturing: TSMC 3nm
Dies per wafer: ~50-60 ASICs per 300mm wafer
Integrator:    Celestica (boards, racks, networking)
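The figure of 50 to 60 ASICs per 300mm wafer hints at a very large die. A widely used approximation for gross dies per wafer subtracts an edge-loss term from the simple area ratio. The sketch below applies it; the die areas are hypothetical inputs, since the article does not state Jalapeño's die size, and the formula ignores scribe lines, edge exclusion, and defects, so it says nothing about how many of those dies actually work.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate count of whole dies that fit on a round wafer."""
    radius = wafer_diameter_mm / 2
    wafer_area = math.pi * radius ** 2
    # Partial dies lost around the circular edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


# Hypothetical die areas in mm^2, not published figures.
for area in (850, 950, 1000):
    print(area, gross_dies_per_wafer(300, area))
```

Under this approximation, die areas of roughly 850 to 950 mm² produce counts in the 50 to 60 range, which would put the chip among the largest dies that can be manufactured.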

The Nine-Month Sprint

From initial design to tape-out in nine months. Broadcom called it one of the fastest development cycles ever achieved for a high-performance ASIC. Part of the speed came from a feedback loop that sounds almost circular: OpenAI's own models helped accelerate the design and optimization process.

The actual chip design work - RTL, verification, physical implementation - was done by Broadcom's silicon team. But OpenAI's engineers contributed the workload characterization, the kernel profiles, and the model-serving requirements that shaped the architecture from the start. The motto reportedly inscribed on the chip packaging: "May we scale smoothly, exponentially and uneventfully through AGI."

[Image: a silicon chip being manufactured.] Advanced semiconductor manufacturing at TSMC's 3nm node requires precise layer-by-layer deposition across hundreds of steps. Source: pexels.com

The Full Stack, Layer by Layer

OpenAI: Architecture and Intent

OpenAI designed the chip around its deep understanding of how its models actually run at scale - the kernel profiles, the attention patterns, the serving system behaviors. This is the part that general-purpose hardware can't fully optimize for, because GPU vendors have to balance across many workload types. An ASIC designed for one company's specific model family can make choices a GPU vendor never could.

Greg Brockman framed it this way: "The world is moving to a compute-powered economy. Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses."

The "full stack" framing is deliberate. OpenAI designs the models, writes the kernels, runs the serving systems, and now designs the silicon those systems run on. The gap it hasn't closed is fabrication - that stays at TSMC.

Broadcom: Silicon and Networking

Broadcom handled chip implementation. That means RTL implementation, physical design, place-and-route, timing closure, and signoff - the detailed work that turns an architectural specification into masks ready for the fab. Broadcom also brings high-performance networking silicon, which matters at data center scale where thousands of chips need to communicate during inference on very large models.

Hock Tan described it as "a fundamental commitment to scaling the physical infrastructure required for the next decade of AI," and added that the goal is "deployment of gigawatt scale data centers with Microsoft and other partners beginning in 2026."

Broadcom's AI revenue doubled to $8.4 billion in Q1 2026, with Anthropic, OpenAI, and Meta all driving custom silicon demand. Jalapeño is the OpenAI chapter of a story Broadcom has been writing for years.

Celestica: Boards, Racks, and Integration

Celestica is the third piece - the contract manufacturer that takes chip samples and turns them into deployable hardware. That means boards, rack systems, thermal management, and production-scale manufacturing pipelines. A chip that works in a lab isn't a chip that ships in a data center. Celestica's role is to close that gap.

[Image: a large-scale data center with rows of server racks.] Jalapeño is designed for gigawatt-scale data centers built in partnership with Microsoft and other infrastructure partners. Source: pexels.com

The Microsoft Factor

Who	Role	Commitment
OpenAI	Design, architecture, model integration	First-party silicon strategy
Broadcom	Chip implementation, networking	Multi-generation platform partner
TSMC	Fabrication (3nm)	Manufacturing
Celestica	Board, rack, production integration	Volume manufacturing
Microsoft	Deployment, data center infrastructure	~40% of initial chip production

Microsoft is expected to purchase about 40% of Jalapeño's initial production run. That's not incidental - it suggests Jalapeño was designed with Azure's infrastructure requirements in mind, and that Microsoft's data center team was likely involved in the rack and networking specifications from early in the process.

This also means the chip's economics are partially underwritten before it ships. OpenAI doesn't need to find buyers for an uncertain volume of first-generation silicon; Microsoft has already committed to absorbing a major portion of the output.

Where It Falls Short

Jalapeño is an inference chip. Training stays on Nvidia - and it will for a long time. Pre-training runs require gradient computation, flexible parallelism strategies, and the ability to cope with irregular compute patterns that ASICs handle poorly compared to GPUs. The systolic array design that makes Jalapeño efficient for inference is poorly suited to the exploratory, variable workloads that show up in training.

The 50% cost savings figure also needs context. It compares against "current-generation graphics processing units" - a phrase flexible enough to mean H100s, H200s, or B200s depending on what's most favorable. Independent verification hasn't happened yet. Broadcom says a detailed technical report is coming in the coming months, but the numbers as stated are internal benchmarks from pre-production samples.
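The arithmetic behind that caveat is simple: the same chip cost produces very different headline savings depending on which GPU it is measured against. The sketch below uses invented placeholder costs per million tokens. They are not real prices for any Nvidia part or for Jalapeño, and the names `gpu_gen_a` and `gpu_gen_b` are hypothetical labels.

```python
def savings_vs_baselines(asic_cost_per_mtok, baseline_costs_per_mtok):
    """Fractional cost savings of an ASIC against each named baseline."""
    return {
        name: 1 - asic_cost_per_mtok / baseline_cost
        for name, baseline_cost in baseline_costs_per_mtok.items()
    }


# Placeholder costs per million tokens, for illustration only.
baselines = {"gpu_gen_a": 2.00, "gpu_gen_b": 1.25}
for name, saving in savings_vs_baselines(1.00, baselines).items():
    print(f"{name}: {saving:.0%} cheaper")
```

With these made-up inputs, one unchanged ASIC cost reads as a 50% saving against the older baseline and a 20% saving against the newer one, which is why the choice of comparison hardware matters.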

The timeline calls for prototype deployments by the end of 2026, with a full production ramp in 2027 and 2028. First-generation custom silicon frequently encounters yield issues, thermal surprises, and software integration problems that push timelines. The fact that engineering samples are already running GPT-5.3 Codex workloads is a good sign, but samples and production are different things.

The chip delivers no benefit to external developers today. Users of the OpenAI API won't see a configuration option or a model tag indicating Jalapeño. If the chip reaches production, the performance gains are supposed to show up as faster responses and lower API prices - but there's no customer-facing interface to verify that.

OpenAI's Nvidia dependency doesn't disappear with Jalapeño. A single inference ASIC, even a good one, doesn't displace the GPU fleet you need for training, experimentation, and the workload types that don't fit a fixed architecture. What Jalapeño does is carve out the highest-volume, most predictable part of OpenAI's compute spend - live inference for ChatGPT and the API - and run it on hardware OpenAI controls the cost and roadmap for. That's a real change, even if it's gradual.
