Etched Exits Stealth With Working Chip and $1B in Orders

Transformer ASIC startup Etched comes out of stealth with first-pass silicon on TSMC N4P, $800M raised, and more than $1B in signed customer contracts.

Four years after making a very concentrated bet - that transformer architecture would stay dominant long enough to justify hardwiring it into silicon - Etched came out of stealth today with a working chip, $800M in total funding, and over $1 billion in signed customer contracts.

This isn't a paper launch. The company says it achieved first-pass silicon success on TSMC's N4P process, meaning the chip worked on its first tapeout without a respin. It's now validating its rack-scale inference systems with customers and plans to start shipping this summer.

Key Facts

| Detail | Value |
| --- | --- |
| Process node | TSMC N4P |
| Memory per chip | 144GB HBM3E |
| Claimed throughput | 500K+ tokens/sec (8-chip server, Llama 70B) |
| Total raised | $800M |
| Signed contracts | $1B+ |
| Employees | 400+ |
| Planned shipping | Summer 2026 |

First-Pass Silicon Is Not a Small Thing

In chip design, "first-pass silicon success" - sometimes called A0 success - means the chip returned from the fab and functioned as intended without requiring a redesign iteration. Most complex chips need at least one or two respins before they work reliably. That Sohu, Etched's chip, worked on the first attempt on TSMC's N4P process - a production node - suggests the company's design methodology is solid.

The Chip That Does One Thing

Sohu is built differently from a GPU. Where Nvidia's H100 and H200 are programmable - you can use them for training, inference, rendering, or scientific compute - Sohu hardwires transformer attention directly into the transistor logic as fixed-function circuitry. There's no general-purpose compute core.

The tradeoff is obvious and the company has never hidden it: everything that isn't a transformer pays a penalty or doesn't run. The upside, Etched argues, is that everything that is a transformer runs at a level of efficiency no programmable chip can match.

Etched says the current systems support DeepSeek, Qwen, Mamba, and Llama workloads. The inclusion of Mamba - a state-space model (SSM) architecture, not a standard transformer - suggests the chip has broader architecture support than the original "transformer-only" framing implied. The company hasn't published technical details explaining how Mamba support was implemented.

The Sohu chip card. The chip hardwires transformer attention into silicon as fixed-function logic. Source: etched.com

Full-Stack, Not Just the Die

Etched didn't just design a chip. The company designed the entire server rack: cooling plates, networking, power delivery, and circuit boards. The server shown in their launch materials has a distinctive form factor built around liquid cooling - every component co-designed with the chip's thermal requirements.

This full-stack approach mirrors what Google did with the TPU pod: when you control every layer from the die to the rack, you can optimize the entire power and thermal budget together rather than fitting a general-purpose board to a chip someone else designed.

$800M Raised and $1B in Signed Contracts

The $800M total comes from multiple rounds. The most recent was $500M in December at a $5B post-money valuation, led by Stripes. Prior to that, Jane Street committed more than $100M in a round Etched kept quiet until today. VentureTech Alliance - a venture firm with a strategic partnership with TSMC - also participated, providing a direct line to Etched's manufacturing partner.

Etched's inference server rack. The company designed the full hardware stack - chip, board, cooling, and rack - rather than adapting existing server designs. Source: etched.com

Who Backed It

| Investor | Type | Notable Detail |
| --- | --- | --- |
| Stripes | VC firm | Lead, Series B ($500M round) |
| Jane Street | Trading firm | $100M+ committed |
| VentureTech Alliance | TSMC-linked VC | Strategic manufacturing link |
| Hudson River Trading | High-frequency trading | |
| Jump Trading | High-frequency trading | |
| Two Sigma | Quant / HFT | |
| Ribbit Capital | Fintech VC | |
| Peter Thiel | Individual | Existing backer |
| Geoffrey Hinton, Fei-Fei Li, Andrej Karpathy | AI researchers | Individual angels |
| Stanley Druckenmiller | Hedge fund | Individual angel |

The concentration of quantitative and high-frequency trading firms in this cap table - Jane Street, Hudson River Trading, Jump Trading, Two Sigma - stands out. These firms built their businesses around low-latency compute and are among the most sophisticated hardware buyers in the market. When firms like that write checks this size into an inference chip startup, it suggests they see a use case they aren't advertising publicly.

$1B in Signed Contracts

The company is being careful with language: these are signed customer contracts, not letters of intent or pipeline. CEO Gavin Uberti said the company saw "frontier AI would become one of the most economically significant technologies ever created, but the needed infrastructure simply did not exist" - and found enough customers who agreed to sign before the chip shipped.

Ramping production to fulfill $1B in contracts requires significant manufacturing capacity. Etched has a factory in Taiwan, plus a data center and prototyping lab in San Jose. The company is targeting gigawatt-scale operations by 2027.

How Sohu Stacks Up on Paper

These are vendor-supplied numbers unless otherwise noted. No independent organization has benchmarked Sohu at production scale.

| Metric | Sohu (8-chip server) | H100 (8-chip server) | H200 (8-chip server) |
| --- | --- | --- | --- |
| Tokens/sec, Llama 70B (batch 1) | 500,000+ | ~23,000 | ~35,000 |
| Memory per chip | 144GB HBM3E | 80GB HBM3 | 141GB HBM3E |
| Process node | TSMC N4P | TSMC 4N | TSMC 4N |
| Architecture | Transformer ASIC | General-purpose GPU | General-purpose GPU |
| Software stack | Proprietary compiler | CUDA + vLLM | CUDA + vLLM |
| Independent benchmarks | None published | Extensive | Extensive |
| Availability | Summer 2026 | Now | Now |

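The roughly 20x throughput claim holds on the narrow benchmark it was measured on: 500,000+ tokens/sec against roughly 23,000 for an 8-GPU H100 server, by Etched's own numbers. On that basis, Etched claims a single Sohu server can replace roughly 160 H100 GPUs (about 20 eight-GPU servers) for Llama 70B inference at batch size 1. Those are exceptional numbers for a latency-bound workload.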
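The arithmetic behind the 20x and 160-GPU figures can be checked directly from the comparison table. The sketch below is illustrative only: the constant and function names are made up for this example, the inputs are the vendor-supplied and approximate figures quoted above, and it assumes throughput scales linearly with GPU count, which real deployments rarely achieve.

```python
# Figures from the comparison table (tokens/sec, Llama 70B, batch 1, 8-chip servers).
# The Sohu number is vendor-supplied; the H100 number is approximate.
SOHU_SERVER_TPS = 500_000   # Etched's claimed floor ("500,000+")
H100_SERVER_TPS = 23_000    # "~23,000" in the table
CHIPS_PER_SERVER = 8


def speedup(sohu_tps: float, gpu_server_tps: float) -> float:
    """Server-to-server throughput ratio."""
    return sohu_tps / gpu_server_tps


def gpu_equivalents(sohu_tps: float, gpu_server_tps: float,
                    chips_per_server: int = CHIPS_PER_SERVER) -> float:
    """GPUs needed to match one Sohu server, assuming linear scaling (an assumption)."""
    per_gpu_tps = gpu_server_tps / chips_per_server
    return sohu_tps / per_gpu_tps


ratio = speedup(SOHU_SERVER_TPS, H100_SERVER_TPS)
gpus = gpu_equivalents(SOHU_SERVER_TPS, H100_SERVER_TPS)
print(f"{ratio:.1f}x per server, about {gpus:.0f} H100s")
# prints: 21.7x per server, about 174 H100s
```

With the table's rounded inputs the ratio comes out near 21.7x, or about 174 H100s. Rounding the ratio down to 20x gives 20 x 8 = 160 GPUs, which matches the headline figure.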

Etched is entering a field that already has serious challengers. Groq's LPU built a commercially available inference product around similar throughput-focused logic. Cerebras took a different architectural path and has been shipping to enterprise and government customers. Neither has displaced Nvidia. The question for Etched is whether its numbers justify the engineering cost of migration.

Inside the Etched server: water cooling lines and copper heat exchangers co-designed with the Sohu chip's thermal profile. Source: etched.com

Where It Falls Short

Vendor Benchmarks Only

Every throughput number in Etched's announcement comes from Etched's own controlled demonstrations. The 500K tokens/sec figure is measured at batch size 1 - a single concurrent request. This is the best possible scenario for a chip optimized for sequential token generation.

At batch size 256 - closer to what a real inference API handles under load - a single H100 delivers around 45,000 tokens per second. Etched hasn't published comparable batch-256 numbers. The gap between batch-1 and batch-256 performance is where vendor claims and production reality tend to diverge.
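The batch-size caveat can be made concrete using only figures quoted in this article. The sketch below normalizes everything to a single chip. The variable names are illustrative, splitting server throughput evenly across 8 chips is an assumption, and the final ratio deliberately mixes batch sizes. It is not a prediction of Sohu's batch-256 performance, which Etched has not published.

```python
# All inputs are approximate figures quoted in this article.
SOHU_SERVER_TPS_B1 = 500_000   # 8-chip Sohu server, batch size 1 (vendor-supplied)
H100_SERVER_TPS_B1 = 23_000    # 8-chip H100 server, batch size 1
H100_CHIP_TPS_B256 = 45_000    # single H100, batch size 256
CHIPS_PER_SERVER = 8

# Per-chip throughput at batch 1, assuming an even split across chips (an assumption).
sohu_chip_b1 = SOHU_SERVER_TPS_B1 / CHIPS_PER_SERVER
h100_chip_b1 = H100_SERVER_TPS_B1 / CHIPS_PER_SERVER

# How much a single H100's aggregate throughput grows from batching alone.
h100_batching_gain = H100_CHIP_TPS_B256 / h100_chip_b1

# Mixed comparison: Sohu at batch 1 vs. H100 at batch 256.
# NOT like-for-like; it only shows how much the unpublished Sohu figure matters.
mixed_ratio = sohu_chip_b1 / H100_CHIP_TPS_B256

print(f"H100 gain from batching: {h100_batching_gain:.1f}x")
print(f"Sohu (batch 1) vs H100 (batch 256), per chip: {mixed_ratio:.2f}x")
# prints:
# H100 gain from batching: 15.7x
# Sohu (batch 1) vs H100 (batch 256), per chip: 1.39x
```

On these numbers, batching alone lifts a single H100 by roughly 15.7x. Without a published batch-256 figure for Sohu, there is no way to tell whether the per-chip gap under load is closer to 1.4x or to the headline 20x.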

The Proprietary Stack

Deploying on Sohu means abandoning vLLM, TensorRT-LLM, and SGLang - the three most widely used open-source inference frameworks. Etched supplies its own compiler and software stack. For an engineering team running production inference today, adopting an unproven toolchain from a startup that has not yet shipped at scale is a meaningful risk, regardless of the performance ceiling.

The company has over 400 employees, with a significant portion recruited from Nvidia, Broadcom, Google TPU, and SK Hynix. The talent base is credible. But a proprietary software stack needs more than good engineers - it needs the ecosystem adoption that makes debugging tractable at 3am when a model is throwing errors.

The Architectural Bet Still Has Tail Risk

Etched's founders have acknowledged the risk clearly: if transformer architecture becomes obsolete, the company's chips become obsolete with it. MoE models like DeepSeek V4 and hybrid SSM-transformer architectures are already pushing the definition of "transformer" well beyond the original formulation. The Mamba support Etched now claims was not in the original product pitch.

"If you have compute now, people will buy it."

Patrick O'Shaughnessy, CEO of Positive Sum (Etched investor), captured the thesis in one sentence. The bet is that inference demand is so large and so urgent that a chip delivering 20x throughput on a specific workload will find customers before a more flexible architecture catches up on that workload.

Summer 2026 is when that thesis meets production traffic. The $1B in contracts suggests the customers are already waiting.



About the author: Sophie Zhang, AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.