<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Networking | Awesome Agents</title><link>https://awesomeagents.ai/tags/networking/</link><description>Your guide to AI models, agents, and the future of intelligence. Reviews, leaderboards, news, and tools - all in one place.</description><language>en-us</language><managingEditor>contact@awesomeagents.ai (Awesome Agents)</managingEditor><lastBuildDate>Wed, 04 Mar 2026 17:10:44 +0100</lastBuildDate><atom:link href="https://awesomeagents.ai/tags/networking/index.xml" rel="self" type="application/rss+xml"/><image><url>https://awesomeagents.ai/images/logo.png</url><title>Awesome Agents</title><link>https://awesomeagents.ai/</link></image><item><title>Ayar Labs Raises $500M to Wire AI Chips With Light</title><link>https://awesomeagents.ai/news/ayar-labs-500m-nvidia-amd-silicon-photonics/</link><pubDate>Wed, 04 Mar 2026 17:10:44 +0100</pubDate><guid>https://awesomeagents.ai/news/ayar-labs-500m-nvidia-amd-silicon-photonics/</guid><description>&lt;p>Ayar Labs has closed a $500 million Series E funding round, valuing the startup at $3.75 billion and pushing its total outside funding to $870 million. The round was led by Neuberger Berman, with Nvidia and AMD participating as strategic investors with MediaTek, Qatar Investment Authority, Alchip Technologies, ARK Invest, Insight Partners, and Sequoia Capital.&lt;/p></description><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Ayar Labs has closed a $500 million Series E funding round, valuing the startup at $3.75 billion and pushing its total outside funding to $870 million. 
The round was led by Neuberger Berman, with Nvidia and AMD participating as strategic investors alongside MediaTek, the Qatar Investment Authority, Alchip Technologies, ARK Invest, Insight Partners, and Sequoia Capital.</p>
<p>The company builds co-packaged optics - silicon photonic chips that replace copper interconnects inside AI server clusters with light-based links. That sounds like a narrow engineering problem. The investors apparently think it's the next major bottleneck in artificial intelligence infrastructure.</p>
<div class="news-tldr">
<p><strong>TL;DR</strong></p>
<ul>
<li>Ayar Labs closes $500M Series E at a $3.75 billion valuation</li>
<li>Led by Neuberger Berman; Nvidia, AMD, Sequoia, and QIA among investors</li>
<li>Total funding now $870M across all rounds</li>
<li>Technology replaces copper chip-to-chip links with optical fiber</li>
<li>Claims 4x to 20x more throughput per watt vs copper</li>
<li>Capital goes toward volume production and a new Taiwan office</li>
</ul>
</div>
<h2 id="why-copper-is-losing-the-race">Why Copper Is Losing the Race</h2>
<p>The problem Ayar is selling against is real and getting worse. As AI training and inference clusters scale, the electrical signals traveling through copper traces between chips degrade. At higher data rates, signal noise rises, energy losses mount, and the distance a signal can travel without degrading shrinks. This isn't a software problem. It is physics.</p>
<p>Ayar's TeraPHY chiplets transmit data as light rather than electrons. The company's next-generation design, with eight chiplets per package, supports more than 200 terabits per second of aggregate bandwidth. For context, Nvidia's Rubin GPU architecture supports 28.8 terabits per second per package on its copper interconnects. The optical figure is roughly seven times higher.</p>
<p>The energy math is similarly stark. Ayar claims its optical interconnects deliver between four and twenty times more compute throughput per watt compared to conventional copper connections. In a world where data centers are struggling to secure power and cooling capacity, that efficiency gap matters.</p>
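<p>The bandwidth claim is easy to sanity-check with the article's own figures (a back-of-envelope sketch; the inputs are the numbers quoted above, not independent measurements):</p>

```python
# Back-of-envelope check of the figures quoted above (article's numbers,
# not vendor data).
optical_tbps = 200.0   # Ayar Labs next-gen package, 8 TeraPHY chiplets
copper_tbps = 28.8     # Nvidia Rubin copper interconnect, per package

ratio = optical_tbps / copper_tbps
print(f"optical vs copper bandwidth: {ratio:.1f}x")  # ~6.9x, "roughly seven times"
```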
<p>Nvidia itself has been moving money into photonics. As we covered when <a href="/news/nvidia-4b-photonics-ai-data-centers/">Nvidia committed $4 billion to photonics partnerships with Lumentum and Coherent</a>, the company has been building a position in optical interconnect technology for over a year. Backing Ayar directly - as a strategic investor in this round - is the next step in that same thesis.</p>
<h3 id="the-teraphy-architecture">The TeraPHY Architecture</h3>
<p>Ayar's system uses two components. The SuperNova chip produces the laser light source. The TeraPHY chiplet encodes data onto that light and can process up to eight terabits of traffic per second in its current generation. The chiplets use the UCIe standard, which allows them to integrate directly with GPUs and other processors as co-packaged components rather than external modules.</p>
<p><img src="/images/news/ayar-labs-500m-nvidia-amd-silicon-photonics-teraphy.jpg" alt="Ayar Labs TeraPHY UCIe optical I/O chiplet">
<em>Ayar Labs' TeraPHY 8 Tbps UCIe optical I/O chiplet, which co-packages directly with GPUs and other processors. Photo: Ayar Labs.</em></p>
<p>The company has also built reference designs with Alchip and Global Unichip Corp, two of the largest chip design service firms in Taiwan, which explains the new Hsinchu office. Volume production requires proximity to the advanced packaging ecosystem.</p>
<h2 id="who-benefits">Who Benefits</h2>
<p><strong>Hyperscalers and cloud providers</strong> are the most direct beneficiaries. For companies running tens of thousands of GPUs in tightly coupled training clusters, the latency and bandwidth ceiling imposed by copper links is a genuine constraint on model scale. Meta's <a href="/news/meta-nvidia-multibillion-ai-chip-deal/">multibillion-dollar GPU buildout with Nvidia</a> is exactly the kind of deployment where interconnect bandwidth becomes a first-order constraint. Optical interconnects remove that ceiling, at least for chip-to-chip communication within a rack or between adjacent racks.</p>
<p><strong>Nvidia and AMD</strong> benefit in a different way. By backing Ayar as a strategic investor, both companies make sure co-packaged optics technology is available and compatible with their own GPU architectures before competitors lock in an alternative standard. Pat Gelsinger, the former Intel CEO who sits on Ayar's board, knows from experience what happens when a company cedes the interconnect layer to a competitor.</p>
<p><strong>Ayar's existing investors</strong> - Sequoia, ARK Invest, Insight Partners - benefit from the signal sent by having the two dominant GPU makers participate in the same round. That's not a typical outcome for a hardware startup.</p>
<h3 id="competitive-landscape">Competitive Landscape</h3>
<table>
  <thead>
      <tr>
          <th>Company</th>
          <th>Approach</th>
          <th>Backing</th>
          <th>Status</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ayar Labs</td>
          <td>Co-packaged optics (CPO)</td>
          <td>Nvidia, AMD, Sequoia</td>
          <td>Volume production ramp</td>
      </tr>
      <tr>
          <td>Intel</td>
          <td>Optical Compute Interconnect (OCI)</td>
          <td>In-house</td>
          <td>Demo stage</td>
      </tr>
      <tr>
          <td>Lumentum</td>
          <td>Pluggable optics + photonics</td>
          <td>Nvidia investment</td>
          <td>Production</td>
      </tr>
      <tr>
          <td>Coherent</td>
          <td>Silicon photonics transceivers</td>
          <td>Nvidia investment</td>
          <td>Production</td>
      </tr>
  </tbody>
</table>
<p>Ayar's co-packaged approach is more tightly integrated than pluggable optics solutions, which improves latency and power efficiency but also makes it harder to swap out. That's a larger bet on a single architecture.</p>
<h2 id="who-pays">Who Pays</h2>
<p><strong>Ayar Labs</strong> carries the execution risk. The company has spent fifteen years developing its core technology, according to CEO Mark Wade. The challenge now is manufacturing at scale. Co-packaged optics require advanced packaging processes - the TeraPHY chiplets need to be integrated directly with processors during chip production, not added later. That demands tight coordination with foundries and packaging partners.</p>
<p><strong>Customers</strong> will pay a premium over copper for the first generation of products. The unit economics of photonics at scale are still being proven. The 4x to 20x efficiency claim is a wide range, which suggests real-world performance depends heavily on workload type and deployment configuration.</p>
<p><img src="/images/news/ayar-labs-500m-nvidia-amd-silicon-photonics-datacenter.jpg" alt="AI data center server racks with dense GPU installations">
<em>Dense GPU clusters like these are where copper interconnects hit their limits - and where optical fiber now promises to take over.</em></p>
<p><strong>The broader AI infrastructure ecosystem</strong> absorbs the transition cost if co-packaged optics becomes the dominant standard. Existing server designs, rack configurations, and supply chains are built around copper. Replacing them is an industry-wide capital event, not just a product decision for one vendor.</p>
<hr>
<p>The rational question is whether Ayar has timed this correctly. Co-packaged optics has been described as the next interconnect revolution for the better part of a decade. What has changed is that AI training clusters have grown large enough that copper's physical limits are now a visible constraint rather than a theoretical one. With Nvidia and AMD both in the cap table and volume production imminent, the technology is closer to deployment than the hype cycle suggests - though whether Ayar captures the value or becomes infrastructure for someone else's margin is still an open question.</p>
<p><strong>Sources:</strong></p>
<ul>
<li><a href="https://techfundingnews.com/ayar-labs-500m-series-e/">Ayar Labs Closes $500M Series E, Accelerates Volume Production of Co-Packaged Optics</a> - TechFundingNews</li>
<li><a href="https://siliconangle.com/2026/03/03/co-packaged-optics-startup-ayar-labs-raises-500m-round-backed-nvidia-amd/">Co-packaged optics startup Ayar Labs raises $500M round backed by Nvidia, AMD</a> - SiliconANGLE</li>
<li><a href="https://www.theregister.com/2026/03/03/ayar_labs_500m/">Ayar Labs raises $500M to mass-produce CPO chiplets</a> - The Register</li>
<li><a href="https://techstartups.com/2026/03/03/nvidia-backed-ayar-labs-raises-500m-to-speed-ai-chips-with-light-based-interconnects/">Nvidia-backed Ayar Labs raises $500M to speed AI chips with light-based interconnects</a> - TechStartups</li>
</ul>
]]></content:encoded><dc:creator>Daniel Okafor</dc:creator><category>News</category><media:content url="https://awesomeagents.ai/images/news/ayar-labs-500m-nvidia-amd-silicon-photonics_hu_4f240484c840a6d1.jpg" medium="image" width="1200" height="675"/><media:thumbnail url="https://awesomeagents.ai/images/news/ayar-labs-500m-nvidia-amd-silicon-photonics_hu_4f240484c840a6d1.jpg" width="1200" height="675"/></item><item><title>Nvidia Pours $4B Into Photonics for AI Data Centers</title><link>https://awesomeagents.ai/news/nvidia-4b-photonics-ai-data-centers/</link><pubDate>Tue, 03 Mar 2026 12:18:44 +0100</pubDate><guid>https://awesomeagents.ai/news/nvidia-4b-photonics-ai-data-centers/</guid><description>&lt;p>Nvidia just committed $4 billion to a problem most people outside the data center world have never heard of: the wires connecting its GPUs are running out of bandwidth.&lt;/p></description><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Nvidia just committed $4 billion to a problem most people outside the data center world have never heard of: the wires connecting its GPUs are running out of bandwidth.</p>
<p>On March 2, the company announced $2 billion investments in each of two optical component manufacturers - Lumentum Holdings and Coherent Corp - to develop silicon photonics technology that replaces copper interconnects with light-based communication inside AI data centers. The deals include multibillion-dollar purchase commitments and future capacity access rights, and both companies will use the funding to expand U.S.-based manufacturing.</p>
<div class="news-tldr">
<p><strong>TL;DR</strong></p>
<ul>
<li>Nvidia is investing $4 billion total - $2B in Lumentum and $2B in Coherent - for silicon photonics R&amp;D and manufacturing</li>
<li>The deals fund new U.S. fabrication facilities and secure long-term supply of advanced laser and optical components</li>
<li>Co-packaged optics delivers up to 3.5x better networking power efficiency than traditional copper-based pluggable transceivers</li>
<li>Both partnerships are multiyear and nonexclusive, extending relationships that span over 20 years</li>
</ul>
</div>
<h2 id="why-copper-hit-a-wall">Why Copper Hit a Wall</h2>
<h3 id="the-physics-problem">The Physics Problem</h3>
<p>As AI training clusters scale to hundreds of thousands of GPUs, the data flowing between them has become the real bottleneck. Copper cables worked fine at lower speeds, but at 224 Gbps per lane - the rate needed for current-generation AI workloads - passive copper reaches shrink to less than one meter. That isn't a misprint. At the bandwidths AI factories demand, copper physically can't carry signals more than an arm's length.</p>
<p>A single AI factory can use up to 2.4 million optical transceivers and consume up to 24 megawatts of networking power alone - potentially over 10% of the total data center energy budget. As Nvidia reported <a href="/news/nvidia-record-68b-revenue-stock-surges-200/">record-breaking $68.1 billion in quarterly revenue</a> fueled by AI demand, the need to solve the interconnect bottleneck has become urgent.</p>
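<p>Those per-factory figures imply a couple of derived numbers worth spelling out. A minimal sketch using the article's inputs (the 10% share is the article's upper-bound claim, not a measurement):</p>

```python
# Networking power per AI factory, per the figures quoted above.
transceivers = 2_400_000   # up to 2.4M optical transceivers
networking_mw = 24.0       # up to 24 MW of networking power

# Average draw per transceiver implied by those two upper bounds:
print(f"{networking_mw * 1e6 / transceivers:.0f} W per transceiver")  # 10 W

# Facility size implied if networking were exactly 10% of the budget:
print(f"{networking_mw / 0.10:.0f} MW total")  # 240 MW
```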
<h3 id="the-photonics-solution">The Photonics Solution</h3>
<p>Silicon photonics replaces electrical signals with laser-based data transmission integrated directly into processor packages. Rather than routing data through copper traces and bulky external transceivers, co-packaged optics (CPO) place the optical engines right on the switch ASIC. Nvidia's own benchmarks claim this approach delivers 3.5x better power efficiency, 10x higher network resiliency, and 63x greater signal integrity compared to traditional pluggable modules.</p>
<blockquote>
<p>&quot;In the age of AI, software runs on intelligence with tokens generated in real time by AI factories for every interaction and every context,&quot; said Jensen Huang, NVIDIA CEO. &quot;Together with Lumentum, NVIDIA is advancing the world's most sophisticated silicon photonics to build the next generation of gigawatt-scale AI factories.&quot;</p></blockquote>
<p><img src="/images/news/nvidia-4b-photonics-ai-data-centers-fiber.jpg" alt="Fiber optic cables transmitting light signals - the core technology behind silicon photonics for AI data centers">
<em>Fiber optic technology replaces copper's electrical signals with light, enabling dramatically higher bandwidth over longer distances inside AI factories.</em></p>
<h2 id="the-two-deals">The Two Deals</h2>
<h3 id="lumentum---the-laser-specialist">Lumentum - The Laser Specialist</h3>
<p>Nvidia's $2 billion investment in Lumentum targets advanced laser components and optical subsystems. The multiyear, nonexclusive agreement includes a multibillion-dollar purchase commitment and future capacity access rights. Lumentum will use the funding to build a new U.S.-based fabrication facility.</p>
<blockquote>
<p>&quot;This multiyear strategic agreement reflects our shared commitment to advancing the optics technologies that will power the next generation of AI infrastructure,&quot; said Michael Hurlston, Lumentum CEO.</p></blockquote>
<h3 id="coherent---the-networking-backbone">Coherent - The Networking Backbone</h3>
<p>The matching $2 billion in Coherent extends a partnership that has existed for over 20 years. Like the Lumentum deal, it includes a multibillion-dollar purchase commitment for advanced laser and optical networking products, with Coherent also expanding U.S. manufacturing.</p>
<blockquote>
<p>&quot;This strategic relationship underscores Coherent's role as a key enabler of next-generation AI data center infrastructure,&quot; said Jim Anderson, Coherent CEO.</p></blockquote>
<table>
  <thead>
      <tr>
          <th></th>
          <th>Lumentum</th>
          <th>Coherent</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Investment</td>
          <td>$2 billion</td>
          <td>$2 billion</td>
      </tr>
      <tr>
          <td>Focus</td>
          <td>Laser components, optical subsystems</td>
          <td>Optical networking, silicon photonics</td>
      </tr>
      <tr>
          <td>Relationship</td>
          <td>Multiyear</td>
          <td>20+ year extension</td>
      </tr>
      <tr>
          <td>U.S. Manufacturing</td>
          <td>New fabrication facility</td>
          <td>Expanded existing capacity</td>
      </tr>
      <tr>
          <td>Exclusivity</td>
          <td>Nonexclusive</td>
          <td>Nonexclusive</td>
      </tr>
  </tbody>
</table>
<h2 id="what-it-does-not-tell-you">What It Does Not Tell You</h2>
<p>The nonexclusive nature of both deals is worth flagging. Nvidia is securing supply and accelerating R&amp;D, but it isn't locking these companies into exclusive arrangements. Lumentum and Coherent remain free to sell to AMD, Intel, or anyone else building competing AI infrastructure. That means Nvidia is betting the technology itself will become critical - and positioning to be first in line when it does.</p>
<p>There's also the question of timeline. Nvidia's Spectrum-X Photonics switches with co-packaged optics are slated for the second half of 2026, and Quantum-X InfiniBand variants for early 2026. But launching CPO at scale in production data centers is a different challenge from shipping product. The infrastructure buildout these investments fund - new fabs, expanded capacity - will take years to reach full output.</p>
<p>The $4 billion total also pales against Nvidia's own fiscal 2026 full-year revenue of $215.9 billion. This is a strategic positioning play, not a bet-the-company move. Nvidia is spending roughly one week of revenue to secure a supply chain it believes will define the next decade of AI infrastructure. For comparison, Nvidia's <a href="/news/nvidia-groq-inference-chip-openai/">new inference chip partnership with Groq for OpenAI</a> addresses the compute side of the same scaling equation - this deal addresses the pipes.</p>
<p><img src="/images/news/nvidia-4b-photonics-ai-data-centers-datacenter.jpg" alt="Server racks inside a modern data center with networking infrastructure">
<em>A single AI factory can consume 24 megawatts of networking power alone - photonics aims to cut that figure in half.</em></p>
<h2 id="the-bigger-picture">The Bigger Picture</h2>
<p>Over 80% of hyperscale data center links already use some form of optical solution. What Nvidia is pushing for goes further: integrating optics directly into the processor package, removing the transceiver as a separate component entirely. If the industry adopts co-packaged optics at scale, the power savings alone would be enormous - Nvidia claims up to 50% reduction in total networking energy consumption.</p>
<p>For anyone following the AI infrastructure build cycle - from the <a href="/guides/cuda-programming-guide/">CUDA programming stack</a> through to the <a href="/guides/nvidia-dgx-spark-setup-guide-2026/">DGX Spark hardware</a> - this investment signals where Nvidia sees the next constraint. The company has spent the past three years leading compute. Now it is buying its way into owning the interconnects too.</p>
<hr>
<p>Nvidia is not spending $4 billion because photonics is trendy. It's spending $4 billion because at the scale AI factories are heading, copper simply can't keep up. Whether the company can translate supply chain investments into an actual competitive moat - or whether AMD and others will ride the same photonics wave - will play out over the next two to three years. For now, Nvidia is doing what it does best: moving first and spending aggressively to make sure the next infrastructure era runs on its terms.</p>
<p><strong>Sources:</strong></p>
<ul>
<li><a href="https://nvidianews.nvidia.com/news/nvidia-announces-strategic-partnership-with-lumentum-to-develop-state-of-the-art-optics-technology">NVIDIA Announces Strategic Partnership With Lumentum</a></li>
<li><a href="https://nvidianews.nvidia.com/news/nvidia-and-coherent-announce-strategic-partnership-to-develop-optics-technology-to-scale-next-generation-data-center-architecture">NVIDIA and Coherent Announce Strategic Partnership</a></li>
<li><a href="https://www.cnbc.com/2026/03/02/nvidia-investment-coherent-lumentum.html">Nvidia to Invest $4 Billion in Photonics Companies - CNBC</a></li>
<li><a href="https://finance.yahoo.com/news/nvidia-invest-4-billion-photonic-131110099.html">Nvidia to Invest $2 Billion Each in Lumentum, Coherent - Reuters via Yahoo Finance</a></li>
<li><a href="https://pulse2.com/nvidia-2-billion-investment-in-coherent-to-scale-ai-data-center-infrastructure/">NVIDIA $2 Billion Investment in Coherent - Pulse2</a></li>
<li><a href="https://developer.nvidia.com/blog/scaling-ai-factories-with-co-packaged-optics-for-better-power-efficiency/">Scaling AI Factories with Co-Packaged Optics - NVIDIA Technical Blog</a></li>
<li><a href="https://www.tomshardware.com/tech-industry/photonics-and-high-speed-data-movement-is-the-next-big-ai-bottleneck-following-copper-power-dram-and-nand">Photonics Is the Next Big AI Bottleneck - Tom's Hardware</a></li>
</ul>
]]></content:encoded><dc:creator>Elena Marchetti</dc:creator><category>News</category><media:content url="https://awesomeagents.ai/images/news/nvidia-4b-photonics-ai-data-centers_hu_f586a104d89751ba.jpg" medium="image" width="1200" height="675"/><media:thumbnail url="https://awesomeagents.ai/images/news/nvidia-4b-photonics-ai-data-centers_hu_f586a104d89751ba.jpg" width="1200" height="675"/></item><item><title>Mac Studio Clusters Now Run Trillion-Parameter Models for $40K</title><link>https://awesomeagents.ai/news/mac-studio-clusters-local-llm-inference-rdma/</link><pubDate>Sun, 01 Mar 2026 11:00:00 +0100</pubDate><guid>https://awesomeagents.ai/news/mac-studio-clusters-local-llm-inference-rdma/</guid><description>&lt;p>Four Mac Studios. 1.5 terabytes of unified memory. One trillion-parameter model running at 25 tokens per second. Total cost: about $40,000.&lt;/p></description><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Four Mac Studios. 1.5 terabytes of unified memory. One trillion-parameter model running at 25 tokens per second. Total cost: about $40,000.</p>
<p>That is the setup <a href="https://creativestrategies.com/research/running-a-1t-parameter-model-on-a-40k-mac-studio-cluster/">Creative Strategies documented</a> this month, running Kimi K2 Thinking - a 1 trillion parameter model - on a cluster of Mac Studios connected via Thunderbolt 5. <a href="https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-studio-rdma-over-thunderbolt-5/">Jeff Geerling's benchmarks</a> confirmed similar numbers: 32 tokens per second on Qwen3 235B across the same four-node setup.</p>
<div class="news-tldr">
<p><strong>TL;DR</strong></p>
<ul>
<li>Four Mac Studios with 512GB or 256GB each create a 1.5TB unified memory cluster for ~$40,000</li>
<li>macOS Tahoe 26.2 enabled RDMA over Thunderbolt 5, dropping inter-node latency from 300 microseconds to under 50 microseconds</li>
<li>The cluster runs Kimi K2 (1T parameters) at ~25 tok/s and Qwen3 235B at ~32 tok/s</li>
<li>Equivalent NVIDIA setup would require 26+ H100 GPUs at $780,000+ plus networking and datacenter infrastructure</li>
<li>The total system draws 450-600W - less than a single <a href="/hardware/nvidia-h200/">H200</a></li>
<li><a href="https://appleinsider.com/articles/25/12/20/ai-calculations-on-mac-cluster-gets-a-big-boost-from-new-rdma-support-on-thunderbolt-5">Apple Insider confirms</a> macOS RDMA works on M4 Pro Mac Mini, M4 Max Mac Studio, and M3 Ultra Mac Studio</li>
</ul>
</div>
<h2 id="this-is-not-the-openclaw-mac-mini-story">This Is Not the OpenClaw Mac Mini Story</h2>
<p>Let me be clear about what this is and what it is not.</p>
<p><a href="/news/stop-buying-mac-minis-old-hardware-runs-llms/">Last month we covered</a> how people were buying $2,200 Mac Minis to run OpenClaw - an agent framework that makes API calls to cloud providers. The Mac's GPU sat idle. That was a $2,200 API client and a waste of good hardware.</p>
<p>This is the opposite story. These Mac Studio clusters are doing the actual inference locally. The GPU is not idle - it is running a trillion-parameter model entirely on-device, with no API calls, no cloud dependency, no per-token costs, and no data leaving the premises.</p>
<p>The difference between those two stories is the difference between a misunderstanding and a genuine infrastructure shift.</p>
<h2 id="the-technical-breakthrough---rdma-over-thunderbolt-5">The Technical Breakthrough - RDMA Over Thunderbolt 5</h2>
<p>The enabling technology is deceptively simple. In macOS Tahoe 26.2, Apple quietly added RDMA (Remote Direct Memory Access) support over Thunderbolt 5. RDMA allows one machine to directly read and write to another machine's memory without involving the CPU or operating system kernel on either side.</p>
<p>Before RDMA, connecting multiple Macs for distributed inference used standard networking protocols. Each memory transfer went through the full network stack: application to kernel to NIC to wire to NIC to kernel to application. Round-trip latency: approximately 300 microseconds per transfer.</p>
<p>With RDMA, the transfer bypasses the entire stack. One Mac's GPU writes directly to another Mac's memory region. <a href="https://github.com/exo-explore/exo">EXO Labs</a>, the open-source clustering software that powers most of these setups, measured latency falling from 300 microseconds to 3 microseconds - a 100x reduction.</p>
<p>Jeff Geerling's measurements showed slightly higher real-world latency at under 50 microseconds end-to-end, which is still a 6x improvement over the pre-RDMA baseline. Either way, the latency is now low enough that distributed inference across four Macs feels like a single machine to the model.</p>
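<p>Putting the two latency measurements side by side (figures as reported above):</p>

```python
# Pre-RDMA baseline vs the two RDMA measurements cited above.
baseline_us = 300.0    # full network-stack round trip
exo_us = 3.0           # EXO Labs' measurement
end_to_end_us = 50.0   # Jeff Geerling's end-to-end upper bound

print(f"EXO measurement: {baseline_us / exo_us:.0f}x lower latency")   # 100x
print(f"end-to-end: {baseline_us / end_to_end_us:.0f}x lower latency") # 6x
```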
<h2 id="the-math---40k-vs-780k">The Math - $40K vs $780K</h2>
<p><a href="https://www.implicator.ai/apple-just-turned-a-software-update-into-a-730-000-discount-on-ai-infrastructure/">Implicator.ai ran the cost comparison</a> and the numbers are striking:</p>
<table>
  <thead>
      <tr>
          <th>Configuration</th>
          <th>Cost</th>
          <th>Memory</th>
          <th>Power Draw</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>4x Mac Studio (512GB each)</td>
          <td>~$47,000</td>
          <td>2TB unified</td>
          <td>450-600W</td>
      </tr>
      <tr>
          <td>4x Mac Studio (mixed 512/256GB)</td>
          <td>~$40,000</td>
          <td>1.5TB unified</td>
          <td>450-600W</td>
      </tr>
      <tr>
          <td>26x NVIDIA H100 80GB (equivalent memory)</td>
          <td>~$780,000+</td>
          <td>2.08TB HBM3</td>
          <td>~18,200W</td>
      </tr>
      <tr>
          <td>Cloud rental (26x H100, 1 year)</td>
          <td>~$456,000/yr</td>
          <td>-</td>
          <td>-</td>
      </tr>
  </tbody>
</table>
<p>The Mac cluster costs 5% of the NVIDIA hardware price and draws 3% of the power. The trade-off is throughput: 26 H100s would deliver dramatically higher tokens per second for batch inference. But for single-user or small-team interactive use - a developer querying a local model, a startup iterating on prompts, a law firm running private document analysis - 25-32 tokens per second is responsive enough for real work.</p>
<p>The power comparison is particularly notable. A single <a href="/hardware/nvidia-h200/">NVIDIA H200</a> draws 700W under load. The entire four-Mac-Studio cluster draws 450-600W total. No liquid cooling required. No datacenter. No special electrical work. A standard 15-amp wall outlet handles it.</p>
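<p>The headline ratios follow directly from the comparison table (a sketch using the table's figures and the worst-case Mac power draw):</p>

```python
# Cost and power figures from the comparison table above.
mac_cost, h100_cost = 40_000, 780_000   # USD
mac_watts, h100_watts = 600, 18_200     # worst-case Mac draw vs 26x H100

print(f"cost: {mac_cost / h100_cost:.0%} of the Nvidia hardware price")  # 5%
print(f"power: {mac_watts / h100_watts:.0%} of the Nvidia power draw")   # 3%
```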
<h2 id="what-people-are-actually-running">What People Are Actually Running</h2>
<p>Based on the benchmarks published by Creative Strategies and Jeff Geerling, here is what a four-node Mac Studio cluster can do:</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Parameters</th>
          <th>Quantization</th>
          <th>Tokens/sec</th>
          <th>Memory Used</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Kimi K2 Thinking</td>
          <td>1T (MoE)</td>
          <td>Q4</td>
          <td>~25 tok/s</td>
          <td>~800GB</td>
      </tr>
      <tr>
          <td>Qwen3 235B</td>
          <td>235B</td>
          <td>Q4_K_M</td>
          <td>~32 tok/s</td>
          <td>~140GB</td>
      </tr>
      <tr>
          <td>Llama 3.1 405B</td>
          <td>405B</td>
          <td>Q4_K_M</td>
          <td>~18-22 tok/s</td>
          <td>~230GB</td>
      </tr>
      <tr>
          <td>DeepSeek V3</td>
          <td>671B</td>
          <td>Q4_K_M</td>
          <td>~15-20 tok/s (est.)</td>
          <td>~380GB</td>
      </tr>
  </tbody>
</table>
<p>For context, Llama 3.1 405B in Q4_K_M requires approximately 230GB of memory. That exceeds the capacity of any single GPU on the market - even the <a href="/hardware/nvidia-gb300-nvl72/">GB300 NVL72</a>'s 288GB per GPU. On a single <a href="/hardware/apple-m4-max/">Apple M4 Max</a> with 128GB, you can run 405B at aggressive quantization (Q2_K) but with significant quality loss. The four-Mac cluster fits it comfortably at Q4_K_M with memory to spare for KV cache.</p>
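<p>The ~230GB figure is consistent with a simple weights-only estimate. A minimal sketch, assuming roughly 4.5 effective bits per weight for Q4_K_M (an approximation - the exact llama.cpp layout varies by tensor type, and KV cache comes on top):</p>

```python
def quantized_weights_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Weights-only memory estimate; excludes KV cache and activations."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(f"{quantized_weights_gb(405):.0f} GB")  # ~228 GB, in line with the ~230GB above
```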
<p>The sweet spot appears to be models in the 200B-400B parameter range at Q4 quantization. These models are meaningfully more capable than the 7B-70B models that fit on a single consumer GPU, and the Mac cluster makes them accessible without datacenter infrastructure.</p>
<h2 id="who-is-building-these">Who Is Building These</h2>
<p>The buyer profile is specific and distinct from the Mac Mini OpenClaw crowd:</p>
<p><strong>Enterprise compliance teams.</strong> <a href="https://cxotoday.com/news-analysis/did-apple-just-quietly-give-startups-a-way-to-run-trillion-parameter-ai-models-without-touching-the-cloud/">CXOToday reports</a> that healthcare, fintech, and legal tech companies are evaluating Mac clusters for scenarios where data cannot leave the premises. GDPR, HIPAA, and financial regulations create genuine requirements for on-premises inference that cloud providers cannot satisfy with contract clauses alone. <a href="https://www.jigsaw24.com/resource/bringing-enterprise-ai-down-from-the-cloud-build-your-own-private-llm-with-mac-studio">Jigsaw24</a>, a UK enterprise Apple reseller, has published deployment guides for private LLM setups using EXO Labs.</p>
<p><strong>AI researchers and hobbyists with serious budgets.</strong> The <a href="https://news.ycombinator.com/item?id=46907001">Hacker News thread</a> on Mac Studio for local AI shows users running 256GB-512GB configurations. The primary motivation cited: privacy and control, not cost savings. These are developers who want to iterate on large models without per-token API costs or rate limits, and who have $10,000-$50,000 to spend on a permanent inference rig.</p>
<p><strong>Startups avoiding cloud lock-in.</strong> In the break-even analysis <a href="https://blog.premai.io/self-hosted-llm-guide-setup-tools-cost-comparison-2026/">published by Prem.ai</a>, a team spending $47,000/month on cloud inference cut their compute costs by 83% to $8,000/month using a hybrid local-cloud approach. The Mac cluster is the local half of that equation for teams that do not want to operate NVIDIA GPU servers.</p>
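<p>The arithmetic behind that 83% figure, plus a hypothetical payback period for the cluster itself (an illustration using the article's numbers; the payback calculation is not part of the Prem.ai analysis):</p>

```python
# Figures from the Prem.ai break-even analysis cited above.
cloud_monthly = 47_000    # all-cloud inference spend, USD/month
hybrid_monthly = 8_000    # spend after moving to hybrid local/cloud

print(f"savings: {1 - hybrid_monthly / cloud_monthly:.0%}")  # 83%

# Hypothetical: a ~$40K Mac cluster pays for itself in about a month
# at that savings rate (illustrative, not from the cited analysis).
print(f"payback: {40_000 / (cloud_monthly - hybrid_monthly):.1f} months")  # ~1.0
```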
<h2 id="the-shortage---real-but-complicated">The Shortage - Real but Complicated</h2>
<p>Mac Studio delivery times have stretched to 1-2 months for high-memory configurations. <a href="https://9to5mac.com/2026/02/13/new-mac-studio-orders-delayed-1-2-months-as-refresh-looms/">9to5Mac confirmed</a> shipping estimates pushing into April 2026, particularly for 512GB RAM units. <a href="https://appleinsider.com/articles/26/02/13/again-dont-count-on-mac-studio-stock-levels-for-release-timing">Apple Insider notes</a> the difficulty separating AI-driven demand from normal product-cycle effects - Apple is widely expected to refresh the Mac Studio with M5 Ultra later this year, and inventory drawdowns before a refresh are normal.</p>
<p>In Europe, the situation is more acute. <a href="https://www.letemsvetemapplem.eu/en/2025/03/14/vyprodano-na-nejdrazsi-mac-studio-s-obri-512gb-ram-se-bude-cekat-tydny/">Czech tech publication Letem svetem Applem</a> reported the highest-configured Mac Studio completely sold out, with weeks-long waits. At approximately 17,000 EUR for a fully loaded unit, these are not impulse purchases.</p>
<h2 id="the-limitations">The Limitations</h2>
<p>The Mac cluster story is real, but it comes with important caveats.</p>
<p><strong>Inference only.</strong> Apple Silicon is not a practical training platform. MLX supports small-scale fine-tuning such as LoRA adapters, but there is no equivalent to NVIDIA's CUDA training ecosystem for Metal. If you need full fine-tuning or pre-training, you still need NVIDIA GPUs or cloud compute. The Mac cluster is effectively for running pre-trained models.</p>
<p><strong>Interactive speed, not batch throughput.</strong> 25-32 tokens per second is plenty for interactive single-user inference. It is not competitive with even a single <a href="/hardware/nvidia-h100/">H100</a> for batched production serving, where aggregate throughput is measured in thousands of tokens per second across concurrent requests.</p>
<p><strong>Software ecosystem is young.</strong> EXO Labs is the primary clustering tool and it is open source with a small team. RDMA support was added in December 2025. The stack works, but it is not enterprise-grade in the way that NVIDIA's inference stack (TensorRT-LLM, Triton Inference Server, NIM) has been battle-tested for years.</p>
<p><strong>Memory bandwidth is the bottleneck.</strong> Apple's unified memory delivers 546 GB/s on the <a href="/hardware/apple-m4-max/">M4 Max</a> and 819 GB/s on the M3 Ultra. Compare that to 3,350 GB/s on an <a href="/hardware/nvidia-h100/">H100</a> or 8,000 GB/s on a <a href="/hardware/nvidia-b200/">B200</a>. The Mac cluster compensates for lower bandwidth with more total memory capacity, but per-token latency will always be higher than dedicated datacenter GPUs.</p>
<h2 id="the-bottom-line">The Bottom Line</h2>
<p>The Mac Studio cluster is the first sub-$50,000 setup that can run trillion-parameter models locally with usable performance. That is a genuine milestone for privacy-sensitive workloads and developers who want to experiment with frontier-scale models without cloud dependency.</p>
<p>It is not a replacement for datacenter GPUs - the throughput gap is too large for production batch serving. But for the specific use case of interactive, private, on-premises inference with very large models, nothing else in this price range comes close.</p>
<p>If your threat model requires data to stay on-premises, or if you are spending more than $3,000/month on cloud inference APIs and can tolerate lower throughput, the math works. Four Mac Studios pay for themselves in under a year compared to cloud H100 rental.</p>
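<p>The payback claim can be sketched with assumed prices (a ~$40,000 four-node cluster and ~$2.50/GPU-hour H100 rental are our assumptions, not figures from the article; a trillion-parameter Q4 model needs roughly eight 80GB H100s just to hold the weights):</p>

```python
# Hedged payback sketch for always-on rental; intermittent cloud use
# changes the math considerably.
CLUSTER_COST = 40_000      # USD, assumed four-node 512GB Mac Studio cluster
H100_HOURLY = 2.50         # USD per GPU-hour, assumed market rate
GPUS_NEEDED = 8            # ~640 GB of HBM for a ~500 GB Q4 model
HOURS_PER_MONTH = 730

monthly_cloud = GPUS_NEEDED * H100_HOURLY * HOURS_PER_MONTH
payback_months = CLUSTER_COST / monthly_cloud
print(f"Cloud: ${monthly_cloud:,.0f}/mo, payback: {payback_months:.1f} months")
```

<p>Under these assumptions the cluster pays for itself in about three months of equivalent always-on H100 rental, which makes the under-a-year claim conservative for this workload class.</p>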
<p>For everyone else - particularly anyone considering this for workloads that fit on a single <a href="/hardware/nvidia-rtx-4090/">RTX 4090</a> or <a href="/hardware/nvidia-rtx-5090/">RTX 5090</a> - the NVIDIA consumer GPU path remains faster, cheaper, and better supported.</p>
<p><strong>Sources:</strong></p>
<ul>
<li><a href="https://creativestrategies.com/research/running-a-1t-parameter-model-on-a-40k-mac-studio-cluster/">Running a 1T Parameter Model on a $40K Mac Studio Cluster - Creative Strategies</a></li>
<li><a href="https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-studio-rdma-over-thunderbolt-5/">1.5 TB of VRAM on Mac Studio - RDMA over Thunderbolt 5 - Jeff Geerling</a></li>
<li><a href="https://www.implicator.ai/apple-just-turned-a-software-update-into-a-730-000-discount-on-ai-infrastructure/">Apple Just Turned a Software Update Into a $730,000 Discount - Implicator.ai</a></li>
<li><a href="https://cxotoday.com/news-analysis/did-apple-just-quietly-give-startups-a-way-to-run-trillion-parameter-ai-models-without-touching-the-cloud/">Mac Studio for Trillion-Parameter AI Without Cloud - CXOToday</a></li>
<li><a href="https://www.jigsaw24.com/resource/bringing-enterprise-ai-down-from-the-cloud-build-your-own-private-llm-with-mac-studio">Build Your Own Private LLM with Mac Studio - Jigsaw24</a></li>
<li><a href="https://github.com/exo-explore/exo">EXO Labs - Open Source Distributed Inference</a></li>
<li><a href="https://appleinsider.com/articles/25/12/20/ai-calculations-on-mac-cluster-gets-a-big-boost-from-new-rdma-support-on-thunderbolt-5">AI Cluster Boost from RDMA on Thunderbolt 5 - Apple Insider</a></li>
<li><a href="https://9to5mac.com/2026/02/13/new-mac-studio-orders-delayed-1-2-months-as-refresh-looms/">Mac Studio Shipping Delays - 9to5Mac</a></li>
<li><a href="https://blog.premai.io/self-hosted-llm-guide-setup-tools-cost-comparison-2026/">Self-Hosted LLM Cost Comparison - Prem.ai</a></li>
</ul>
]]></content:encoded><dc:creator>Sophie Zhang</dc:creator><category>News</category><media:content url="https://awesomeagents.ai/images/news/mac-studio-clusters-local-llm-inference-rdma_hu_dccd8dce06e7c94d.jpg" medium="image" width="1200" height="675"/><media:thumbnail url="https://awesomeagents.ai/images/news/mac-studio-clusters-local-llm-inference-rdma_hu_dccd8dce06e7c94d.jpg" width="1200" height="675"/></item></channel></rss>