Nebius Buys Eigen AI for $643M to Own Inference

Nebius agrees to acquire 20-person MIT inference startup Eigen AI for $643M, betting that optimizing every token per Nvidia chip is the real moat in the AI infrastructure race.

Nebius Group announced Thursday it will pay $643 million to acquire Eigen AI, a 20-person California startup founded by three MIT researchers behind some of the most widely deployed inference techniques in modern AI. The deal - structured as $98 million in cash plus 3.8 million Nebius Class A shares - is the company's second acquisition in three months and its clearest signal yet that the profitable center of gravity in AI infrastructure has shifted from training clusters to production inference.

TL;DR

  • $643M deal: $98M cash + 3.8M Nebius (NBIS) shares, expected to close in weeks pending antitrust
  • Eigen AI's 20-person team from MIT's HAN Lab did the foundational research behind AWQ quantization and Sparse Attention - techniques now running in most production LLM deployments
  • Peak throughput: 911 tokens/second on leading open-source models including DeepSeek, Llama 4, and Qwen
  • Eigen AI was named the #1 speed inference provider in Jensen Huang's NVIDIA GTC 2026 keynote
  • NBIS stock up 11.76% on the news; Nebius has flagged further acquisitions over the next 18 months

The Target: What Eigen AI Built

Eigen AI was founded in 2025 under a name its founders chose deliberately: Artificial Efficient Intelligence. The premise was that raw model scale had already been solved by the frontier labs, and the next decade of value would flow to whoever could make those models run cheapest and fastest.

The MIT HAN Lab Pedigree

All three co-founders came from MIT's HAN Lab, the research group led by Professor Song Han that has produced much of the practical efficiency work underpinning modern production AI.

Ryan Hanrui Wang's 2020 paper on Sparse Attention - which dramatically reduced the compute required for long-context inference - remains the most-cited paper published at the IEEE High Performance Computer Architecture conference in the past five years. Wei-Chen Wang developed Activation-aware Weight Quantization, known universally in deployment circles as AWQ. AWQ is now the default approach for 4-bit model serving in production; if you are running an open-source model at scale today, you're almost certainly using Wei-Chen's technique. He received the MLSys 2024 Best Paper Award for the work.
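For readers unfamiliar with the technique, the core idea behind activation-aware quantization can be sketched in a few lines: scale up the weight channels that see the largest activations before rounding to 4 bits, so the weights that matter most lose the least precision. This is a heavily simplified toy illustration of the general idea, not Eigen AI's or the AWQ paper's actual implementation:

```python
import numpy as np

def awq_style_quantize(weights, act_scale, n_bits=4, group_size=64):
    """Toy activation-aware quantization sketch (illustrative only).

    Channels with large average activation magnitude are scaled up
    before rounding, so they lose less precision; the scaling is
    undone after dequantization.
    """
    s = act_scale ** 0.5              # soften raw activation magnitudes
    s = s / s.mean()                  # normalize scales around 1.0
    w = weights * s                   # protect salient input channels

    q_max = 2 ** (n_bits - 1) - 1     # symmetric int4 range: [-8, 7]
    out = np.empty_like(w)
    for start in range(0, w.shape[1], group_size):
        block = w[:, start:start + group_size]
        scale = np.abs(block).max() / q_max + 1e-8
        q = np.clip(np.round(block / scale), -q_max - 1, q_max)
        out[:, start:start + group_size] = q * scale  # dequantize

    return out / s                    # undo the activation scaling

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 128))                 # hypothetical weight matrix
act = np.abs(rng.normal(size=128)) + 0.1      # per-channel activation stats
W_q = awq_style_quantize(W, act)
print(np.abs(W - W_q).mean())                 # small reconstruction error
```

The payoff in production is that the quantized weights occupy a quarter of the memory of 16-bit weights, which directly raises how many tokens a single GPU can serve.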

The third co-founder, Di Jin, holds a PhD from MIT CSAIL and contributed directly to post-training for Meta's Llama 4 family. He also co-authored the CGPO reinforcement learning from human feedback framework, giving Eigen AI unusually deep expertise not just in inference speed but in how models respond to post-training adjustments.

The Optimization Stack

Eigen AI's commercial product stacks these techniques end-to-end: post-training quantization, KV-cache optimization, custom CUDA kernels, and routing across available compute. The company published benchmarks showing 911 tokens per second on models including DeepSeek, Llama, and Qwen - a figure that drew enough attention to land the company on Jensen Huang's keynote slide at NVIDIA GTC 2026 as the #1 speed inference provider.

That validation matters in the enterprise sales cycle. When Nvidia's CEO is using your benchmark numbers in front of 10,000 engineers, the conversation with a CTO who wants to know why your inference layer is worth paying for becomes considerably shorter.

[Image: a dense server rack with illuminated status indicators in a modern data center] Modern AI inference workloads demand highly optimized GPU use - exactly what Eigen AI's stack was built to deliver. Source: unsplash.com

The Deal at a Glance

| Metric | Detail |
| --- | --- |
| Acquirer | Nebius Group (NASDAQ: NBIS) |
| Target | Eigen AI |
| Deal value | ~$643M ($98M cash + 3.8M NBIS shares) |
| Team size | 20 people |
| Founded | 2025, MIT HAN Lab alumni |
| Peak throughput | 911 tokens/sec (Llama, DeepSeek, Qwen) |
| GTC 2026 recognition | #1 speed inference provider, Jensen Huang keynote |
| Nebius stock reaction | +11.76% on announcement |
| Expected close | Weeks (antitrust pending) |
| Comparable recent deal | Nebius/Tavily, February 2026 |

For context, Nebius raised its 2026 revenue guidance to $3-3.4 billion earlier this year - a figure built on Meta's $27 billion five-year cloud contract and a $2 billion strategic investment from Nvidia. The Eigen AI acquisition is not a financial needle-mover at this scale. It's a technical and talent play.

Who Benefits

Nebius Token Factory gets a defensible moat. The company's managed inference product - which competes directly against Fireworks, Baseten, and CoreWeave's inference layer - has until now relied on the same underlying optimization libraries available to every other provider. Bringing Eigen AI in-house means Nebius can offer throughput and cost-per-token that third parties simply cannot reproduce without hiring the same researchers.

Roman Chernin, Nebius co-founder and Chief Business Officer, made the commercial logic explicit: "Eigen's technology maximizes the number of tokens generated by each Nvidia chip Nebius uses for inference." In a market where every inference provider is paying Nvidia market rates for H100 and H200 capacity, whoever extracts the most output per chip wins on price. Eigen AI gives Nebius that edge - and given that inference is projected to account for two-thirds of total AI compute demand in 2026, the timing is correct.
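The per-chip economics can be made concrete with a back-of-envelope calculation. The $2.50/hour GPU rate and the 300 tokens/sec baseline below are illustrative assumptions, not figures from the deal - only the 911 tokens/sec peak comes from Eigen AI's published benchmarks:

```python
# Back-of-envelope: why tokens/sec per chip sets the price floor.
GPU_HOURLY_COST = 2.50  # assumed market rate per H100-class GPU hour

def cost_per_million_tokens(tokens_per_sec: float) -> float:
    """Serving cost per 1M output tokens on one GPU at a given throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return GPU_HOURLY_COST / tokens_per_hour * 1_000_000

baseline = cost_per_million_tokens(300)   # hypothetical unoptimized stack
optimized = cost_per_million_tokens(911)  # Eigen AI's published peak

print(f"baseline:  ${baseline:.2f} per 1M tokens")   # ≈ $2.31
print(f"optimized: ${optimized:.2f} per 1M tokens")  # ≈ $0.76
```

On these assumed numbers, tripling throughput cuts the cost floor per token to roughly a third - which is the whole commercial argument for owning the optimization stack rather than licensing it.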

Eigen AI's founders get resources and reach. A 20-person team with foundational research but no distribution now has access to Nebius's global compute footprint and the enterprise relationships that come with a $27 billion customer. The team will establish Nebius's Bay Area engineering and research presence in San Francisco, which also gives the parent company a West Coast anchor for US talent recruitment.

Open-source model users benefit downstream. Eigen AI's optimization work is focused on Llama, DeepSeek V4, Qwen, and other openly available models. As these techniques get deeper integration into Token Factory's production pipeline, the users of those models - whether through Nebius directly or via providers that benchmark against it - will see faster, cheaper inference.

Who Pays

[Image: a close-up of NVIDIA GPU processors used in large-scale AI compute deployments] AI inference workloads are overwhelmingly GPU-bound; Eigen AI's techniques squeeze more tokens per chip from Nvidia hardware. Source: unsplash.com

Nebius shareholders absorb dilution. The 3.8 million new Class A shares issued represent real dilution at a time when the stock is trading near its 52-week high around $155. The 30-day weighted average price used to value the share component puts the stock consideration at roughly $545 million - so shareholders are funding the majority of a $643 million deal through equity, not the company's cash reserves.
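The reported figures let us back out the implied share price as a quick sanity check - this is simple arithmetic on the numbers above, not additional disclosed terms:

```python
# Sanity-checking the deal structure from the reported figures.
total_deal = 643_000_000
cash = 98_000_000
shares_issued = 3_800_000

stock_component = total_deal - cash             # ~$545M in equity
implied_price = stock_component / shares_issued
print(f"implied 30-day weighted avg: ${implied_price:.2f}")        # ≈ $143.42
print(f"equity share of deal: {stock_component / total_deal:.0%}")  # ≈ 85%
```

The implied ~$143 average sits below the ~$155 spot price, which is consistent with a trailing 30-day weighted average on a stock near its 52-week high.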

Inference commodity providers face a harder fight. Fireworks and Baseten have both built businesses on open-source inference optimization. Neither has bought the researchers who invented the core techniques. Nebius just did. The companies will now need to either recruit comparable talent - expensive and slow - or accept that their primary competitive differentiator, throughput per dollar, has a new ceiling set by a better-funded rival.

The broader inference market faces faster commoditization. This acquisition accelerates what was already a race to the bottom on per-token pricing. When the most efficient optimization stack is owned by one provider and integrated at the infrastructure level, every other player either improves their own stack or competes on price. For enterprise buyers, that is good news. For inference startups without a research moat, it's a warning.

"We are operating in a capacity-scarcity world where AI builders need optimised inference and infrastructure scale." - Roman Chernin, Nebius CBO


The acquisition is Nebius's second in three months - the company bought AI search startup Tavily in February - and Chernin has indicated the company is actively assessing additional targets over the next 18 months. The inference layer is being consolidated by firms with both the compute and the capital to absorb the research talent building it, and Nebius is moving faster than most.

Buying the people who invented AWQ and Sparse Attention isn't a bet on a product - it's a bet that the next competitive moat in AI infrastructure is knowing how to use hardware better than everyone else, and that those three researchers are the fastest path to that position.

Daniel Okafor
About the author AI Industry & Policy Reporter

Daniel is a tech reporter who covers the business side of artificial intelligence - funding rounds, corporate strategy, regulatory battles, and the power dynamics between the labs racing to build frontier models.