DeepMind's AlphaEvolve Recovered 0.7% of Google's Compute

When Google DeepMind released AlphaEvolve earlier this year, the public-facing results were impressive in a controlled-experiment way: novel game theory algorithms, better matrix multiplication, new mathematical lower bounds. Yesterday's follow-up report is a different kind of document. AlphaEvolve isn't just producing paper results. It's running in production across Google's data centers, TPU design pipeline, and a growing list of commercial partnerships - and the numbers it's accumulating are hard to ignore.

TL;DR

AlphaEvolve recovered 0.7% of Google's worldwide compute through improved data center task scheduling
A circuit design AlphaEvolve proposed was deemed so efficient it went directly into next-gen TPU silicon
Commercial partners include Klarna (2x training speed), FM Logistic (15,000+ km saved/year), and Schrödinger (4x MLFF speedup)
Grid optimization for AC Optimal Power Flow problems moved from 14% to 88% feasibility
Access is still through Google Cloud's partner program - there's no public API

How AlphaEvolve Works

AlphaEvolve isn't a language model you call through an API. It's a system built on top of Gemini 3.1 Pro and Gemini Flash that runs an evolutionary loop over code. The architecture is straightforward in concept but demanding to deploy:

1. Prompt sampler: assembles task description + program history for the LLM ensemble
2. LLM ensemble:
     Gemini Flash → high-throughput candidate generation (breadth)
     Gemini Pro   → higher-quality suggestions (depth, novel directions)
3. Evaluator: automated metric scores each candidate (throughput, error rate, etc.)
4. Program database: evolutionary selection - high-scoring variants seed the next round
5. Loop: repeat until convergence or compute budget exhausted

The key constraint is step 3. Someone has to write the scoring function before AlphaEvolve can run. Without a programmatic way to assess candidates, the evolutionary loop has nothing to optimize against.

The Dual-Model Setup

The Flash-Pro split is practical engineering. Flash produces a high volume of candidate mutations at lower latency; Pro produces higher-quality suggestions that move the search in new directions. Most gains come from a small fraction of Pro's candidates. Flash provides coverage; Pro provides breakthroughs.

The Evolutionary Loop

The loop extends FunSearch (an earlier DeepMind system) to handle longer programs and a broader class of problems. Instead of evolving single functions, AlphaEvolve can evolve entire algorithm implementations - scheduling policies, kernel code, circuit designs in Verilog. The program database acts as the population: survivors of each generation seed the next.

Server racks in a modern data center with blue LED lighting AlphaEvolve's data center scheduling optimization is running in production, recovering 0.7% of Google's global compute - a significant number at Google's scale. Source: unsplash.com

Infrastructure Impact

The clearest signal that AlphaEvolve works is that Google rolled out its results. That's a different bar from a benchmark result on a test set.

Data Center Scheduling

AlphaEvolve found a better policy for scheduling tasks across Google's data centers. The report describes it as "recovering on average 0.7% of global compute resources." The policy runs continuously, not as a one-time optimization pass.

TPU Silicon Design

Jeff Dean described the chip design result directly: "AlphaEvolve proposed a circuit design so counterintuitive yet efficient it was integrated into next-generation TPU silicon." This matters because chip design has long verification cycles. A result from an LLM-based system getting into actual silicon means it survived not just automated verification but expert human review.

The system also improved existing GPU kernel implementations. FlashAttention - a key component in transformer inference and training - saw up to 32.5% speedup through AlphaEvolve-discovered implementations. The matrix multiplication kernel used in Gemini's own training improved by 23%, cutting training time by 1%.

Google Spanner and Compilers

Less flashy but concrete: a 20% reduction in Google Spanner write amplification and a 9% reduction in software storage footprint through compiler optimizations. These aren't tuned for a benchmark. They're launched in Spanner, which is Google's globally distributed relational database.

Scientific Breakthroughs

$Mathematical equations and formulas on a blackboard$ Across 50 open mathematical problems, AlphaEvolve rediscovered state-of-the-art solutions in 75% of cases and found improved solutions in 20%. Source: unsplash.com

Mathematics with Terence Tao

Mathematician Terence Tao worked with AlphaEvolve on Erdős problems - open combinatorics questions that have resisted human-designed solutions for decades. Tao said in the report: "Tools such as AlphaEvolve give mathematicians useful new capabilities...greatly improves our intuition." The system also improved lower bounds for the Traveling Salesman Problem and Ramsey numbers.

The baseline for mathematical benchmarks: across 50 open problems tested, AlphaEvolve rediscovered state-of-the-art solutions in 75% of cases and found truly improved solutions in 20%.

Quantum Computing

AlphaEvolve optimized quantum circuits for Google's Willow processor, achieving 10x lower error rates for molecular simulations compared to conventionally optimized baselines. Quantum circuit optimization is a good match for evolutionary search because small improvements in gate sequences compound significantly as circuit depth increases.

Genomics and Grid Optimization

AlphaEvolve improved Google's DeepConsensus DNA sequencing model, reducing variant detection errors by 30%, in a partnership with PacBio.

DNA sequencing laboratory equipment showing genetic analysis AlphaEvolve cut DNA variant detection errors by 30% in partnership with PacBio, improving the DeepConsensus sequencing model. Source: unsplash.com

On electrical grid planning, AlphaEvolve moved feasibility for AC Optimal Power Flow problems from 14% to 88% - a hard combinatorial optimization problem with direct implications for how renewable energy sources can be integrated into grid infrastructure. It also improved natural disaster prediction accuracy by 5% across 20 event categories.

Commercial Partners

Company	Domain	Result
Klarna	Fintech/ML	2x transformer model training speed
Substrate	Semiconductors	Multi-fold speedup in lithography simulations
FM Logistic	Logistics	10.4% routing efficiency gain, 15,000+ km saved annually
WPP	Marketing	10% improvement in campaign optimization accuracy
Schrödinger	Materials/Biotech	4x speedup in Machine Learned Force Fields

The range of domains is worth noting. FM Logistic is a large European logistics operator - the 15,000+ km saved per year is a real operational number, not a benchmark projection. Schrödinger's MLFF speedup is directly relevant to drug discovery pipelines where simulation time is a bottleneck. Klarna doubling transformer training speed at a fintech company is a long way from mathematical research.

Problem Compatibility

AlphaEvolve isn't a general-purpose optimizer. It works on a specific class of problems.

Requirement	Status	Notes
Automated evaluation metric	Required	Without a programmatic scoring function, the loop can't run
Deterministic or near-deterministic evaluation	Strongly preferred	Noisy metrics slow convergence and mislead selection
Problem expressible as code	Required	The algorithm must be representable as executable logic
Physical lab experiments	Not compatible	AlphaEvolve can't wait for wet-lab results to score candidates
Human judgment as primary quality signal	Not compatible	Manual review as the scoring loop is too slow for the evolutionary cycle

Where It Falls Short

The report is a curated collection of successes. The selection bias is obvious but worth naming.

AlphaEvolve's results depend completely on the quality of the evaluation function. Writing a good automated scorer for a novel problem is itself a research task requiring domain expertise and iteration. The system can't tell you whether your metric is measuring the right thing - it'll optimize whatever you give it.

The compute budget is also non-trivial. Running AlphaEvolve on a meaningful problem requires substantial GPU time for the evolutionary loop plus Gemini API costs for the LLM ensemble. The infrastructure optimizations at Google are worth their compute cost because 0.7% of Google-scale infrastructure is enormous. For smaller organizations, the math won't work out for every problem class.

Access is still internal or through Google Cloud's early access program. There's no public API. The commercial partners listed in the report worked directly with the DeepMind team. The github.com/google-deepmind/alphaevolve_results repository contains algorithm outputs but not the system itself.

The open-source community built OpenEvolve as an accessible implementation. It runs smaller models and lacks the compute infrastructure behind the Google version - and that gap is probably as much about GPUs as it's about model capability.

The May 7 report is the first time Google has collected AlphaEvolve's production results in one place. The picture it shows is of a system running quietly across Google for months, producing improvements that are small in percentage terms but significant in absolute scale. Getting access to the system itself remains the bottleneck for anyone outside Google Cloud's partner program.

Sources: