Musk Admits xAI Distilled OpenAI Models for Grok
Under oath in the Musk v. Altman trial, Musk said xAI "partly" distilled OpenAI's models to train Grok - the same practice U.S. labs have spent months calling theft when Chinese firms do it.

On April 30, under cross-examination in a California federal courtroom, Elon Musk acknowledged that xAI "partly" used OpenAI's models to train Grok. Some people in the courtroom gasped.
The admission dropped during week one of the Musk v. Altman trial, where Musk is suing OpenAI, Sam Altman, and Greg Brockman for abandoning the organization's nonprofit mission. When opposing counsel asked whether xAI had used distillation techniques on OpenAI models, Musk first called it "a general practice among AI companies" before confirming directly: "Partly."
That single word is now reverberating across the industry. Distillation isn't news to anyone building AI systems - it's been an open secret for years. The issue is that the same U.S. labs accusing foreign competitors of running "industrial-scale" model theft have apparently been doing it to each other.
TL;DR
- The claim: Musk called distillation "a general practice among AI companies" and said xAI only did it "partly"
- What we found: OpenAI's terms of service explicitly ban using outputs to train competing models; Anthropic has called the same practice theft when Chinese companies did it
- What it means for developers: The API ecosystem sits on a legal grey zone that nobody at the frontier has clean hands on
What the Testimony Showed
The technical term "distillation" in AI means something specific. You query a deployed model through its public API, collect its outputs at scale, then use those input-output pairs as training data to teach a new model to behave similarly. It's distinct from model compression (shrinking one model into a smaller version of itself) - this is copying behavior across company lines.
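To make the mechanics concrete, here is a minimal sketch of that loop in Python. The `client` object and its `chat` method are hypothetical stand-ins for any vendor's chat API, not a real SDK:

```python
import json

def collect_pairs(client, prompts, model="teacher-model", out_path="distill.jsonl"):
    """Query a deployed 'teacher' model and record prompt/output pairs in the
    JSONL chat format commonly used for supervised fine-tuning.
    `client.chat` and the model name are hypothetical placeholders."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            reply = client.chat(model=model,
                                messages=[{"role": "user", "content": prompt}])
            f.write(json.dumps({"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": reply},
            ]}) + "\n")
```

The loop itself is trivial; all the leverage is in the teacher model's outputs, which is exactly why the terms of service guard them.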
Musk's justification was that xAI uses this to "validate" its models - checking whether Grok produces outputs similar to other frontier systems. That's a real and legitimate use case. Evaluators do it routinely. The problem is that collecting millions of validation samples through the API and feeding them into a training pipeline is indistinguishable from extraction.
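The overlap is easy to see in code. In the toy sketch below (every function is a stand-in, not any lab's actual tooling), the same collected pairs feed both paths; only the downstream call differs:

```python
def similarity(a: str, b: str) -> float:
    """Toy proxy for answer similarity: Jaccard overlap of tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def validate(pairs, student_answers, threshold=0.8) -> bool:
    """The legitimate use: score a student model's answers against the teacher's."""
    scores = [similarity(s, p["output"]) for s, p in zip(student_answers, pairs)]
    return sum(scores) / len(scores) >= threshold

def to_training_data(pairs):
    """The contested use: the very same pairs, reformatted as fine-tuning examples."""
    return [{"messages": [{"role": "user", "content": p["prompt"]},
                          {"role": "assistant", "content": p["output"]}]}
            for p in pairs]
```

Nothing in the collection step commits you to one path or the other, which is why "we were just validating" is so hard to verify from the outside.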
When pressed by OpenAI's legal team, Musk didn't claim xAI stopped at validation. He said "partly." The implication is that, at a minimum, some portion of Grok's training data came from OpenAI API outputs. How much, and in which training runs, wasn't established in court.
The Ronald V. Dellums Federal Building in Oakland - site of the Musk v. Altman trial where Musk took the stand on April 30, 2026.
Source: commons.wikimedia.org
What the Terms Actually Say
OpenAI's services agreement prohibits users from using outputs "to develop models that compete with OpenAI." Anthropic's usage policy has similar language. So does xAI's own Grok developer agreement.
That's not a coincidence. Every major lab added these clauses because distillation is genuinely threatening. The Anthropic complaint against DeepSeek, MiniMax, and Moonshot AI - which named 24,000 fraudulent accounts generating 16 million exchanges with Claude - described the same technical act Musk just admitted to, and called it theft.
| What the ToS says | Who wrote it | Who just admitted to doing it |
|---|---|---|
| "Don't use outputs to train competing models" | OpenAI | xAI (Musk, under oath) |
| "Don't extract capabilities through systematic querying" | Anthropic | xAI, by implication |
| "Prohibited reverse engineering via API" | Most frontier labs | The practice Musk called "standard" |
The legal status remains murky. IP law doesn't cleanly cover this. Violating terms of service is a civil matter, not a criminal one, and courts haven't yet ruled on whether distillation constitutes theft of model capabilities versus legitimate use of a publicly available API.
The Gap Between Rhetoric and Practice
This is where it gets uncomfortable. For most of 2026, U.S. AI labs have been running a coordinated narrative: Chinese companies are stealing American AI capabilities through distillation, and this is a national security threat.
The White House said exactly that. OpenAI accused DeepSeek of using its outputs without permission. U.S. AI labs began sharing distillation threat intelligence through the Frontier Model Forum, framing it as a foreign adversary problem.
Now Musk's testimony puts a crack in that framing. If xAI - a U.S. company, staffed by ex-OpenAI researchers, building a direct competitor to ChatGPT - also trains on OpenAI outputs, then the line between "theft" and "standard practice" depends heavily on who's doing it and who's doing the complaining.
Under cross-examination, Musk ranked the AI field: Anthropic first, OpenAI second, Google third, with Chinese open-source models somewhere in the mix. He placed xAI as a smaller player with a few hundred employees. That framing makes xAI's distillation sound like a scrappy underdog move. It's a harder case to make if you're simultaneously suing the company you admit you learned from.
Grok 4.20 scored 49 on the Artificial Analysis Intelligence Index, behind GPT-5.4 and Gemini 3.1 Pro, despite the distillation advantage Musk alluded to.
Source: the-decoder.com
What Distillation Actually Costs at Scale
Reproducing frontier-model behavior through API distillation isn't trivial, but for the distilling party the dominant cost sits on the query side, not the training side. A rough breakdown:
| Component | Requirement | Estimated cost |
|---|---|---|
| Query volume | 10M - 100M prompts | $50K - $500K in API spend |
| Data processing | Filtering, deduplication, formatting | Minimal (infrastructure) |
| Training compute | Fine-tuning or pre-training from scratch | Separate from distillation |
| Legal exposure | ToS violation, potential civil action | Unquantified |
The economics explain why distillation is tempting. You can reproduce months of expensive frontier training for a fraction of the compute cost, as long as you already have a base model to fine-tune.
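A back-of-envelope calculation shows where the table's query-side range comes from. Every number below is an illustrative assumption, not any vendor's published pricing:

```python
def query_cost(num_prompts,
               avg_input_tokens=400,     # assumed prompt length
               avg_output_tokens=600,    # assumed completion length
               input_price_per_m=2.00,   # assumed $ per 1M input tokens
               output_price_per_m=8.00): # assumed $ per 1M output tokens
    """Estimated API spend in dollars for one distillation pass."""
    input_cost = num_prompts * avg_input_tokens / 1e6 * input_price_per_m
    output_cost = num_prompts * avg_output_tokens / 1e6 * output_price_per_m
    return input_cost + output_cost

for n in (10_000_000, 100_000_000):
    print(f"{n:,} prompts -> ~${query_cost(n):,.0f}")
# 10,000,000 prompts -> ~$56,000
# 100,000,000 prompts -> ~$560,000
```

Under these assumptions the spend lands roughly in the table's range; a frontier pre-training run costs orders of magnitude more.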
For developers building on AI APIs, this creates a concrete risk: if a model you're using was partially trained on outputs from another model you're also using, the two systems are less independent than their vendors claim, and benchmark comparisons between them are partly circular.
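One way to probe that independence, sketched below with toy data: measure how often two models agree on a benchmark, and in particular how often they agree on the same wrong answer, which shared training data makes more likely. The function and labels are illustrative:

```python
from collections import Counter

def agreement_stats(answers_a, answers_b, gold):
    """Tally agreement between two models against gold labels. Frequent
    agreement on *wrong* answers hints the models are not independent."""
    stats = Counter()
    for a, b, g in zip(answers_a, answers_b, gold):
        if a == b:
            stats["agree_correct" if a == g else "agree_wrong"] += 1
        else:
            stats["disagree"] += 1
    return stats

# Toy multiple-choice results; identical wrong answers are the telling signal.
gold    = ["A", "B", "C", "D", "A"]
model_a = ["A", "B", "D", "D", "C"]
model_b = ["A", "B", "D", "D", "B"]
print(agreement_stats(model_a, model_b, gold))
# Counter({'agree_correct': 3, 'agree_wrong': 1, 'disagree': 1})
```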
Verdict
The honest engineering read is that Musk's "partly" admission is probably true for more companies than just xAI. API-based validation runs that shade into training data collection are a known shortcut in model development. The industry has just agreed, implicitly, not to say so.
What's changed is that a CEO said it under oath in federal court, in a case where OpenAI is one of the parties. That changes the legal record even if it doesn't change the technical reality.
Grok's current capabilities don't tell us much about which specific training runs involved distillation or how much it contributed. The "partly" qualifier was strategic. But the underlying point stands: the U.S. AI ecosystem can't simultaneously call distillation theft from foreign adversaries and shrug when an American lab admits to the same technique against American competitors.
Judge Yvonne Gonzalez Rogers, who is overseeing the case, has already expressed skepticism about Musk's motivations, saying she suspects "there's plenty of people who don't want to put the future of humanity in Mr. Musk's hands." The trial continues. Week two begins May 5.
Sources:
- Elon Musk testifies that xAI trained Grok on OpenAI models - TechCrunch
- Musk v. Altman week 1: Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI's models - MIT Technology Review
- Elon Musk Says xAI Used OpenAI Models to Train Grok - Decrypt
- Musk Admits xAI Used OpenAI Models to Build Grok AI - NewsBytes
- Musk's "Partly" Admission Puts AI Distillation in the Spotlight - TechStory
