musk admits xai trained grok on openai models in court

Addy Crezee 1 May 2026 2 min read

Suited man stands over a glowing circuit board in a dark courtroom as a hand signs an xAI v. OpenAI document beside it.

elon musk just admitted in federal court that xai trained grok using openai's models. his defense? "all ai companies do it."

he's not wrong. the practice is called model distillation – and it's quietly become the most controversial technique in the ai industry

here's how it works, who got caught, and why nobody is clean

what is it? a smaller "student" model learns from a larger "teacher" model's outputs – ending up nearly as capable, at a fraction of the cost. instead of training on raw human-labeled data, the student simply learns to mimic how the teacher responds. done with permission, it's a standard technique. done covertly against a competitor, it's where things get ugly

how is it done? you query the teacher at scale with thousands or millions of prompts, collect the outputs, and train the student on those pairs. the most powerful version captures full chain-of-thought reasoning – not just answers, but the step-by-step thinking behind them. the student doesn't copy answers. it absorbs a way of thinking

how many examples do you need? for narrow tasks – tone, formatting, domain-specific responses – 50k is fine. stanford's alpaca proved this in 2023 with 52k gpt-3 outputs: decent instruction-following, but noticeably weaker reasoning than the teacher.
for genuine capability transfer, you need millions. and quality beats quantity: 920 carefully chosen reasoning traces can outperform brute-force datasets many times larger. deepseek's alleged campaign ran 16 million+ queries. that's the real scale.

what does it cost? 50k examples from gpt-5.5 runs $75–150 via the batch api (openai's async mode, 50% off in exchange for 24hr turnaround).
gpt-5 nano brings that to $3.63. the real bill is training the student afterward – thousands to tens of thousands depending on size.
open-source teachers like deepseek v4, kimi k2.6, and qwen 3.6 plus are free to download – you only pay for gpu compute: $15-45 on rented hardware, or just electricity if you run it on your own machine

is it legal? nobody knows yet. every major lab bans it in their terms of service. no court has ruled. in april 2026, the white house issued memorandum nstm-4 formally designating large-scale distillation attacks as a national security threat

who did it? microsoft detected large-scale data extraction from openai accounts linked to deepseek in late 2024. openai says it has evidence. deepseek has never confirmed or denied it.
moonshot ai and minimax were accused by anthropic in early 2026 of running 16M+ queries across 24,000 fake accounts using vpn rotation and jailbreak prompts.
and now musk has confirmed xai did it too – at least partly. this is industry-wide

the counterpoint – mistral's magistral hit competitive benchmarks in 2025 through pure reinforcement learning, zero distillation. the honest path exists – just harder

the irony: the companies crying foul trained their own models on scraped internet data without consent. the debate over whose copying is legitimate is very much still ongoing