elon musk just admitted in federal court that xai trained grok using openai's models. his defense? "all ai companies do it."
he's not wrong. the practice is called model distillation – and it's quietly become the most controversial technique in the ai industry
here's how it works, who got caught, and why nobody is clean
what is it? a smaller "student" model learns from a larger "teacher" model's outputs – ending up nearly as capable, at a fraction of the cost. instead of training on raw human-labeled data, the student simply learns to mimic how the teacher responds. done with permission, it's a standard technique. done covertly against a competitor, it's where things get ugly
how is it done? you query the teacher at scale with thousands or millions of prompts, collect the outputs, and train the student on those pairs. the most powerful version captures full chain-of-thought reasoning – not just answers, but the step-by-step thinking behind them. the student doesn't copy answers. it absorbs a way of thinking
how many examples do you need? for narrow tasks – tone, formatting, domain-specific responses – 50k is fine. stanford's alpaca proved this in 2023 with 52k gpt-3 outputs: decent instruction-following, but noticeably weaker reasoning than the teacher.
for genuine capability transfer, you need millions. and quality beats quantity: 920 carefully chosen reasoning traces can outperform brute-force datasets many times larger. deepseek's alleged campaign ran 16 million+ queries. that's the real scale.
what does it cost? 50k examples from gpt-5.5 runs $75–150 via the batch api (openai's async mode, 50% off in exchange for 24hr turnaround).
gpt-5 nano brings that to $3.63. the real bill is training the student afterward – thousands to tens of thousands depending on size.
open-source teachers like deepseek v4, kimi k2.6, and qwen 3.6 plus are free to download – you only pay for gpu compute: $15-45 on rented hardware, or just electricity if you run it on your own machine
is it legal? nobody knows yet. every major lab bans it in their terms of service. no court has ruled. in april 2026, the white house issued memorandum nstm-4 formally designating large-scale distillation attacks as a national security threat
who did it? microsoft detected large-scale data extraction from openai accounts linked to deepseek in late 2024. openai says it has evidence. deepseek has never confirmed or denied it.
moonshot ai and minimax were accused by anthropic in early 2026 of running 16M+ queries across 24,000 fake accounts using vpn rotation and jailbreak prompts.
and now musk has confirmed xai did it too – at least partly. this is industry-wide
the counterpoint – mistral's magistral hit competitive benchmarks in 2025 through pure reinforcement learning, zero distillation. the honest path exists – just harder
the irony: the companies crying foul trained their own models on scraped internet data without consent. the debate over whose copying is legitimate is very much still ongoing

Elon Musk confirms xAI used OpenAI’s models to train Grok https://t.co/SR2TF28215
— The Verge (@verge) April 30, 2026
Addy Crezee