apple is using google's gemini to distill a smaller model that runs locally on your iphone

Addy Crezee 28 May 2026 1 min read


a new report from the information claims exactly that, and a lot more

the technique, distillation, trains a small "student" model to reproduce the behavior of a trillion-parameter "teacher", ending up with something light enough to run on-device. that's the foundation of apple's next ai push
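for a sense of what distillation looks like in practice, here's a minimal python sketch of the classic soft-label version: the student is trained to match the teacher's output probabilities, softened by a temperature. the report doesn't say which method apple actually uses, so the function names, the temperature value and the toy logits below are all assumptions for illustration

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Hypothetical helper for illustration, not Apple's or Google's code.
    The T^2 factor is the usual scaling from the soft-label formulation.
    """
    p = softmax(teacher_logits, temperature)  # teacher: the big model
    q = softmax(student_logits, temperature)  # student: the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy logits over three candidate tokens (made-up numbers).
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different token

close_loss = distillation_loss(teacher_logits, close_student)
far_loss = distillation_loss(teacher_logits, far_student)
print(close_loss, far_loss)
```

the loss is zero when the student matches the teacher exactly and grows as their predictions drift apart. training the small model means pushing that number down across a huge pile of teacher outputs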

• but here's the catch: the full gemini model is too heavy even for apple's own private cloud compute infrastructure, so some siri queries are getting routed to a licensed gemini instance running on google cloud

• to keep user data protected while using google's cloud, apple just approved a security feature from nvidia called confidential compute. it keeps everything encrypted, both your query and the ai model processing it, while the work is happening on nvidia's chips. there's a small slowdown as a tradeoff, but it means apple can still claim your data is safe even when it's being processed on someone else's servers

• they're also shopping for acquisitions. liquid ai, a cambridge-based startup focused on efficient on-device inference, is reportedly on their radar. no deal closed yet

wwdc is when we get the feature layer on top of all this. the infra story is already leaking out

