google launches gemma 4 12b – nearly matches the 26b model on benchmarks, sometimes beats it, at less than half the memory footprint

Addy Crezee 3 Jun 2026 1 min read


what changed under the hood:

• vision. replaced the encoder with a lightweight embedding module (single matrix multiply + positional embedding + normalization). the llm backbone now handles visual processing directly

• audio. encoder removed entirely. raw audio signal is projected straight into the same token space as text

• inference. ships with multi-token prediction (mtp) drafters for speculative decoding, cutting latency
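
the speculative decoding idea in the last bullet can be sketched in a few lines. this is a toy draft-and-verify loop, not google's mtp implementation: `target`, `drafter`, `speculative_decode` and `greedy_decode` are hypothetical names, both "models" are plain functions over integer tokens, and verification uses the simple greedy rule (accept drafted tokens while they match what the big model would have picked).

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # context -> next token (greedy)


def target(ctx: List[Token]) -> Token:
    """Stand-in for the large model: a fixed deterministic rule."""
    return (ctx[-1] * 2 + 1) % 11


def drafter(ctx: List[Token]) -> Token:
    """Stand-in for a cheap drafter: agrees with the target most of the time."""
    guess = target(ctx)
    # Deliberately wrong at every 4th context length, to exercise rejection.
    return (guess + 1) % 11 if len(ctx) % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(
    big: NextToken, small: NextToken, prompt: List[Token], max_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The drafter proposes k tokens autoregressively.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(small(out + draft))

        # 2. The big model checks the draft. A real system scores all k
        #    positions in one batched forward pass; here that single pass is
        #    emulated with per-position calls and counted once.
        passes += 1
        accepted: List[Token] = []
        for tok in draft:
            expected = big(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch, drop the rest
                break
            accepted.append(tok)
        else:
            accepted.append(big(out + accepted))  # whole draft accepted: bonus token
        out.extend(accepted)
    return out[: len(prompt) + max_new], passes


if __name__ == "__main__":
    tokens, passes = speculative_decode(target, drafter, [1], max_new=12)
    print(tokens == greedy_decode(target, [1], 12), passes)
```

the output is identical to plain greedy decoding with the big model; the latency win comes from needing fewer big-model passes than generated tokens whenever the drafter guesses right.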

benchmarks (gemma 3 27b / gemma 4 12b / gemma 4 26b):

- gpqa diamond: 44 / 78.8 / ~80
- bbeh: 18 / 53 / 62
- mmlu pro: 67 / 77.2 / 78
- livecodebench: 28 / 72 / 76
- docvqa: 83 / 94.9 / 93
- infovqa: 60 / 88.4 / 90
- mmmu pro: 65 / 69.1 / 72
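
to make "nearly matches, sometimes beats" concrete, here is a small sketch that computes the 12b minus 26b gap for each benchmark listed above. the numbers are copied from the list; the gpqa diamond score for the 26b is given only as "~80" and is treated as 80 here, which is an assumption. the names `scores`, `gaps` and `wins` are made up for the example.

```python
# (gemma 3 27b, gemma 4 12b, gemma 4 26b), copied from the list above.
scores = {
    "gpqa diamond": (44, 78.8, 80),  # 26b is listed as "~80"; 80 is assumed
    "bbeh": (18, 53, 62),
    "mmlu pro": (67, 77.2, 78),
    "livecodebench": (28, 72, 76),
    "docvqa": (83, 94.9, 93),
    "infovqa": (60, 88.4, 90),
    "mmmu pro": (65, 69.1, 72),
}

# Positive gap means the 12b scores higher than the 26b.
gaps = {name: round(s12 - s26, 1) for name, (_, s12, s26) in scores.items()}

# Benchmarks where the smaller model comes out ahead.
wins = [name for name, gap in gaps.items() if gap > 0]

if __name__ == "__main__":
    for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
        print(f"{name:15s} {gap:+.1f}")
    print("12b ahead on:", wins)
```

on these figures the 12b is ahead only on docvqa, within three points on four of the others, and furthest behind on bbeh.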

runs locally on consumer laptops with 16gb vram or unified memory – including macbook m-series
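
a rough back-of-the-envelope check on the memory claims, as a sketch. it assumes the parameter counts are what the model names suggest (12 billion and 26 billion), counts weights only (no kv cache, activations or runtime overhead), and the precision levels are illustrative choices, not something google's announcement specifies. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Precision levels are assumptions for illustration.
    for bits in (16, 8, 4):
        gb_12b = weight_memory_gb(12, bits)
        gb_26b = weight_memory_gb(26, bits)
        fits = "fits" if gb_12b <= 16 else "does not fit"
        print(f"{bits:2d}-bit: 12b = {gb_12b:4.1f} GB ({fits} in 16 GB), 26b = {gb_26b:4.1f} GB")

    # At equal precision the footprint ratio is just the parameter ratio.
    print(f"12b / 26b = {12 / 26:.2f}")
```

under these assumptions the 12b needs about 46% of the 26b's weight memory at any given precision, and fitting into 16 gb implies 8-bit weights or lower, since 16-bit weights alone would take 24 gb.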


demo source: google

Gemma 4 12B delivers great performance with a small memory footprint and a novel architecture. pic.twitter.com/aUyX7lzj9f
— Google (@Google) June 3, 2026
