google launches gemma 4 12b – nearly matches the 26b model on benchmarks, sometimes beats it, at less than half the memory footprint


what changed under the hood:

• vision. replaced the encoder with a lightweight embedding module (single matrix multiply + positional embedding + normalization). the llm backbone now handles visual processing directly

• audio. encoder removed entirely. raw audio signal is projected straight into the same token space as text

• inference. ships with multi-token prediction (mtp) drafters for speculative decoding, cutting latency
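
to make the vision change concrete, here is a minimal sketch of an encoder-free embedding step: cut the image into patches, apply one matrix multiply, add a positional embedding, normalize. this is not google's implementation. the patch size, dimensions, random weights, the learned-table positional embedding and the choice of rms normalization are all assumptions for illustration (the announcement only says "normalization"), and a real model would use a tensor library rather than plain lists.

```python
import math
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors, row-major."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])  # append this pixel's channel values
            patches.append(vec)
    return patches


def project(vec, weight):
    """vec @ weight, where weight has shape (len(vec), d_model)."""
    d_model = len(weight[0])
    return [sum(vec[k] * weight[k][j] for k in range(len(vec))) for j in range(d_model)]


def rms_norm(vec, eps=1e-6):
    """Rescale to roughly unit root-mean-square (assumed norm, no learned gain)."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]


def embed_image(image, weight, pos_emb, patch):
    """One token per patch: matrix multiply, add positional embedding, normalize."""
    tokens = []
    for i, patch_vec in enumerate(patchify(image, patch)):
        projected = project(patch_vec, weight)
        shifted = [a + b for a, b in zip(projected, pos_emb[i])]
        tokens.append(rms_norm(shifted))
    return tokens


# toy sizes, all hypothetical
PATCH, CHANNELS, D_MODEL = 4, 3, 8
HEIGHT = WIDTH = 8
rng = random.Random(0)

image = [[[rng.random() for _ in range(CHANNELS)] for _ in range(WIDTH)]
         for _ in range(HEIGHT)]
patch_dim = PATCH * PATCH * CHANNELS
num_patches = (HEIGHT // PATCH) * (WIDTH // PATCH)
weight = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(patch_dim)]
pos_emb = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(num_patches)]

visual_tokens = embed_image(image, weight, pos_emb, PATCH)
print(len(visual_tokens), len(visual_tokens[0]))
```

the resulting vectors have the same width as text token embeddings, so in a setup like this they could be concatenated with text tokens and fed to the backbone with no separate encoder pass.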

benchmarks (gemma 3 27b / gemma 4 12b / gemma 4 26b):

- gpqa diamond: 44 / 78.8 / ~80
- bbeh: 18 / 53 / 62
- mmlu pro: 67 / 77.2 / 78
- livecodebench: 28 / 72 / 76
- docvqa: 83 / 94.9 / 93
- infovqa: 60 / 88.4 / 90
- mmmu pro: 65 / 69.1 / 72
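
a quick way to check the "nearly matches, sometimes beats" claim is to subtract the 26b score from the 12b score on each benchmark. the sketch below uses only the numbers quoted above. the gpqa diamond 26b score is given as "~80", so that row is approximate.

```python
# scores as quoted: (gemma 3 27b, gemma 4 12b, gemma 4 26b)
scores = {
    "gpqa diamond": (44, 78.8, 80),  # 26b score quoted as "~80"
    "bbeh": (18, 53, 62),
    "mmlu pro": (67, 77.2, 78),
    "livecodebench": (28, 72, 76),
    "docvqa": (83, 94.9, 93),
    "infovqa": (60, 88.4, 90),
    "mmmu pro": (65, 69.1, 72),
}


def gap_to_26b(row):
    """12b score minus 26b score, in points (positive means 12b is ahead)."""
    _, score_12b, score_26b = row
    return round(score_12b - score_26b, 1)


gaps = {name: gap_to_26b(row) for name, row in scores.items()}
wins = [name for name, gap in gaps.items() if gap > 0]

for name, gap in gaps.items():
    print(f"{name:14s} {gap:+.1f}")
print("12b ahead on:", wins)
```

on these figures the 12b model is ahead only on docvqa (+1.9), within 3 points on gpqa diamond, mmlu pro, infovqa and mmmu pro, 4 points behind on livecodebench and 9 behind on bbeh.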

runs locally on consumer laptops with 16gb vram or unified memory – including macbook m-series
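
a back-of-the-envelope check on the 16gb figure: weight memory is roughly parameter count times bytes per parameter. the sketch below assumes exactly 12 billion and 26 billion parameters and counts weights only (no kv cache, activations or runtime overhead). the precision the release actually ships in is not stated in the announcement, so the three formats here are illustrative.

```python
GB = 1e9  # decimal gigabytes


def weight_memory_gb(params_billion, bits_per_param):
    """Weights-only footprint in GB; ignores KV cache, activations and overhead."""
    return params_billion * 1e9 * bits_per_param / 8 / GB


BUDGET_GB = 16  # memory figure quoted in the post

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    need = weight_memory_gb(12, bits)
    print(f"12b @ {label}: {need:.0f} GB of weights, "
          f"under {BUDGET_GB} GB: {need < BUDGET_GB}")

# at equal precision the 12b / 26b ratio depends only on parameter count
ratio = weight_memory_gb(12, 8) / weight_memory_gb(26, 8)
print(f"12b / 26b weight memory: {ratio:.2f}")
```

under these assumptions, 16-bit weights alone (24 GB) would not fit in 16 GB, while 8-bit (12 GB) and 4-bit (6 GB) would, which suggests the laptop claim relies on a quantized build. the 12/26 ratio of about 0.46 is consistent with "less than half the memory footprint".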

demo source: google
