unsloth made qwen 3.6 27b run locally at 2x speed on just 18gb ram

Nick Trenkler, 18 May 2026

Masked technician works on a GPU circuit board; monitor shows 50 crossed out replaced by 110, second screen reads GPU memory bar 54GB to 17GB

throughput went from 50–70 tok/s to 75–110 tok/s. tokens per second is basically how fast the model types back at you, with one token being a short chunk of text, roughly three-quarters of an english word on average. here's how they pulled it off:

1. dynamic quantization. instead of shrinking every part of the model equally, unsloth figured out which weights matter most and kept those at higher precision. the result is a Q4_K_XL file that's only 17.9gb – that's a 27 billion parameter model compressed from its original 54.7gb BF16 size, small enough to load on a machine with roughly 18gb of memory (the weights alone take 17.9gb, so the kv cache and longer contexts need some headroom on top)

2. multi-token prediction (mtp). normally a model predicts one token at a time. with mtp, qwen 3.6 was trained to draft several tokens ahead at once; the main model then checks those drafts and keeps the ones it agrees with, so each step can emit more than one token. llama.cpp merged official support for this on may 16th, and unsloth was ready day one
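to make the draft-and-verify idea behind mtp concrete, here's a toy sketch in python. it is not llama.cpp's or unsloth's implementation: `target_next`, `draft_next`, `speculative_step` and `generate` are hypothetical names, the "models" are tiny arithmetic functions, and decoding is greedy. the point is only to show that accepted drafts let one round emit several tokens while the final output stays identical to plain one-token-at-a-time decoding.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the full model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the cheap drafter: agrees with the target except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(ctx: List[Token], target: NextToken,
                     draft: NextToken, k: int) -> List[Token]:
    """One round: draft k tokens, keep the prefix the target agrees with."""
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft(ctx + drafted))

    accepted: List[Token] = []
    # A real engine would verify all drafts in one batched forward pass;
    # this toy calls the target once per position to keep the logic readable.
    for tok in drafted:
        expected = target(ctx + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token, stop
            return accepted
        accepted.append(tok)
    accepted.append(target(ctx + accepted))  # every draft accepted: one bonus token
    return accepted


def generate(prompt: List[Token], target: NextToken, draft: NextToken,
             k: int, n_tokens: int) -> Tuple[List[Token], int]:
    """Run rounds until n_tokens exist; return (new tokens, number of rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, target, draft, k))
        rounds += 1
    return out[len(prompt):len(prompt) + n_tokens], rounds


def greedy(prompt: List[Token], target: NextToken, n_tokens: int) -> List[Token]:
    """Baseline: plain one-token-per-step greedy decoding with the target only."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out[len(prompt):]


tokens, rounds = generate([0], target_next, draft_next, k=3, n_tokens=12)
print(tokens == greedy([0], target_next, 12), rounds)
```

in this toy run the 12 tokens match the baseline exactly and take 4 rounds instead of 12 steps. how much of that turns into real speed depends on how often the drafts are accepted and on how cheap the drafting is.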

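those two things together – a model small enough to fit in vram, running with speculative decoding baked in – are why you're getting near 2x throughput. per unsloth, the mtp speedup itself comes with no accuracy change; any quality trade-off comes from the 4-bit quantization, which is what the dynamic approach is designed to keep small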
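for a rough sanity check on the file sizes quoted above, here's the back-of-the-envelope arithmetic in python. assumptions: the parameter count is taken as exactly 27 billion (the real count isn't given here), "gb" means 10^9 bytes, and `bits_per_weight` is just an illustrative helper name.

```python
PARAMS = 27e9       # assumed: nominal "27b" parameter count, not an exact figure
BF16_GB = 54.7      # original BF16 size quoted in the post
Q4_GB = 17.9        # Q4_K_XL file size quoted in the post
MEMORY_GB = 18.0    # memory budget quoted in the post


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


compression_ratio = BF16_GB / Q4_GB   # how many times smaller the quantized file is
headroom_gb = MEMORY_GB - Q4_GB       # what is left for kv cache and activations

print(f"BF16:    {bits_per_weight(BF16_GB):.1f} bits/weight")
print(f"Q4_K_XL: {bits_per_weight(Q4_GB):.1f} bits/weight")
print(f"compression: {compression_ratio:.2f}x, headroom: {headroom_gb:.1f} GB")
```

under those assumptions the quantized file averages about 5.3 bits per weight, a roughly 3x reduction. an average above 4 bits is consistent with some weights being kept at higher precision, and the 0.1gb of headroom is why an 18gb budget is tight once context is added.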

models are being compressed by both the open-source community and major providers – and the pace is accelerating. the end goal is clear: llms small enough to run on a consumer laptop or smartphone, no api calls, no cloud dependency, just local inference baked directly into any app

Line chart showing Qwen3.6-27B throughput rising from ~60 tok/s with no MTP to 110+ tok/s with MTP enabled across quant sizes

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️

MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.

Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s.

GGUFs: https://t.co/7gWhKnseZo
Guide: https://t.co/7qzk6ypWDQ pic.twitter.com/8ICXw7iV3G
— Unsloth AI (@UnslothAI) May 18, 2026
