
gpt-5.5 hits iq 136: new benchmark ranks 40+ ai models against humans


new benchmark shows why you'd better pay for ai than for a human workforce

on a standard human iq bell curve, 50% of people score below 100. only about 15% score above 115 – the threshold typically associated with professional and academic high achievers. and above 130? that's roughly 2% of the population

ai iq, a benchmark site that launched 2 days ago, plotted 40+ of today's most popular ai models on that same curve. the numbers don't flatter us:

• only 20% of models score below 85 – mostly older or smaller models like gpt-4-turbo, opus-3, and grok-2
• 17% sit in the human-average zone of 85-100
• 32% land in the 100-115 range – above average by human standards
• roughly 30% score above 115, a tier only 15% of humans ever reach
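to sanity-check the human side of this comparison, here's a minimal python sketch. it assumes the conventional iq scale (mean 100, standard deviation 15) and a perfectly normal distribution; how ai iq scores its models isn't described here, so treat the human column as a textbook approximation, not the site's own data. the ai shares are simply the rounded figures from the list above.

```python
from statistics import NormalDist

# Assumption: human IQ is normally distributed with mean 100 and SD 15,
# the usual convention for modern IQ tests.
HUMAN_IQ = NormalDist(mu=100, sigma=15)

def human_share_below(iq: float) -> float:
    """Fraction of humans expected to score below `iq`."""
    return HUMAN_IQ.cdf(iq)

# AI model shares per bucket, as reported in the article (rounded).
ai_shares = {
    "below 85": 0.20,
    "85-100": 0.17,
    "100-115": 0.32,
    "above 115": 0.30,
}

# Human shares for the same buckets, computed from the normal CDF.
human_shares = {
    "below 85": human_share_below(85),
    "85-100": human_share_below(100) - human_share_below(85),
    "100-115": human_share_below(115) - human_share_below(100),
    "above 115": 1.0 - human_share_below(115),
}

# Print both distributions side by side, bucket by bucket.
for bucket, ai in ai_shares.items():
    print(f"{bucket:>10}: humans {human_shares[bucket]:6.1%}   ai models {ai:6.1%}")

# The rarer tier mentioned above: humans scoring over 130.
print(f"humans above 130: {1.0 - human_share_below(130):.1%}")
```

under those assumptions, humans split roughly 15.9% / 34.1% / 34.1% / 15.9% across the four buckets and about 2.3% score above 130, consistent with the rounded figures above. the ai shares sum to 99% only because they're rounded.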

and then there's the top of the leaderboard, where things get genuinely uncomfortable to look at:

1. gpt-5.5 – 136 iq
2. gpt-5.4 – 134 iq
3. gemini-3.1-pro – 132 iq
3. opus-4.7 – 132 iq
4. opus-4.6 – 131 iq
5. gemini-3-pro – 126 iq
5. gpt-5.2 – 126 iq
6. grok-4.3 – 125 iq
7. grok-4.20 – 123 iq
7. opus-4.5 – 123 iq
8. kimi-k2.6 – 122 iq
9. gpt-5.1 – 120 iq
10. gpt-5 – 119 iq
11. kimi-k2.5 – 118 iq
12. muse-spark – 117 iq
12. deepseek-v4-pro – 117 iq
13. glm-5.1 – 115 iq
14. minimax-m2.7 – 114 iq

a few things jump out:

• openai is absolutely dominating the top – gpt-5.5 and gpt-5.4 sit at 136 and 134, and three more of its models (gpt-5.2, gpt-5.1, and gpt-5) also make this list. no other lab comes close in terms of sheer volume at the top
• anthropic and google are in a dead heat for second place – opus-4.7 and gemini-3.1-pro both land at 132, with opus-4.6 just behind at 131 and gemini-3-pro at 126
• xai is punching above its weight – grok-4.3 at 125 and grok-4.20 at 123 put elon's lab ahead of most of the field, which wasn't the narrative a year ago
• the chinese labs are closing the gap fast – kimi-k2.6 at 122, deepseek-v4-pro at 117, glm-5.1 at 115, minimax-m2.7 at 114. none of them are at the very top yet, but they're clearly no longer trailing by a wide margin
• muse-spark is a surprise – meta's model at 117 is genuinely competitive, quietly sitting above several well-known western models
• the gap between first and last on this list is only 22 iq points – which sounds small until you remember that on a human scale, 22 points is the jump from dead average (100) to roughly the top 7% of the population (122)

for context, an iq of 130 in humans means you outscore roughly 98% of the population. the top five models aren't just nudging past that line – they clear it outright, with gpt-5.5 a full six points above it. and unlike a human genius, they're available via api, right now, at a few dollars per task
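to put a few leaderboard scores in human terms, here's a short python sketch that converts an iq score into a human percentile. it assumes ai iq's numbers sit on the same mean-100, sd-15 normal scale used for human tests, which the benchmark may or may not match exactly; the list of models is just a sample picked from the leaderboard above.

```python
from statistics import NormalDist

# Assumption: leaderboard scores sit on the human IQ scale (mean 100, SD 15).
HUMAN_IQ = NormalDist(mu=100, sigma=15)

def human_percentile(iq: float) -> float:
    """Percent of humans expected to score below `iq`."""
    return 100.0 * HUMAN_IQ.cdf(iq)

# A sample of entries from the leaderboard above.
leaderboard_sample = [
    ("gpt-5.5", 136),
    ("opus-4.7", 132),
    ("kimi-k2.6", 122),
    ("minimax-m2.7", 114),
]

for name, iq in leaderboard_sample:
    print(f"{name:>13}: iq {iq} -> human percentile {human_percentile(iq):.1f}")

# Reference point: the iq-130 line discussed above.
print(f"{'reference':>13}: iq 130 -> human percentile {human_percentile(130):.1f}")
```

under that assumption, the 130 line sits at about the 97.7th percentile (the rounded 98% above), gpt-5.5's 136 lands near the 99.2nd, and even minimax-m2.7, last on this list at 114, comes out above the 80th.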

it's already hard to compete with ai on raw intelligence. it knows more, reasons faster, and doesn't have bad days. but look at the trajectory – the frontier has moved from roughly 100 to 136 in under two years. if that pace continues, the gap won't just be uncomfortable. it will be impossible to close

Bell curve chart comparing human and AI IQ distributions: 64% of AI models exceed IQ 100 vs 50% of humans; source aiiq.org, 42 models
