
gpt-5.5 is so much better than opus 4.7 that it can challenge mythos

Image: Woman in a police lineup with "GPT-5.5" highlighted in neon green above Opus 4.7, Gemini 3.1, GPT-5.4, with the public model on top.


- top-1 on artificial analysis intelligence index, above opus 4.7, gemini 3.1, and gpt-5.4
- beats opus 4.7 on almost every benchmark. one exception: swe-bench pro, where gpt-5.5 scores 57.7% vs opus 4.7's 64.3%
- edges out mythos on terminal bench 2.0 by 0.7pp
- completed a uk aisi cyber attack simulation end-to-end (32 steps, ~20 hours of work for a human expert), but in only 1 of 10 attempts. mythos: 3/10 on the same sim.

strong as mythos, public as gpt

Bar chart comparing GPT-5.5 and Claude Mythos: terminal-bench 82.7 vs 82.0, OSWorld 78.7 vs 79.6, BrowseComp 84.4 vs 86.9, CyberGym 81.8 vs 83.1.
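
to make the head-to-head easier to read, here is a minimal sketch that turns the chart numbers above into per-benchmark gaps. the scores are copied from the chart description; the names `scores` and `gaps` are just illustrative, and it assumes all four benchmarks are reported on the same 0-100 scale.

```python
# scores as given in the chart description: (gpt-5.5, mythos)
scores = {
    "terminal-bench": (82.7, 82.0),
    "OSWorld": (78.7, 79.6),
    "BrowseComp": (84.4, 86.9),
    "CyberGym": (81.8, 83.1),
}

# gap in points; positive means gpt-5.5 is ahead, negative means mythos is ahead
# (rounded to one decimal to avoid floating-point noise)
gaps = {name: round(gpt - mythos, 1) for name, (gpt, mythos) in scores.items()}

for name, gap in gaps.items():
    leader = "gpt-5.5" if gap > 0 else "mythos"
    print(f"{name}: {gap:+.1f} ({leader} ahead)")
```

on these four numbers gpt-5.5 leads only on terminal-bench, and every gap is within a few points either way.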
