opus 4 to 4.7: the quiet shift no one’s pointing at

Nick Trenkler 16 Apr 2026 3 min read

illustration for the article about opus 4.7: the quiet shift no one’s pointing at, showing claude in the crown, the best

the original opus 4 shipped eleven months ago. it already feels obsolete. opus 4.7 is +26 points on terminal coding, +15 on agentic coding, +11 on reasoning. the models aren’t just getting better – the gap between “last year’s frontier” and “this week’s frontier” is starting to look embarrassing.

release rhythm tells story first

with the 2025–2026 releases, opus stopped looking like a slow-moving flagship and started behaving like a living product line: shipping in tighter intervals, getting better at agentic work faster than at classic “reasoning,” and expanding from coding into a broader knowledge-work model.

Claude Opus release timeline table — Opus 4 to 4.7, release dates May 2025–Apr 2026, days between releases

opus 4 launched on may 22, 2025. opus 4.1 followed 76 days later. then 4.5 came after 111 days, 4.6 after 73 days, and now opus 4.7 lands just 70 days later on april 16, 2026.

it’s a ~10-week release loop. opus is being shipped like software.

from model to product line

opus 4 was introduced as the “best coding model,” built around swe-bench, terminal tasks, and long-running agents with early memory capabilities.
opus 4.1 was a classic point release: same positioning, tighter execution – better real-world coding, better agentic search.
opus 4.5 was the pivot. opus stopped being just a coding/reasoning system and became a work model: research, spreadsheets, slides, browser tasks, long workflows.
opus 4.6 expanded that with 1m context, adaptive thinking, agent teams, and stronger performance on finance/legal tasks.
opus 4.7 doesn’t change direction – it stabilizes it. better instruction following, self-verification, higher reliability on long tasks. less supervision needed.

benchmarks tell a different story than you’d expect

Claude Opus 4 vs 4.1 vs 4.5 vs 4.6 vs 4.7 benchmark table — agentic coding, terminal coding, tool use, reasoning

from opus 4 to opus 4.7, the biggest gains were not in classic reasoning, but in agentic and operational tasks:

terminal coding: 43.2% → 69.4%
agentic coding: 72.5% → 87.6%
multilingual qa: 88.8% → 91.5%
graduate-level reasoning: 83.3% → 94.2%

("tool use" isn’t published for 4.7, but previous jumps suggest the same direction.)

what changed is not just the numbers – it’s the gap between them.

agentic and terminal tasks saw the steepest acceleration, while multilingual qa barely moved. reasoning improved meaningfully, but not disproportionately.

the takeaway stays the same, just stronger:

the frontier is no longer “can the model answer harder questions.”
it’s “can it keep working inside messy environments over long periods of time.”

anthropic’s real win isn’t raw intelligence. it’s turning intelligence into reliable execution.

positioning shifted with the capabilities

you can see the transition in how each version was framed:

opus 4: best coding model
opus 4.1: better coding precision
opus 4.5: coding + research + office work
opus 4.6: coding + long-context + economic tasks
opus 4.7: reliable execution on complex work

each step moves further away from benchmarks and closer to workflows. opus improved by becoming less benchmark-theatrical and more workflow-native.

what actually changed

the gap between opus 4 and opus 4.7 isn’t just higher scores.

it’s a structural shift across three layers:

faster release cadence
biggest gains in agentic / operational tasks
broader lifecycle: from elite coder → general-purpose work engine

that’s why the story isn’t “anthropic improved opus.” it’s that opus is no longer a single model. it’s becoming a continuously updated operating layer for work.

opus 4 to 4.7: the quiet shift no one’s pointing at

release rhythm tells story first

from model to product line

benchmarks tell a different story than you’d expect

positioning shifted with the capabilities

what actually changed

Read next

gpt 5.6 sol pro vs claude fable 5 vs grok 4.5 vs glm 5.2

grok 4.5 vs fable 5 vs gpt 5.5 vs glm 5.2

muse image vs gpt image 2 vs nano banana 2 vs reve 2.0

release rhythm tells story first

from model to product line

benchmarks tell a different story than you’d expect

positioning shifted with the capabilities

what actually changed

Stay in the loop

Read next

gpt 5.6 sol pro vs claude fable 5 vs grok 4.5 vs glm 5.2

grok 4.5 vs fable 5 vs gpt 5.5 vs glm 5.2

muse image vs gpt image 2 vs nano banana 2 vs reve 2.0