opus 4.8 beats mythos on agentic computer use

Nick Trenkler 28 May 2026 1 min read

Illustration of a man in a lab coat labeled Opus 4.8 writing benchmark scores in a book marked OSWorld in a server room

the osworld-verified benchmark tests ai models on real computer tasks – opening apps, filling forms, navigating browsers. each task is pass/fail. the % is simply the share of tasks the model completed successfully out of the total
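a minimal sketch of how a pass/fail score like this is computed. the task results here are made up for illustration, and `pass_rate` is a hypothetical helper, not part of any osworld tooling:

```python
def pass_rate(results: list[bool]) -> float:
    """Return the share of passed tasks as a percentage."""
    if not results:
        raise ValueError("need at least one task result")
    # True counts as 1 and False as 0, so sum() is the number of passes
    return 100 * sum(results) / len(results)


# made-up run: 5 tasks, 4 of them passed
example_results = [True, True, False, True, True]
print(f"{pass_rate(example_results):.1f}%")  # 80.0%
```

there is no partial credit in this scheme: a task that gets 90% of the way there still counts as a fail.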

83.4% vs 79.6% – a 3.8-point gap on pass/fail tasks isn't noise. it's a pattern
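the gap between the two scores can be read two ways, in percentage points or as a relative improvement. a quick sketch of the arithmetic, assuming the higher score belongs to opus 4.8 as the headline implies (the function names are hypothetical):

```python
OPUS_4_8 = 83.4  # assumed from the headline: the higher score is opus 4.8
MYTHOS = 79.6


def point_gap(a: float, b: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(a - b, 1)


def relative_gap(a: float, b: float) -> float:
    """How much higher a is than b, as a percentage of b."""
    return round(100 * (a - b) / b, 1)


print(point_gap(OPUS_4_8, MYTHOS))     # 3.8 percentage points
print(relative_gap(OPUS_4_8, MYTHOS))  # 4.8 percent relative to mythos
```

so the gap is 3.8 points in absolute terms and about 4.8% in relative terms.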

public models are closing the gap faster than anyone expected. if anthropic doesn't open mythos to the mass market soon, the window closes

and according to the opus 4.8 release notes, that is already happening: mythos goes public in weeks

Benchmark table comparing Opus 4.8 and Mythos on agentic coding, multidisciplinary reasoning, and agentic computer use scores

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.

Available today at the same price. pic.twitter.com/EufxL7T1kb
— Claude (@claudeai) May 28, 2026

