
two open agent models in one day: one codes, one lives


macaron-v1-preview-749b released today. nex-n2-pro open-sourced today. macaron comes from mindlab research and is post-trained from glm-5.1. nex-n2-pro comes from nex-agi and is post-trained from qwen3.5

two big agent-focused model drops in the same news cycle, and they're aiming at almost completely different problems

macaron is pitched as a personal agent – the kind of model that helps you pick a restaurant, reschedule an errand, or render a comparison card for a booking decision. nex-n2-pro is pitched as a productivity agent – the kind that closes prs, drives a terminal for hours, and runs long-horizon coding loops. macaron leans hard on generative ui and a custom protocol called a2ui; nex-n2-pro leans hard on agentic coding, deep research, and tool calling

here's how they actually stack up


size and architecture

macaron-v1-preview-749b is a 749b-class mixture-of-lora: a 744b base plus five ~1b lora adapters (l0 default, l1–l4 specialists for tool use, coding, computer-agent, generative ui). routing happens via an explicit "router tool" rather than a learned gating network – the harness decides which adapter handles each turn. bfloat16, 202,752-token context, mit license
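to make the "explicit router tool" idea concrete, here's a minimal sketch of what harness-side adapter selection could look like. this is not mindlab's harness: `RouterToolCall`, `select_adapter`, and the fallback-to-l0 behavior are hypothetical, and the mapping of l1–l4 to specific skills is assumed from the order the skills are listed above

```python
from dataclasses import dataclass
from typing import Optional

# hypothetical adapter table: the ids come from the article, but which of
# l1-l4 maps to which skill is assumed from the order the skills are listed
ADAPTERS = {
    "l0": "default",
    "l1": "tool use",
    "l2": "coding",
    "l3": "computer-agent",
    "l4": "generative ui",
}
DEFAULT_ADAPTER = "l0"


@dataclass
class RouterToolCall:
    """hypothetical shape of a router tool call made during a turn."""
    adapter: str


def select_adapter(call: Optional[RouterToolCall]) -> str:
    """pick the adapter for this turn; fall back to l0 (an assumed policy)
    when there is no router call or the requested id is unknown."""
    if call is None or call.adapter not in ADAPTERS:
        return DEFAULT_ADAPTER
    return call.adapter


if __name__ == "__main__":
    for call in (None, RouterToolCall("l2"), RouterToolCall("l9")):
        adapter = select_adapter(call)
        print(adapter, "->", ADAPTERS[adapter])
```

the point of the sketch is the contrast with a learned gate: selection is a plain lookup driven by an explicit call, so it can be logged, overridden, or tested like any other tool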

ModelScope model card for Nex-N2-Pro showing 396.80B params, Apache-2.0 license, and Qwen3.5 MoE base with agentic thinking
source: https://modelscope.ai/models/nex-agi/Nex-N2-Pro

nex-n2-pro is a 396.8b moe with ~17b active params, post-trained from qwen3.5-397b-a17b-base. no adapter routing – one model, one weight set. "agentic thinking" is its training-time framing: adaptive thinking (decide reasoning depth per step) plus coherent thinking (one reasoning style across task types). apache-2.0. a smaller nex-n2-mini (35b moe, 3b active) ships alongside

Hugging Face page for Macaron-V1-Preview-749B from MindLab Research showing 754B params, SWE-bench 78.1, and LoRA adapter architecture
source: https://huggingface.co/mindlab-research/Macaron-V1-Preview-749B

macaron has ~1.9x the total params (749b vs 396.8b) but routes each turn to one ~1b adapter on a shared base; nex activates ~17b per token from a smaller pool. macaron bets on interference-avoidance between skill families; nex bets on transfer between them
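the size comparison is simple arithmetic on the figures quoted above, sketched here so the ~1.9x number is easy to check. variable names are illustrative; the article gives no active-parameter count for macaron, so none is computed

```python
# figures quoted in the article, in billions of parameters
MACARON_BASE_B = 744.0
MACARON_ADAPTER_B = 1.0   # "~1b" per adapter, so this is approximate
MACARON_ADAPTERS = 5      # l0 through l4
NEX_TOTAL_B = 396.8
NEX_ACTIVE_B = 17.0       # "~17b" active per token

macaron_total_b = MACARON_BASE_B + MACARON_ADAPTERS * MACARON_ADAPTER_B
total_ratio = macaron_total_b / NEX_TOTAL_B
nex_active_fraction = NEX_ACTIVE_B / NEX_TOTAL_B

print(f"macaron total: {macaron_total_b:.0f}b")
print(f"macaron / nex total params: {total_ratio:.2f}x")
print(f"nex active share per token: {nex_active_fraction:.1%}")
```

so the "749b" in the model name is the base plus all five adapters, and nex touches roughly 4% of its weights on any given token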


what each is built for

macaron targets daily-life decisions where state changes between turns – where to eat, how to reroute, scheduling errands. its distinctive capability is a2ui: emitting protocol actions that render as cards, forms, sliders, dashboards instead of text walls. a2ui-bench scores three layers (protocol correctness, task construction, ux lift) with rendered visual checks for overflow and broken layouts
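for a feel of what a "protocol correctness" check might involve, here's a toy validator. the real a2ui schema isn't described in this article, so the `component`/`props` fields and the `check_action` helper are purely hypothetical; only the four component kinds (card, form, slider, dashboard) come from the text above

```python
from typing import Any, Dict, List

# component kinds named in the article; the action shape below is assumed
ALLOWED_COMPONENTS = {"card", "form", "slider", "dashboard"}


def check_action(action: Dict[str, Any]) -> List[str]:
    """return a list of problems with a hypothetical ui action (empty = ok)."""
    errors: List[str] = []
    component = action.get("component")
    if component not in ALLOWED_COMPONENTS:
        errors.append(f"unknown component: {component!r}")
    if not isinstance(action.get("props"), dict):
        errors.append("props must be an object")
    return errors


if __name__ == "__main__":
    ok = {"component": "card", "props": {"title": "dinner options"}}
    bad = {"component": "carousel"}
    print(check_action(ok))
    print(check_action(bad))
```

a structural check like this would only cover the first of the three layers; task construction and ux lift (including the rendered overflow checks) need the actual rendered output, which a schema check can't see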

nex-n2-pro targets agentic coding, deep research, tool calling, terminal execution. the framing is closing the loop between requirement understanding, planning, code implementation, environment feedback, debugging, and iteration. no ui generation story


how the new models (macaron & nex-n2-pro) compare

nex-n2-pro is a direct competitor to the top tier. its numbers put it shoulder-to-shoulder with deepseek v4-pro and kimi k2.6 on the benchmarks they share:

Benchmark table comparing Nex-N2-Pro, DeepSeek V4-Pro, Kimi K2.6, GLM-5.1, and MiniMax M3 across SWE-bench and terminal-bench scores

so on the established coding/agent benchmarks, nex-n2-pro is the new leader on terminal-bench 2.1 among open weights (75.3 vs minimax m3's 66.0), basically tied with the field on swe-bench verified and pro, and competitive on reasoning. apache-2.0, clean sglang deploy. it's a legitimate new top-tier option – if the numbers hold up under independent testing, it joins deepseek/kimi/glm/minimax as a first-class agentic-coding pick.

macaron-v1-preview-749b is not really comparable to the rest. it doesn't try to compete on swe-bench pro, terminal-bench, or browsecomp. it's the only model in this whole list specifically post-trained for personal-agent tasks – calendar, restaurants, routing, daily-life decisions – plus generative ui via the a2ui protocol. nothing else on the list does generative ui. so the question isn't "is it better than kimi" but "do you need the thing it does?" if you're building a consumer personal assistant that renders cards and forms, macaron is alone in its category. if you're building a coding agent, ignore it


bottom line

pick nex-n2-pro for code, terminals, repos, or long-running tool chains – better numbers where they overlap, cleaner deploy path, apache-2.0. pick macaron if you need generative ui output or are building a consumer personal-agent product where livingbench/vitabench/pinchbench-style tasks are the actual workload – and you're willing to run their harness. they're not really competing for the same deployment.
