Mistral Medium 3.5 Lands With Cloud Coding Agents and 77.6% on SWE-Bench
Mistral fuses chat, reasoning, and coding into a single 128B dense model and pairs it with Vibe — async cloud-based coding agents that hand back work as pull requests instead of terminal output.
Mistral AI on May 2 released Mistral Medium 3.5, a dense 128-billion-parameter model with a 256k context window that the French lab is positioning as its first true flagship — a single set of weights designed to handle instruction following, long-form reasoning, and coding without forcing developers to swap between specialist models. The release ships alongside Vibe Remote Agents, a new cloud execution layer that pulls coding work off the local machine and returns results as reviewable pull requests.
On benchmarks, Medium 3.5 hits 77.6% on SWE-Bench Verified, ahead of Mistral's own Devstral 2 and Alibaba's Qwen 3.5 397B A17B. It also posts a 91.4 on the τ³-Telecom agentic score, a test that grades reliability across multi-step tool calls. Reasoning effort is now configurable per API request, so a quick chat reply and a long-horizon agent run can use the exact same model with different compute budgets — a setup that mirrors recent moves from OpenAI and Anthropic to expose reasoning depth as a parameter rather than a separate SKU.
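In practice, "reasoning effort as a parameter" means the same endpoint serves both cheap and expensive calls. A minimal sketch of what that request shape might look like — note that the field name `reasoning_effort` and the model identifier here are illustrative assumptions, not Mistral's documented API:

```python
# Hypothetical payloads: "reasoning_effort" and the model id are
# illustrative stand-ins, not confirmed Mistral API fields.

def build_request(prompt: str, effort: str) -> dict:
    """Build a chat-completion payload with a per-request reasoning budget."""
    return {
        "model": "mistral-medium-3.5",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # e.g. "low" for chat, "high" for agent runs
    }

# Same weights, two compute budgets:
quick_reply = build_request("Summarize this diff", effort="low")
agent_run = build_request("Refactor the auth module", effort="high")
```

The point is that routing logic lives in the request, not in model selection — no separate "reasoning SKU" to call.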
Vibe is the more strategically interesting half of the announcement. Coding agents launched from the Mistral Vibe CLI or directly from Le Chat now run asynchronously in sandboxed cloud environments, with full session teleportation from local to cloud and integrations across GitHub, Linear, Jira, Sentry, Slack, and Teams. Mistral is pitching the workflow as: kick off a refactor, dependency upgrade, or CI investigation, walk away, and review the resulting PR when the notification arrives. The same model also powers a new Work mode in Le Chat (preview) for cross-tool agentic tasks across email, calendar, and documents, with explicit approval gates before any sensitive action fires.
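The fire-and-review loop described above reduces to a submit/poll pattern. The sketch below is a toy in-memory stand-in — `VibeClient`, `submit`, and `status` are hypothetical names, not Mistral's SDK, and a real service would run the task in a cloud sandbox rather than resolve it instantly:

```python
# Toy illustration of the async "kick off, walk away, review the PR" pattern.
# All names (VibeClient, submit, status) are hypothetical stand-ins.
import itertools


class VibeClient:
    """In-memory stand-in for a remote-agent service."""

    _ids = itertools.count(1)

    def __init__(self):
        self._jobs = {}

    def submit(self, task: str) -> int:
        """Kick off an async task; returns a job id immediately."""
        job_id = next(self._ids)
        # A real backend would execute in a sandbox and open a PR when done;
        # here we resolve instantly with a placeholder PR URL.
        self._jobs[job_id] = {
            "task": task,
            "status": "done",
            "pr": f"https://github.com/org/repo/pull/{job_id}",
        }
        return job_id

    def status(self, job_id: int) -> dict:
        """Poll for the job's state; callers review the PR once 'done'."""
        return self._jobs[job_id]


client = VibeClient()
job = client.submit("Upgrade lodash and fix breaking call sites")
result = client.status(job)  # in reality: poll until status == "done"
```

The key design property is that the caller's only artifact is a reviewable pull request, never a live terminal session.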
API pricing comes in at $1.50 per million input tokens and $7.50 per million output tokens, undercutting GPT-5.5, Claude Opus 4.7, and most other frontier-tier coding APIs. The model is available through Mistral's Pro, Team, and Enterprise plans, hosted on NVIDIA's build.nvidia.com, and shipped as an NVIDIA NIM container. Open weights are on Hugging Face under a modified MIT license, and Mistral says the model can be self-hosted on as few as four GPUs — an unusually accessible footprint for something competing in the agentic-coding tier.
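To put the published rates in concrete terms, here is the cost arithmetic for a single call; the token counts in the example are illustrative, the rates are the ones quoted above:

```python
# Cost arithmetic at the quoted list prices:
# $1.50 per 1M input tokens, $7.50 per 1M output tokens.
INPUT_RATE = 1.50 / 1_000_000
OUTPUT_RATE = 7.50 / 1_000_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at Medium 3.5's list prices."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE


# An example long-context agent run: 200k tokens in, 20k tokens out.
cost = request_cost(200_000, 20_000)  # 0.30 + 0.15 ≈ $0.45
```

At these rates, even a run that consumes most of the 256k context window stays under a dollar, which is the economic argument for leaving long agent sessions running in the cloud.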
The release reads as Mistral's clearest answer yet to the question of where it fits in a market dominated by OpenAI, Anthropic, and Google. Rather than chasing the largest possible base model, the lab is bundling a competitive coding model with cloud agent infrastructure and an open license — betting that the developers who care about self-hosting and per-request reasoning control are a market worth owning outright.