Models·4 min read·Microsoft AI / TechTimes

Microsoft Aims MAI-Thinking-1 Straight at Claude: a 35B Reasoning Model It Says Beats Sonnet 4.6 and Matches Opus 4.6 on Code

At Build 2026, Microsoft’s AI Superintelligence Team unveiled MAI-Thinking-1, its first in-house reasoning model — a sparse Mixture-of-Experts design with 35B active parameters (~1T total) and a 256K context window. Microsoft says human raters prefer it to Claude Sonnet 4.6 and that it matches Opus 4.6 on the SWE-Bench Pro coding benchmark, and it pointedly trained the model from scratch with zero distillation on commercially licensed data — no OpenAI involved.

BUILD 2026 · MICROSOFT TAKES AIM AT CLAUDE LOCKED ON MICROSOFT AI MAI-Thinking-1 FIRST IN-HOUSE REASONING MODEL 35B active · ~1T MoE · 256K ctx trained with zero distillation Claude ENTERPRISE CODING DEFAULT SWE-Bench Pro ≈ 53% — level with Opus 4.6 BITSMINDS.COM Microsoft's claims, pending independent benchmarks
Share:

At its Build 2026 developer conference, Microsoft’s AI Superintelligence Team unveiled MAI-Thinking-1, the company’s first in-house reasoning model — and it did not pick a subtle benchmark to plant its flag against. The pitch is explicitly framed around Anthropic’s Claude. MAI-Thinking-1 is a sparse Mixture-of-Experts model with roughly 35 billion active parameters out of about one trillion total, paired with a 256,000-token context window that Microsoft says is enough to read a 600-page document in a single pass.

The headline numbers are aimed at the parts of the market Claude currently owns. Microsoft says independent human evaluations run by Surge preferred MAI-Thinking-1 to Claude Sonnet 4.6 in blind tests, and that on SWE-Bench Pro — one of the hardest agentic coding benchmarks — it scores around 53%, putting it level with Claude Opus 4.6. On the math and multi-step reasoning side it posts 97.0% on AIME 2025 and 94.5% on AIME 2026. The obvious caveat: these are Microsoft’s own figures, and no independent peer review has landed yet, so the comparisons remain claims rather than confirmed results.

The training story is as much a part of the message as the scores. Microsoft says MAI-Thinking-1 was trained from scratch “with zero distillation on enterprise grade, clean and commercially licensed data” — and, pointedly, without any data distilled from third-party models, including OpenAI’s GPT series. For a company whose AI strategy has been synonymous with OpenAI for years, that line is doing strategic work: it lets Microsoft sell enterprises a model with a clean, auditable data provenance that does not depend on a partner-slash-rival.

MAI-Thinking-1 is available now in private preview on Microsoft Foundry, with function calling, multi-layered instruction following and Chat Completions API compatibility so existing code can target it with minimal changes. It is one of seven new in-house models Microsoft showed at Build — alongside MAI-Image-2.5, MAI-Transcribe 1.5, MAI-Voice-2 and the MAI-Code-1 coding model — and Microsoft stressed that “developer choice doesn’t stop at our catalog,” noting the MAI models are also offered through Fireworks AI, Baseten and OpenRouter.

Strategically, this is the clearest signal yet of Microsoft’s shift from OpenAI reseller to multi-vendor platform. The company is not walking away from OpenAI — the partnership and the GPT models remain front and center in Foundry — but it now wants to own a credible frontier-grade option of its own, sitting inside Azure’s governance and security stack, so that enterprise developers have genuine model choice without leaving the platform. It dovetails with the rest of Microsoft’s Build narrative: an agent-first runtime, a control plane for orchestrating agents, and increasingly its own silicon to run them on.

Why aim at Claude specifically? Because in the enterprise, Claude — and Claude Code in particular — has become the default for serious coding and agentic work, exactly the high-value workloads Microsoft most wants flowing through Foundry rather than out to a competitor. A 35B-active model that can credibly claim Opus-4.6-class coding at a fraction of the token cost is a direct attempt to undercut that default on price-performance. Whether the claim survives contact with independent benchmarks is the question everyone — Anthropic included — will be waiting to see answered.

Comments

Share your thoughts. Be kind.

0/2000

Loading comments…

Related Articles

OPEN SOURCE · MIXTURE-OF-EXPERTS · APACHE 2.0JETBRAINS MELLUM2 · JUN 2Mellum212B total parameters · ~2.5B active per tokenMoE · 131K context · 2x faster inference6 variants · Base · Instruct · Thinking (RLVR)THE FOCAL-MODEL THESISFast, specialized parts orchestrated by frontier modelsROUTER · 8 OF 64 EXPERTS FIREOnly ~21% of parameters fire per tokenBITSMINDS.COMSource: JetBrains AI Blog · Hugging Face
Models

JetBrains Open-Sources Mellum2, a 12B Mixture-of-Experts Model Built to Be a Fast "Focal" Part, Not a Frontier Rival

Microsoft Will Unveil Its Own GitHub Copilot Coding Model at Build — a Direct Shot at Claude Code
Models

Microsoft Will Unveil Its Own GitHub Copilot Coding Model at Build — a Direct Shot at Claude Code

FRONTIER MODEL SHOWDOWN · WHO WINS? Three labs. Three strongest models. One fight. ANTHROPIC Claude Opus 4.8 AUTONOMY OPENAI GPT-5.5 “Spud” AGENTS GOOGLE Gemini 3.1 Ultra REASONING VS VS BITSMINDS.COM BitsMinds original analysis
Models

Claude Opus 4.8 vs GPT-5.5 vs Gemini 3.1 Ultra: The Benchmark-by-Benchmark Comparison