Inside GenAI.mil: 25 Pentagon "Power Users" Are Testing OpenAI, Google and xAI Models to Replace Claude
Bloomberg pulled back the curtain this week on the DoD’s search for a Claude replacement: a hand-picked group of 25 military "power users," a new platform called GenAI.mil that runs independently of the Maven Smart System, and a six-month transition deadline for every contractor that built on Anthropic. Frontier models from OpenAI, Google and xAI are now being graded inside classified workflows that Claude used to own.
For most of the past year, Anthropic’s Claude was the Pentagon’s favorite frontier model, backed by a $200 million ceiling contract signed in July 2025, classified integrations across multiple commands, and Anthropic’s quiet reputation as the only safety-conscious lab willing to actually ship to defense customers. That deal is now effectively dead, and Bloomberg’s May 21 reporting, picked up on May 25 by The Star, Investing.com and others, finally puts numbers on what the bake-off to replace Claude looks like inside the building.
The testing runs through a platform called GenAI.mil, which the Department of Defense stood up to evaluate frontier models independently of the existing Maven Smart System. Twenty-five hand-picked “power users” — analysts, mission planners and operators drawn from across the services — are running the same classified workloads through models from OpenAI, Google DeepMind and xAI’s Grok line, scoring each on the kind of tasks that previously sat behind Claude. The exercise kicked off March 1, three days after Defense Secretary Pete Hegseth formally designated Anthropic a “supply-chain risk” on February 27.
The trigger for the split is not technical — it is policy. Anthropic refused to lift its acceptable-use restrictions barring Claude from mass surveillance and lethal autonomous weapons targeting. Pentagon officials wanted those guardrails off for specific national-security and domestic-surveillance workloads; Anthropic, founded specifically as a safety-focused alternative to OpenAI, declined. Hegseth’s supply-chain-risk designation was the formal mechanism for getting Anthropic out of the procurement stack, and every contractor that had wired Claude into a delivered system was put on a six-month clock to find a replacement.
Inside the bake-off, the dynamics are telling. OpenAI and Google were already on the defense side of the fence after their May Pentagon contracts and the CAISI pre-deployment testing pacts. xAI is the newer entrant, with Grok 4.3’s 1M-token window and aggressive pricing making it attractive on cost-per-task even before the safety conversation comes up — exactly the angle Elon Musk has been pushing in defense circles for months. The 25 power users are not asked to pick a single winner; the goal is to identify which model best replaces Claude in which workflow, and to do it inside the six-month transition window so deployed systems do not fall over when Anthropic licenses expire.
Anthropic is not going quietly. The company has filed suit challenging the supply-chain-risk designation, arguing it could cost billions in lost federal revenue and was applied without due process. The deeper bet is that a future administration — or a different Defense Secretary — reverses the call once the operational pain of moving off Claude becomes obvious. For now, though, the message from GenAI.mil is the one every frontier lab has spent two years trying to avoid: in a fight between safety policy and defense procurement, defense procurement gets to pick its model, and the model is no longer Claude.