Industry·2 min read·HPCwire

Commerce's CAISI Signs Pre-Deployment Frontier AI Testing Pacts With Google DeepMind, Microsoft, and xAI

The Center for AI Standards and Innovation will get pre-release access to frontier models from Google DeepMind, Microsoft, and xAI — including versions with safeguards stripped — to evaluate cyber, bio, and chemical risk.

The Center for AI Standards and Innovation (CAISI), the AI evaluation arm housed inside the Commerce Department's National Institute of Standards and Technology, announced new frontier AI testing agreements on May 5, 2026 with Google DeepMind, Microsoft, and xAI. Under the deals, all three labs will give the U.S. government pre-deployment access to their highest-capability models for security and capability evaluations, with additional assessments continuing after public release.

The agreements broaden the scope of what CAISI can probe. Evaluators will assess "demonstrable risks" tied to national security — explicitly including cybersecurity, biosecurity, and chemical-weapons capabilities — and will be allowed to test in classified environments. To make those tests meaningful, the developers have agreed to provide model variants with reduced or removed safety guardrails, the same approach that government red-teamers have argued is necessary to surface ceiling-level capabilities rather than the post-mitigation behavior end-users see.

"These expanded industry collaborations help us scale our work in the public interest at a critical moment," CAISI Director Chris Fall said in a statement. The center has now completed more than 40 evaluations, and its remit was reset earlier this year to align with Commerce Secretary Howard Lutnick's directives and the administration's AI Action Plan, which leans more heavily on national security framing than the framework CAISI's predecessor body operated under in 2024.

The new arrangement updates partnerships that the U.S. AI Safety Institute, CAISI's predecessor, signed with OpenAI and Anthropic in August 2024. Those original deals focused on voluntary safety testing of frontier models. The renegotiated terms with Google DeepMind, Microsoft, and xAI fold in security and capability assessment work that CAISI now treats as a formal pre-deployment gate, even though compliance remains voluntary in the absence of federal AI legislation.

For the labs, the deals are a hedge. Pre-deployment evaluations let them get ahead of likely regulatory scrutiny without conceding to a binding licensing regime, while also giving CAISI a clearer picture of which capabilities — autonomous cyber operations, dual-use biology, agentic tool use — are crossing thresholds that warrant new policy. Anthropic, notably, has been pushing for stronger formal oversight, and OpenAI's existing arrangement remains in place; the May 5 announcement explicitly positions the new agreements as additions to, not replacements for, the broader frontier testing program.
