GPT-5.6 Is Here: OpenAI Ships Sol, Terra and Luna
OpenAI released GPT-5.6 on June 27 as a three-model family — the flagship Sol, the balanced Terra and the fast, cheap Luna — but only as a limited preview to about 20 U.S. government-approved partners. Sol leads agentic-coding benchmarks (91.9% on Terminal-Bench 2.1) and matches prior security results using roughly a third of the output tokens, with general availability promised in the coming weeks.
OpenAI released GPT-5.6 on June 27, ending weeks of leaks and prediction-market bets — but in an unusually constrained way. Instead of a single flagship, the company shipped a three-model family: Sol, its most capable model, alongside the lower-cost Terra and the fast, cheap Luna. And rather than opening it to everyone, OpenAI is running a limited preview for roughly 20 partner organizations whose access was approved by the U.S. government on national-security grounds.
The three tiers are pitched at different jobs. Sol is built for the hardest work (long-horizon coding, security research and agentic tasks) at $5 per million input tokens and $30 per million output. Terra targets high-volume production work like customer support, internal tools and document analysis at half Sol's price, $2.50 / $15, which OpenAI frames as about 2× cheaper than GPT-5.5. Luna, at $1 / $6, is for routine, latency-sensitive jobs such as summarizing, drafting and automation.
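To make the price gap concrete, here is a minimal sketch of the arithmetic in Python. The per-million-token prices are the ones quoted above; the tier labels used as dictionary keys are illustrative rather than official API model identifiers, and the workload (200,000 input tokens, 50,000 output tokens) is a made-up example.

```python
# Prices in USD per million tokens (input, output), as quoted in the article.
# The keys are illustrative tier labels, not official API model names.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def request_cost(tier, input_tokens, output_tokens):
    """Return the USD cost of a workload on the given tier."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 200k input tokens, 50k output tokens.
for tier in PRICES:
    print(f"{tier}: ${request_cost(tier, 200_000, 50_000):.2f}")
```

For that hypothetical workload the sketch prints $2.50 for Sol, $1.25 for Terra and $0.50 for Luna, so the same job costs five times as much on the flagship as on the budget tier.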
On benchmarks, Sol's headline result is agentic coding. In OpenAI's figures, Sol running in its new ultra mode scores 91.9% on Terminal-Bench 2.1, ahead of plain Sol (88.8%) and Anthropic's Claude Mythos 5 (88.0%). Even the mid-tier Terra (84.3%) edges past GPT-5.5 (83.4%), while budget Luna (82.5%) lands just below it. The model introduces two reasoning modes: max, which deepens a single chain of reasoning, and ultra, which coordinates subagents to work in parallel on complex tasks.
The more striking story may be efficiency. On ExploitGym, a cybersecurity vulnerability-research test, OpenAI says Sol matched the results of the earlier Mythos Preview while using only about one-third as many output tokens, a meaningful cost and latency win for agentic security work. Sol also posted 50.9% on Agent's Last Exam in code mode, making it, by OpenAI's description, the only model past the halfway mark, and showed gains on GeneBench v1 for long-horizon genomics analysis.
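A rough sketch of what the one-third figure means for output spending follows. It assumes a hypothetical run that would otherwise emit 3 million output tokens, and it prices both runs at Sol's quoted $30 per million output tokens purely to isolate the effect of token count; the article does not give the comparison model's price or real token counts.

```python
SOL_OUTPUT_PRICE = 30.00  # USD per million output tokens, as quoted in the article


def output_cost(output_tokens, price_per_million=SOL_OUTPUT_PRICE):
    """Return the USD cost of the output tokens alone."""
    return output_tokens * price_per_million / 1_000_000


# Hypothetical baseline run; the real token counts are not published here.
baseline_tokens = 3_000_000
sol_tokens = baseline_tokens // 3  # "about one-third as many output tokens"

baseline_cost = output_cost(baseline_tokens)
sol_cost = output_cost(sol_tokens)
saved = baseline_cost - sol_cost

print(f"baseline: ${baseline_cost:.2f}, one-third run: ${sol_cost:.2f}, saved: ${saved:.2f}")
```

Under those assumptions the output bill falls from $90.00 to $30.00, a two-thirds cut. Because output tokens are generated sequentially, fewer of them also tends to mean a shorter wait for each task.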
OpenAI also published where Sol lands on the capability evaluations it treats as safety tripwires. On SecureBio's biology tests, Sol scored 68.4% on Human Pathogen Capabilities and 68.3% on World-Class Biology, with lower marks on Molecular Biology (60.0%) and the Virology Capabilities Test (53.5%). The company says results at this level are why it shipped GPT-5.6 with its most robust safeguards yet — and, on the cyber side, notes Sol found bugs and exploitation primitives in browsers like Chromium and Firefox but did not autonomously produce a full working exploit chain.
What makes this launch unusual is the gate around it. OpenAI says it limited the rollout at the request of the U.S. government, sharing the list of preview partners with officials, and that the model's safeguards are tuned to preserve legitimate work like code review, patch development and defensive testing while resisting misuse. The company was pointed about not wanting this to become routine, saying it does not believe a government-approved access process should be the long-term default. It plans general availability in ChatGPT, Codex and the API in the coming weeks, and has said Sol will run on Cerebras hardware targeting up to 750 tokens per second in July.