
GPT-5.6 Is Here: OpenAI Ships Sol, Terra and Luna

OpenAI released GPT-5.6 on June 27 as a three-model family — the flagship Sol, the balanced Terra and the fast, cheap Luna — but only as a limited preview to about 20 U.S. government-approved partners. Sol leads agentic-coding benchmarks (91.9% on Terminal-Bench 2.1) and matches prior security results using roughly a third of the output tokens, with general availability promised in the coming weeks.

Graphic: GPT-5.6 arrives in three tiers (prices are input / output per 1M tokens). Sol (flagship): frontier coding and security, $5 / $30. Terra (balanced): high-volume business tasks, $2.50 / $15. Luna (efficient): fast, low-cost everyday work, $1 / $6. Credit: BITSMINDS.COM

OpenAI released GPT-5.6 on June 27, ending weeks of leaks and prediction-market bets — but in an unusually constrained way. Instead of a single flagship, the company shipped a three-model family: Sol, its most capable model, alongside the lower-cost Terra and the fast, cheap Luna. And rather than opening it to everyone, OpenAI is running a limited preview for roughly 20 partner organizations whose access was approved by the U.S. government on national-security grounds.

The three tiers are pitched at different jobs. Sol is built for the hardest work (long-horizon coding, security research and agentic tasks) at $5 per million input tokens and $30 per million output tokens. Terra targets high-volume production work like customer support, internal tools and document analysis at half Sol's price, $2.50 / $15, which OpenAI frames as roughly half the cost of GPT-5.5. Luna, at $1 / $6, is for routine, latency-sensitive jobs such as summarizing, drafting and automation.
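To make the price gap concrete, here is a minimal sketch that prices one job on each tier. The per-million-token prices come from the figures above; the function name `job_cost` and the 200,000-input / 50,000-output workload are illustrative assumptions, not anything OpenAI has published.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), taken from the prices quoted in the article.
PRICES = {
    "sol": (Decimal("5"), Decimal("30")),
    "terra": (Decimal("2.50"), Decimal("15")),
    "luna": (Decimal("1"), Decimal("6")),
}


def job_cost(tier, input_tokens, output_tokens):
    """Return the USD cost of one job at the quoted per-million-token prices."""
    price_in, price_out = PRICES[tier]
    return (price_in * input_tokens + price_out * output_tokens) / Decimal(1_000_000)


# Hypothetical workload (an assumption for illustration): 200k input, 50k output tokens.
for tier in PRICES:
    print(tier, job_cost(tier, 200_000, 50_000))
```

Because every tier keeps the same 1:6 ratio between input and output prices, the relative cost of the three tiers stays fixed whatever the mix of input and output tokens: Terra is always half of Sol, and Luna a fifth.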

On benchmarks, Sol's headline result is agentic coding. In OpenAI's figures, Sol running in its new ultra mode scores 91.9% on Terminal-Bench 2.1, ahead of plain Sol (88.8%) and Anthropic's Claude Mythos 5 (88.0%). The mid-tier Terra (84.3%) also clears GPT-5.5 (83.4%), while the budget Luna (82.5%) lands just below it. The model introduces two reasoning modes: max, which deepens a single chain of reasoning, and ultra, which coordinates subagents to work in parallel on complex tasks.

Chart: Terminal-Bench 2.1, agentic coding (% of tasks solved, axis 0–100; GPT-5.6 family in blue, competitors in gray). GPT-5.6 Sol Ultra 91.9; GPT-5.6 Sol 88.8; Claude Mythos 5 88.0; GPT-5.6 Terra 84.3; Claude Fable 5 84.3; GPT-5.5 83.4; GPT-5.6 Luna 82.5; Claude Opus 4.8 78.9; Gemini 3.1 Pro Preview 70.7. Source: OpenAI (Terminal-Bench 2.1)
GPT-5.6 Sol Ultra leads all nine models on Terminal-Bench 2.1, a test of agentic command-line coding; the GPT-5.6 family (blue) brackets the field against Claude and Gemini (gray). Source: OpenAI.

The more striking story may be efficiency. On ExploitGym, a cybersecurity vulnerability-research test, OpenAI says Sol matched the results of the earlier Mythos Preview while using only about one-third as many output tokens, a meaningful cost and latency win for agentic security work. Sol also posted 50.9% on Agent's Last Exam in code mode, which OpenAI says makes it the only model past the halfway mark, and showed gains on GeneBench v1 for long-horizon genomics analysis.

Chart: ExploitGym, output tokens needed to match results (lower is better). GPT-5.6 Sol ~33%; Mythos Preview 100%. Sol reaches comparable results using roughly one-third the output tokens. Source: OpenAI
On ExploitGym, GPT-5.6 Sol reaches comparable security-research results using roughly a third of the output tokens of Mythos Preview — fewer tokens means lower cost and faster agents. (ExploitGym also exposes other views, such as bugs found.) Source: OpenAI.
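How much of a bill that one-third figure actually removes depends on how many input tokens a run consumes, since only the output side shrinks. The sketch below works through that arithmetic at Sol's quoted prices. It assumes both runs are billed at the same rates and uses made-up token counts; the article gives neither Mythos Preview pricing nor per-run token totals, and the function names are hypothetical.

```python
from fractions import Fraction

SOL_INPUT_PRICE = Fraction(5)    # USD per 1M input tokens, as quoted in the article
SOL_OUTPUT_PRICE = Fraction(30)  # USD per 1M output tokens, as quoted in the article


def run_cost(input_tokens, output_tokens):
    """USD cost of one run at Sol's quoted prices."""
    return (SOL_INPUT_PRICE * input_tokens + SOL_OUTPUT_PRICE * output_tokens) / 1_000_000


def savings_fraction(input_tokens, baseline_output_tokens, output_ratio=Fraction(1, 3)):
    """Share of total cost saved when output tokens drop to output_ratio of the baseline.

    Assumes the same prices apply to both runs, which the article does not state.
    """
    baseline = run_cost(input_tokens, baseline_output_tokens)
    reduced = run_cost(input_tokens, baseline_output_tokens * output_ratio)
    return 1 - reduced / baseline


# Hypothetical run (an assumption for illustration): 1M input, 3M baseline output tokens.
print(float(savings_fraction(1_000_000, 3_000_000)))
```

With no input tokens at all the saving is exactly two-thirds; any input cost pulls the total saving below that, so the headline ratio is best read as an upper bound on the cost reduction.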

OpenAI also published where Sol lands on the capability evaluations it treats as safety tripwires. On SecureBio's biology tests, Sol scored 68.4% on Human Pathogen Capabilities and 68.3% on World-Class Biology, with lower marks on Molecular Biology (60.0%) and the Virology Capabilities Test (53.5%). The company says results at this level are why it shipped GPT-5.6 with its most robust safeguards yet — and, on the cyber side, notes Sol found bugs and exploitation primitives in browsers like Chromium and Firefox but did not autonomously produce a full working exploit chain.

Chart: SecureBio biology evaluations, GPT-5.6 Sol (% score, axis 0–100). Virology 53.5; Molecular Bio 60.0; Human Pathogen 68.4; World-Class Bio 68.3. Capability evals OpenAI tracks for safety. Source: OpenAI
GPT-5.6 Sol on SecureBio's biology capability evaluations — the kind of dual-use benchmark OpenAI uses to calibrate model safeguards. Source: OpenAI.

What makes this launch unusual is the gate around it. OpenAI says it limited the rollout at the request of the U.S. government, sharing the list of preview partners with officials, and that GPT-5.6 was built with its most robust safeguards yet — tuned to preserve legitimate work like code review, patch development and defensive testing while resisting misuse. The company was pointed about not wanting this to become routine, saying it does not believe a government-approved access process should be the long-term default. It plans general availability in ChatGPT, Codex and the API in the coming weeks, and has said Sol will run on Cerebras hardware targeting up to 750 tokens per second in July.
