Google Gemma 4: Open Models That Outperform Systems 20x Their Size
Google DeepMind released Gemma 4 on April 2, a family of four open-weight multimodal models under Apache 2.0 that bring frontier-level reasoning and agentic capabilities to phones, edge devices, and developer environments.
Released on April 2, 2026, Gemma 4 continues Google DeepMind's push to make frontier-grade AI accessible to developers without licensing restrictions. Built directly on research from the Gemini 3 project, the family is described as "byte for byte, the most capable open models" Google has released, a claim backed by the 31B dense variant's third-place ranking on Arena AI's open-source text leaderboard and by benchmark scores that dramatically outpace previous Gemma generations.
The family ships in four configurations: an Effective 2B (E2B) and Effective 4B (E4B) for mobile and IoT hardware, a 26B Mixture-of-Experts model optimized for low-latency edge inference, and a 31B dense model intended as a fine-tuning foundation for organizations that need maximum output quality. All four support at least 128K tokens of context, with the larger variants scaling to 256K. Every model in the family is natively multimodal — processing images, video, and audio alongside text — and supports over 140 languages.
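The lineup above can be summarized in a short sketch. The variant names, the selection helper, and its logic are illustrative assumptions for this article, not an official Google API; the architecture, target, and context figures are the ones stated here.

```python
# The Gemma 4 lineup as described in the article, with a small helper
# for picking a variant by deployment tier and required context length.
GEMMA4_VARIANTS = {
    # name: (architecture, deployment tier, max context tokens)
    "E2B":     ("dense, effective 2B", "mobile/IoT",            128_000),
    "E4B":     ("dense, effective 4B", "mobile/IoT",            128_000),
    "26B-MoE": ("mixture-of-experts",  "edge",                  256_000),
    "31B":     ("dense",               "fine-tuning foundation", 256_000),
}

def pick_variant(tier: str, context_tokens: int) -> str:
    """Return the smallest variant matching the tier and context need.

    Relies on GEMMA4_VARIANTS being listed smallest-first; dicts
    preserve insertion order in Python 3.7+.
    """
    for name, (_arch, t, max_ctx) in GEMMA4_VARIANTS.items():
        if t == tier and context_tokens <= max_ctx:
            return name
    raise ValueError(f"no variant for tier={tier!r}, context={context_tokens}")

print(pick_variant("mobile/IoT", 100_000))  # → E2B
print(pick_variant("edge", 200_000))        # → 26B-MoE
```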
The leap in coding capability is particularly striking. The Codeforces Elo rating jumped from 110 in Gemma 3 to 2150 in Gemma 4, moving from beginner territory to expert competitive-programmer level in a single generation. On MMLU Pro, the 31B model scores 85.2%, and on AIME 2026 it reaches 89.2%. Google attributes the gains to tighter integration with Gemini 3's reasoning advances and improved instruction tuning across agentic workflows. The models include native function calling and structured JSON output, making them a natural fit for autonomous agent pipelines that need reliable tool use without cloud round-trips.
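The tool-use loop that function calling enables looks roughly like the sketch below: the model emits structured JSON naming a tool and its arguments, and the host program dispatches the call locally, with no cloud round-trip. The JSON schema and the `call_model` stub are illustrative assumptions, not Gemma 4's actual wire format.

```python
import json

def get_weather(city: str) -> str:
    """A local tool the model is allowed to call."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def call_model(prompt: str) -> str:
    # Stub standing in for on-device Gemma 4 inference; a real
    # deployment would run the model here and receive a structured
    # tool call in its place.
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Oslo"}})

def run_agent_step(prompt: str) -> str:
    raw = call_model(prompt)
    call = json.loads(raw)          # parse the structured JSON output
    fn = TOOLS[call["tool"]]        # resolve the named tool
    return fn(**call["arguments"])  # dispatch with the model's arguments

print(run_agent_step("What's the weather in Oslo?"))  # → Sunny in Oslo
```

The reliability claim in the paragraph above rests on the model consistently producing parseable JSON; a production loop would validate `call["tool"]` against the registry and handle malformed output rather than trusting it blindly.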
Deployment flexibility is a central design goal. The E2B and E4B variants are small enough to run fully offline on Raspberry Pi hardware and the NVIDIA Jetson Orin Nano, while the larger models are available on Google AI Studio, Vertex AI, Cloud Run, and Hugging Face. Google has also integrated Gemma 4 into the AICore Developer Preview on Android, signaling intent to make on-device AI a first-class capability for mobile developers. All four models ship under the Apache 2.0 license, which permits unrestricted commercial use and redistribution.
The release continues a pattern of Google using Gemma as a pressure valve against proprietary API lock-in — giving developers a credible open alternative to fine-tune, self-host, and modify. With Gemma 4 bringing multimodality and 256K context to the open-weight ecosystem, the gap between what developers can do with open models versus closed APIs has narrowed further than at any previous point in the current AI cycle.