Google Launches Gemma 4 Models and Shifts to Apache 2.0 License

Key Points
- Google released four Gemma 4 models for local and mobile use.
- 26B Mixture of Experts and 31B Dense run unquantized on a single 80GB Nvidia H100 GPU.
- Quantized versions can operate on consumer GPUs, expanding accessibility.
- Effective 2B and Effective 4B are optimized for smartphones, Raspberry Pi, and Jetson Nano.
- Latency improvements include activation of only 3.8 billion parameters in the 26B MoE model.
- Google switched from a custom license to the Apache 2.0 license for all Gemma models.
- The 31B Dense model is projected to rank third on the Arena open‑model leaderboard.
- Collaboration with Qualcomm and MediaTek helped achieve low memory usage and near‑zero latency on edge devices.
Google introduced the Gemma 4 family of open-weight AI models, offering four variants optimized for local execution and mobile devices. The two larger models—26B Mixture of Experts and 31B Dense—run unquantized on a single 80GB Nvidia H100 GPU and can be quantized for consumer GPUs. The smaller Effective 2B and Effective 4B models target smartphones and edge hardware, benefiting from collaboration with Qualcomm and MediaTek. Google also replaced its custom Gemma license with the Apache 2.0 license, giving developers greater freedom. The company claims the Gemma 4 models are the most capable locally runnable AI systems, positioning them near the top of open AI model rankings.
New Gemma 4 Models
Google announced the Gemma 4 series, expanding its portfolio of open-weight artificial intelligence models. The family includes four sizes designed for different deployment scenarios, from high‑performance servers to mobile and edge devices. By providing models that can run locally, Google aims to give developers more control over inference environments and reduce reliance on cloud services.
Hardware and Performance
The two larger variants—named 26B Mixture of Experts (MoE) and 31B Dense—are built to run unquantized in bfloat16 format on a single 80GB Nvidia H100 GPU. While the H100 is a high‑end AI accelerator, Google notes that quantized versions of these models can run on consumer‑grade GPUs, widening accessibility. A key improvement is reduced latency: the 26B MoE model activates only 3.8 billion of its 26 billion parameters per token, delivering higher tokens‑per‑second throughput than similarly sized competitors. The 31B Dense model prioritizes output quality and is intended as a base for fine‑tuning to specific applications.
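A rough sanity check of the figures above can be done with back‑of‑the‑envelope arithmetic. This sketch is illustrative only (the function and constants are not from Google): it assumes weights dominate memory, with bfloat16 at 2 bytes per parameter and 4‑bit quantization at 0.5 bytes per parameter, and ignores activations and KV cache.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# 31B Dense in bfloat16: 62.0 GB of weights, within an 80GB H100.
dense_bf16 = weight_memory_gb(31, 2.0)

# The same model quantized to 4 bits: 15.5 GB, consumer-GPU territory.
dense_4bit = weight_memory_gb(31, 0.5)

# MoE: only 3.8B of 26B parameters are active per token, so per-token
# compute resembles a ~3.8B dense model, even though all 26B weights
# must still be resident in memory.
active_fraction = 3.8 / 26  # roughly 15% of parameters active
```

This illustrates why a sparse MoE model can be faster than a dense model of the same total size: memory requirements scale with total parameters, but per‑token compute scales with active parameters.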
Mobile‑Optimized Variants
Effective 2B (E2B) and Effective 4B (E4B) are the smaller Gemma 4 models aimed at mobile and edge devices. Google worked closely with Qualcomm and MediaTek to optimize these models for smartphones, Raspberry Pi boards, and Jetson Nano platforms. The designs keep memory usage low during inference and promise “near‑zero latency,” offering a more efficient alternative to the previous Gemma 3 models.
Licensing Change
Responding to developer feedback about licensing constraints, Google has replaced its custom Gemma license with the Apache 2.0 license. This shift gives developers broader freedom to use, modify, and distribute the models without the restrictions previously imposed by the proprietary license.
Competitive Position
Google asserts that the Gemma 4 models are the most capable AI systems that can be run on local hardware. It predicts that the 31B Dense variant will rank third on the Arena list of top open AI models, trailing only GLM‑5 and Kimi 2.5. Despite this high ranking, the Gemma 4 models remain a fraction of the size of the leading competitors, potentially lowering operational costs for users.