Google announced Gemma 4, a new family of open artificial intelligence models released under the Apache 2.0 license, expanding its lineup of openly available models.
The family comes in four sizes: two effective-parameter models, E2B and E4B, plus a 26B Mixture-of-Experts model and a 31B dense model. It is designed to run on hardware ranging from mobile devices and laptops to developer workstations and accelerators.
Since the first generation was introduced, Gemma has been downloaded more than 400 million times, and developers have created more than 100,000 variants, according to Google. The company positions the release as an advance in inference, coding, multimodal processing, and support for longer context windows.
The larger models handle up to 256K tokens of context, while the smaller edge-focused models support 128K. All models can process images and video, and the E2B and E4B models also accept native speech input for recognition and understanding.
model range
According to Google, the 31B model currently ranks third among open models on the Arena AI text leaderboard, while the 26B model ranks sixth. Both are aimed at researchers and developers who want strong inference performance on accessible hardware.
Google says the unquantized bfloat16 versions of the 26B and 31B models fit on a single 80GB Nvidia H100 GPU. Quantized versions can also run on consumer-grade GPUs for local uses such as coding assistants and automated workflows.
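The single-GPU claim is easy to sanity-check with back-of-the-envelope arithmetic: at bfloat16, each parameter takes 2 bytes, so the raw weights of a 31B model occupy about 62 GB, within an H100's 80 GB (the remainder goes to the KV cache and activations). A rough sketch, counting weights only:

```python
# Back-of-the-envelope VRAM estimate for raw model weights.
# Ignores KV cache, activations, and framework overhead.
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# bfloat16 = 2 bytes per parameter
print(weight_memory_gb(31e9, 2))    # 62.0 GB -> fits on an 80 GB H100
print(weight_memory_gb(26e9, 2))    # 52.0 GB

# 4-bit quantization (~0.5 bytes per parameter), the kind of setup
# that brings these models within reach of consumer GPUs:
print(weight_memory_gb(31e9, 0.5))  # 15.5 GB
```

The 4-bit figure illustrates why quantization matters for local deployment: it cuts the weight footprint roughly fourfold, at some cost in output quality.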
The 26B Mixture-of-Experts model activates only 3.8 billion of its parameters during inference, which reduces latency. The 31B dense model, in contrast, targets users who want stronger output quality and a base for fine-tuning.
The E2B and E4B models are on the smaller end of the spectrum and are designed for phones, IoT devices, and compact computing platforms. They were developed to limit memory usage and battery drain while running completely offline with low latency on devices such as smartphones, Raspberry Pi systems, and Nvidia Jetson Orin Nano units.
open license
The Apache 2.0 license is central to this release. It permits commercial use and modification with relatively few restrictions, which could make the models more attractive to developers and organizations that want to maintain control over deployment and data processing.
Google says the models are intended to give developers the flexibility to deploy in on-premises and cloud environments, and that they go through the same infrastructure security protocols as Google's own systems.
The announcement reflects growing competition in open-weight AI models as companies balance performance against hardware requirements and ease of local deployment. Interest in smaller on-device models is growing as developers seek lower costs, lower latency, and more control for privacy-sensitive applications.
developer focus
Gemma 4 includes native support for function calling, structured JSON output, and system instructions, capabilities that let software agents interact with tools and application programming interfaces. Google also says the models were trained on data in more than 140 languages.
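In practice, function calling means the model emits a structured JSON payload naming a tool and its arguments, which the host application parses and dispatches. A minimal sketch of that dispatch loop follows; the tool name, registry, and payload shape here are illustrative assumptions, not Gemma's actual wire format:

```python
import json

# Hypothetical tool registry. A real agent would describe these tools
# to the model via schemas in the system instructions.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub response for illustration

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and run the tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Example payload a model might emit as structured JSON output:
payload = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(payload))  # Sunny in Paris
```

The value of structured output is exactly this: because the model's reply is machine-parseable JSON rather than free text, the host application can route it to code deterministically.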
Launch support covers a wide range of development frameworks and runtimes, including Hugging Face tools, vLLM, llama.cpp, MLX, Ollama, and Nvidia software. Developers can also use the models in Android workflows and adapt them to platforms such as Colab, Vertex AI, and local hardware.
Google also highlighted previous work that built on the earlier Gemma model, including a Bulgarian language model created by INSAIT called BgGPT and a collaboration with Yale University on Cell2Sentence-Scale for cancer research. These examples are intended to demonstrate that open models are useful for both local language applications and scientific use cases.
In a crowded market, Google is trying to differentiate Gemma 4 by offering a range that spans from mobile devices to high-end GPUs while keeping the model openly available. An emphasis on local deployment, multimodal input, and permissive licensing may be attractive to developers weighing open models versus closed commercial systems.
Google says Gemma 4 was built on the same research and technology foundations as Gemini 3.
