Model Release

Google Releases DiffusionGemma, Open Model That Generates Text From Noise

Google has released DiffusionGemma, an open-source language model that generates text through diffusion, achieving speeds up to four times faster than traditional models on dedicated GPUs.


Photo: Pachon in Motion / Pexels

Google has released DiffusionGemma, an experimental open-source language model that generates text by starting with a block of 256 random tokens and refining it over several passes until readable text emerges. The approach, inspired by the diffusion models used for image generation, lets the model produce a whole block of text at once rather than one token at a time. According to Google, the model runs up to four times faster than conventional language models in single-user mode on dedicated GPUs. The speed gain is attributed to better utilization of the GPU's compute units, which stay busy during inference instead of waiting for data from memory.
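To make the idea of refining noise into text more concrete, here is a minimal toy sketch in Python. It is not DiffusionGemma's actual algorithm: the `toy_denoiser` function, the vocabulary size, the number of passes and the rule for which positions get committed on each pass are all assumptions made for illustration. The "model" here simply knows the target sequence, so the sketch only shows the control flow of starting from random tokens and fixing a share of the block on every pass.

```python
import random

BLOCK_SIZE = 256   # block length reported in the article
VOCAB_SIZE = 1000  # hypothetical toy vocabulary size


def toy_denoiser(block, target):
    """Hypothetical stand-in for the model.

    A real diffusion language model would predict a token for every
    position from the noisy block; this toy version just returns the
    known target so the loop below has something to converge to.
    """
    return list(target)


def refine_block(target, num_passes=8, seed=0):
    """Start from random tokens and commit part of the block on each pass."""
    rng = random.Random(seed)
    # Pure noise: every position holds a random token id.
    block = [rng.randrange(VOCAB_SIZE) for _ in range(len(target))]
    # Positions that still hold noise, visited in random order.
    unresolved = list(range(len(target)))
    rng.shuffle(unresolved)
    history = []
    for step in range(num_passes):
        # One pass proposes tokens for all positions in parallel.
        proposal = toy_denoiser(block, target)
        # Commit an equal share of the remaining positions (ceiling division).
        remaining_passes = num_passes - step
        take = -(-len(unresolved) // remaining_passes)
        for pos in unresolved[:take]:
            block[pos] = proposal[pos]
        unresolved = unresolved[take:]
        # Record how many positions currently match the target.
        history.append(sum(b == t for b, t in zip(block, target)))
    return block, history


# Toy "text": a fixed sequence of token ids standing in for a sentence.
target = [i % VOCAB_SIZE for i in range(BLOCK_SIZE)]
final_block, history = refine_block(target)
print(history)  # the count of correct positions rises until it reaches 256
```

The point of the sketch is that each pass touches the entire block, so the number of model calls depends on the number of passes rather than on the number of tokens produced.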

Because DiffusionGemma processes up to 256 tokens in parallel, the bottleneck shifts from memory bandwidth to raw compute power. Nvidia reports that the model reaches about 700 tokens per second on a GeForce RTX 5090 and up to 800 tokens per second on the DGX Station. In Google's own benchmarks, DiffusionGemma runs about three and a half times faster than a Gemma 4 model of the same size but scores slightly lower on accuracy tests. The model has 26 billion parameters in total, of which only 3.8 billion are active per step thanks to its mixture-of-experts architecture.
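The figures above lend themselves to a quick back-of-the-envelope check. The short Python sketch below uses only the numbers quoted in this article; the function names are made up for illustration, and the time per block assumes the reported throughput holds steady for a full 256-token block, which the sources do not state.

```python
TOTAL_PARAMS_B = 26.0   # total parameters, in billions (from the article)
ACTIVE_PARAMS_B = 3.8   # parameters active per step, in billions (from the article)
BLOCK_TOKENS = 256      # maximum tokens processed in parallel (from the article)


def active_fraction(active_b, total_b):
    """Share of the parameters that take part in a single step."""
    return active_b / total_b


def seconds_per_block(tokens_per_second, block_tokens=BLOCK_TOKENS):
    """Time to emit one full block, assuming constant throughput."""
    return block_tokens / tokens_per_second


print(f"Active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")  # 14.6%
print(f"RTX 5090, ~700 tok/s: {seconds_per_block(700):.2f} s per block")        # 0.37 s
print(f"DGX Station, ~800 tok/s: {seconds_per_block(800):.2f} s per block")     # 0.32 s
```

In other words, roughly one seventh of the weights are used on any given step, and at the reported rates a full block would take about a third of a second.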

Google positions DiffusionGemma as a tool for researchers and developers experimenting with fast, local workflows, while recommending traditional Gemma 4 models for tasks where output quality is critical. The model is available on Hugging Face under an Apache 2.0 license and works with common inference libraries like Hugging Face Transformers and vLLM. It also supports fine-tuning through tools like Hackable Diffusion, Unsloth, and Nvidia NeMo Framework. Nvidia has optimized the model for RTX 5090 and 4090 GPUs, as well as Hopper and Blackwell server architectures.

Source: The Decoder

Key points

Google released DiffusionGemma, an open-source language model that generates text using a diffusion-based method.
DiffusionGemma produces blocks of up to 256 tokens at once instead of generating text one token at a time.
The model runs up to four times faster in single-user mode on dedicated GPUs compared to traditional models.
Nvidia reports the model achieves about 700 tokens per second on a GeForce RTX 5090.
DiffusionGemma has 26 billion total parameters, with only 3.8 billion activated per step due to a mixture-of-experts architecture.
Google positions DiffusionGemma as a tool for developers and researchers working on fast, local workflows.
The model is available on Hugging Face under an Apache 2.0 license and works with common inference libraries.


WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.