NVIDIA has introduced Nemotron-Labs Diffusion, a new family of diffusion language models (DLMs) designed to make text generation more efficient. Instead of emitting one token at a time, these models generate multiple tokens in parallel and iteratively refine them, which reduces the number of forward passes needed compared with traditional autoregressive decoding. The family includes text models at 3B, 8B, and 14B scales, along with an 8B vision-language model (VLM), all available under NVIDIA licenses. The models support three generation modes: autoregressive, diffusion, and self-speculation. Diffusion mode achieves 2.6× the tokens per forward pass (TPF) of autoregressive decoding, while self-speculation reaches 6.4× TPF with comparable accuracy. Switching between the three modes requires changing a single line in the algorithm configuration. NVIDIA also released training code through the Megatron Bridge framework, enabling developers to train and fine-tune the models. Deployment is supported in the main branch of SGLang, with inference support tracked through a GitHub issue. *Source: [huggingface](https://huggingface.co/blog/nvidia/nemotron-labs-diffusion)*
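
To make the TPF figures above concrete, here is a minimal sketch of the arithmetic. It assumes that autoregressive decoding produces exactly one token per forward pass (TPF = 1.0), so the reported multipliers can be read as average TPF values of about 2.6 and 6.4. The function names and the 1,000-token example are illustrative and are not part of NVIDIA's release. TPF counts forward passes only; it is not a wall-clock speedup, since a pass in diffusion or self-speculation mode may cost more than an autoregressive step.

```python
import math


def tokens_per_forward_pass(tokens_generated: int, forward_passes: int) -> float:
    """Average number of tokens produced per model forward pass."""
    if forward_passes <= 0:
        raise ValueError("forward_passes must be positive")
    return tokens_generated / forward_passes


def forward_passes_needed(num_tokens: int, tpf: float) -> int:
    """Forward passes required to emit num_tokens at a given average TPF."""
    if tpf <= 0:
        raise ValueError("tpf must be positive")
    return math.ceil(num_tokens / tpf)


# Assumed average TPF per generation mode, taken from the multipliers in the
# summary and the assumption that autoregressive decoding has TPF = 1.0.
MODE_TPF = {
    "autoregressive": 1.0,
    "diffusion": 2.6,
    "self-speculation": 6.4,
}

if __name__ == "__main__":
    num_tokens = 1000  # illustrative response length
    for mode, tpf in MODE_TPF.items():
        passes = forward_passes_needed(num_tokens, tpf)
        print(f"{mode:>16}: {passes} forward passes for {num_tokens} tokens")
```

Under these assumptions, a 1,000-token response needs 1,000 forward passes in autoregressive mode, 385 in diffusion mode, and 157 with self-speculation.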