ByteDance's iLLaDA Diffusion Model Matches Qwen2.5

ByteDance's iLLaDA, an 8B diffusion language model, matches Qwen2.5 at the base level but lags behind after fine-tuning, according to new research.

ByteDance has introduced iLLaDA, an 8B diffusion language model developed in collaboration with researchers from Renmin University. Instead of generating text one token at a time, as the autoregressive models behind products like ChatGPT do, iLLaDA uses a diffusion approach. According to the research team, the base model performs comparably to Qwen2.5 on general tasks but falls behind after fine-tuning.

The diffusion method used by iLLaDA starts with a sequence of placeholders and refines them over multiple passes, similar to how image models generate images from noise. Because every position in the sequence can attend to all others simultaneously, the process is bidirectional. iLLaDA was trained on 12 trillion tokens, roughly five times the 2.3 trillion used for its predecessor, LLaDA. The model's performance improved sharply as a result, reaching 63.9 points on the reasoning test BBH and surpassing Qwen2.5's 63.3 points.
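
The refinement loop described above can be illustrated with a toy sketch. This is not iLLaDA's actual sampler: `toy_predictor`, `diffusion_decode`, the `<mask>` placeholder, and the rule of keeping the most confident proposals at each pass are all assumptions made for illustration, and the "model" here simply looks up a known target sentence with random confidence scores.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token


def toy_predictor(tokens, target, rng):
    """Hypothetical stand-in for a trained denoiser.

    A real model would look at the whole sequence (bidirectionally) and
    predict a token plus a confidence for each masked position. This toy
    version cheats: it reads the answer from `target` and attaches a
    random confidence score.
    """
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            proposals[i] = (target[i], rng.random())
    return proposals


def diffusion_decode(target, steps=4, seed=0):
    """Start from all placeholders and fill them in over several passes."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = toy_predictor(tokens, target, rng)
        # Spread the remaining placeholders evenly over the remaining
        # passes (ceiling division), so the last pass fills everything.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)
        # Commit only the k most confident proposals in this pass.
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    sentence = "diffusion models refine all positions in parallel".split()
    final, passes = diffusion_decode(sentence, steps=4)
    for n, state in enumerate(passes):
        print(n, " ".join(state))
```

Each printed line shows the sequence after one pass, with fewer placeholders every time. Unlike an autoregressive decoder, the positions are not filled in left to right; which ones get committed first depends only on the confidence scores.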

The research team notes that, despite this promise, iLLaDA still lags behind Qwen2.5 in some areas, particularly on coding benchmarks. The authors attribute the gap to the absence of reinforcement learning alignment, which Qwen2.5 has and iLLaDA lacks. They also note that the model can get stuck in reasoning loops on complex tasks. The results feed into the ongoing debate over whether diffusion models can match the performance of autoregressive models.

Source: The Decoder

Key points

ByteDance's iLLaDA is an 8B diffusion language model that matches Qwen2.5 at the base level.
iLLaDA improves sharply over its predecessor, LLaDA, by 21.6 points on the reasoning test BBH.
iLLaDA's performance on general tasks reaches 63.9 points, surpassing Qwen2.5's 63.3 points.
The model was pretrained on 12 trillion tokens, up from 2.3 trillion for its predecessor.
iLLaDA still lags behind Qwen2.5 on coding benchmarks, which the authors attribute to the absence of reinforcement learning alignment.
The authors note that iLLaDA can get stuck in reasoning loops on harder tasks.

WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.