Sina has released VibeThinker-3B, a language model with three billion parameters that, according to its technical report, performs on par with top models such as DeepSeek V3.2 and Kimi K2.5 on math and coding benchmarks. On structured tasks such as math olympiads and programming challenges, its results are comparable to those of models with 200 to 333 times as many parameters. The report states that VibeThinker-3B solves 123 of 128 problems from LeetCode contests, outperforming models such as GPT-5.2 and Qwen3-Max. The result challenges the assumption that small models struggle with multi-step reasoning.
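To put the reported figures in proportion, the short Python sketch below works through the arithmetic they imply. It uses only the numbers quoted above (three billion parameters, a 200x to 333x size ratio, 123 of 128 problems solved). The function and variable names are illustrative, and the implied sizes are derived from the ratios rather than taken from any published parameter counts.

```python
# Figures quoted in the summary above; everything else is derived from them.
SMALL_PARAMS = 3e9               # VibeThinker-3B parameter count
RATIO_LOW, RATIO_HIGH = 200, 333  # reported size ratio of the comparison models
SOLVED, TOTAL = 123, 128          # reported LeetCode contest results


def implied_params(ratio: float, base: float = SMALL_PARAMS) -> float:
    """Parameter count of a model that is `ratio` times larger than `base`."""
    return ratio * base


def pass_rate(solved: int, total: int) -> float:
    """Fraction of problems solved."""
    return solved / total


low = implied_params(RATIO_LOW)
high = implied_params(RATIO_HIGH)
rate = pass_rate(SOLVED, TOTAL)

print(f"Implied comparison sizes: {low / 1e9:.0f}B to {high / 1e9:.0f}B parameters")
print(f"LeetCode solve rate: {rate:.1%}")
```

Under these figures, the comparison models would have roughly 600 billion to 1 trillion parameters, and the reported LeetCode result corresponds to a solve rate of about 96%.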

The model builds on Alibaba's Qwen2.5-Coder-3B and relies on post-training to reach this level. The process consists of supervised fine-tuning, reinforcement learning on math, coding, and STEM tasks, and self-distillation to consolidate the acquired skills. A final instruction phase is aimed at making the model follow user prompts accurately. The researchers argue that performance comes from training methods, data quality, and validation signals rather than from parameter count alone. In their view, logical reasoning can be compressed into a compact model, while factual knowledge still requires larger models.

The researchers present the results as support for the 'Parametric Compression-Coverage Hypothesis,' which posits that different AI capabilities have distinct structural needs. Logical reasoning, such as solving math problems step by step, relies on a few recurring patterns and can be packed into a small model. World knowledge, in contrast, requires broad coverage and therefore more parameters. VibeThinker-3B is openly available on Hugging Face and GitHub. Its release fits the growing trend of small models catching up to larger systems on narrow tasks.

Source: The Decoder