Model Release

Sina's VibeThinker-3B Model Matches Top Models on Math and Coding

Sina's VibeThinker-3B, a 3-billion-parameter model, matches top models up to 333 times its size on math and coding benchmarks, according to a technical report.

Smartphone displaying an AI app, with a book on AI technology in the background.

Photo: Sanket Mishra / Pexels

Sina has released VibeThinker-3B, a small language model with three billion parameters that performs on par with top models like DeepSeek V3.2 and Kimi K2.5 on math and coding benchmarks. On structured tasks such as math olympiads and programming challenges, its performance is comparable to that of models with 200 to 333 times more parameters. According to the technical report, VibeThinker-3B solves 123 of 128 problems from LeetCode contests, outperforming models like GPT-5.2 and Qwen3-Max. This result challenges the common assumption that small models struggle with multi-step reasoning tasks.
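To put the headline figures in perspective, here is a short back-of-the-envelope calculation in Python. It assumes the nominal "3B" is exactly 3 billion parameters (the report's precise count may differ slightly), and the helper names `implied_size` and `solve_rate` are illustrative, not taken from the report.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the article.
# Assumption: "3B" is treated as exactly 3 billion parameters.

BASE_PARAMS = 3e9  # VibeThinker-3B, nominal size


def implied_size(multiple: float, base: float = BASE_PARAMS) -> float:
    """Parameter count of a model `multiple` times larger than the base model."""
    return base * multiple


def solve_rate(solved: int, total: int) -> float:
    """Fraction of benchmark problems solved."""
    return solved / total


low = implied_size(200)   # lower end of the quoted size range
high = implied_size(333)  # upper end of the quoted size range
rate = solve_rate(123, 128)

print(f"Comparable models: {low / 1e9:.0f}B to {high / 1e9:.0f}B parameters")
print(f"LeetCode contest solve rate: {rate:.1%}")
```

Running it prints a range of 600B to 999B parameters, roughly one trillion at the upper end, and a solve rate of about 96.1 percent.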

The model builds on Alibaba's Qwen2.5-Coder-3B and relies on post-training techniques to achieve its performance. The process includes supervised fine-tuning, reinforcement learning on math, coding, and STEM tasks, and self-distillation to consolidate the learned skills. A final instruction phase is meant to make the model follow user prompts accurately. The researchers argue that performance comes from training methods, data quality, and validation signals rather than parameter count alone. Their results suggest that logical reasoning can be compressed into a compact model, while factual knowledge still requires larger models.

The results support the researchers' 'Parametric Compression-Coverage Hypothesis,' which posits that different AI capabilities have distinct structural needs. Logical reasoning, such as solving math problems step by step, relies on a small number of recurring patterns and can be packed into a small model. World knowledge, by contrast, requires broad coverage and therefore more parameters. VibeThinker-3B is openly available on Hugging Face and GitHub, and its results add to a growing trend of small models catching up to larger systems on narrow tasks.

Source: The Decoder

Key points

Sina's VibeThinker-3B, a 3-billion-parameter model, matches top models up to 333 times its size on math and coding benchmarks.
VibeThinker-3B solves 123 out of 128 problems on LeetCode contests, outperforming models like GPT-5.2 and Qwen3-Max.
The model builds on Alibaba's Qwen2.5-Coder-3B and uses post-training techniques to achieve its performance.
VibeThinker-3B performs on par with DeepSeek V3.2 and Kimi K2.5 on competitive benchmarks like AIME26.
The researchers propose the 'Parametric Compression-Coverage Hypothesis,' suggesting different AI capabilities have distinct structural needs.
Logical reasoning can be compressed into a compact model, while factual knowledge requires broader coverage.
VibeThinker-3B is openly available on Hugging Face and GitHub, and its results reflect a trend of small models catching up to larger systems on narrow tasks.


WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.