Cohere Launches North Mini Code, 30B-Parameter Model for Coding Tasks

Cohere released North Mini Code, a 30B-parameter Mixture-of-Experts model with 3B active parameters, outperforming several open-source models on coding benchmarks.

Cohere today announced the release of North Mini Code, a 30B-parameter Mixture-of-Experts model with 3B active parameters, designed for agentic software engineering tasks. The model is available on Hugging Face under the Apache 2.0 license. North Mini Code is the first model in Cohere’s new model family and is optimized for complex software engineering workflows, terminal-based agentic tasks, and high-quality code generation. It scores 33.4 on the Artificial Analysis Coding Index, outperforming several leading open-source models, including Qwen3.5, Gemma 4, and Devstral Small 2.

The model is trained using multiple scaffolds rather than a single one, enabling it to serve as a reliable foundation for coding agents like OpenCode. Architecturally, North Mini Code is a decoder-only, Transformer-based sparse Mixture-of-Experts model. Its attention layers interleave sliding-window and global attention in a 3:1 ratio for efficiency. The feed-forward block is an MoE block with 128 experts, of which 8 are activated per token. Each expert is an FFN block with SwiGLU activation, and the router applies a sigmoid activation function to the logits before the top-k selection.
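To make the architecture description more concrete, here is a minimal pure-Python sketch of the two details mentioned above: the 3:1 interleaving of sliding-window and global attention layers, and sigmoid-then-top-k expert routing. It is an illustration, not Cohere's implementation. The function names are hypothetical, the ordering of layers within each group of four is an assumption (the article gives only the ratio), and normalizing the selected gate scores so they sum to 1 is also an assumption.

```python
import math

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # experts activated per token, from the article


def attention_layer_pattern(num_layers, sliding_per_global=3):
    """Label each layer as 'sliding' or 'global' in a 3:1 ratio.

    Assumption: three sliding-window layers are followed by one global
    layer; the article does not state the order inside a group.
    """
    period = sliding_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding"
            for i in range(num_layers)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def route_token(router_logits, top_k=TOP_K):
    """Apply a sigmoid to the router logits, then keep the top-k experts.

    Returns (expert_index, weight) pairs. Rescaling the kept scores so
    they sum to 1 is an assumption, not something the article states.
    """
    scores = [sigmoid(z) for z in router_logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]


if __name__ == "__main__":
    import random

    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    print(attention_layer_pattern(8))
    print(route_token(logits))
```

Because the sigmoid scores each expert independently, unlike a softmax over all 128 logits, one expert's score does not depend on the others until the top-k step picks the 8 highest.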

Cohere trained North Mini Code using two-stage cascaded supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR), focusing on agentic coding. The first-stage SFT data covers programming, reasoning, and instruction following across a variety of domains, with code datasets making up 70% of trainable tokens. The second SFT stage uses a 4.5-billion-token data mixture of agentic and reasoning-driven samples, in which code data forms 61% of trainable tokens. The training data is also filtered at the sample level to remove invalid tool calls, erroneous whitespace, and malformed special tokens.
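The sample-level filtering described above could look something like the following sketch. The article does not describe Cohere's actual filters, so everything here is an assumption made for illustration: the special-token strings, the JSON tool-call format with a "name" field, and the whitespace heuristics are all hypothetical.

```python
import json
import re

# Hypothetical special tokens; the article does not name the real ones.
TOOL_OPEN = "<|tool_call|>"
TOOL_CLOSE = "<|end_tool_call|>"
KNOWN_TOKENS = (TOOL_OPEN, TOOL_CLOSE)

# Matches anything shaped like "<|...|>" so unknown tokens can be spotted.
TOKEN_LIKE_RE = re.compile(r"<\|[^|>]*\|>")
TOOL_CALL_RE = re.compile(
    re.escape(TOOL_OPEN) + r"(.*?)" + re.escape(TOOL_CLOSE), re.DOTALL
)


def has_malformed_special_tokens(text):
    """True if a token-like marker is unknown or open/close counts differ."""
    if any(tok not in KNOWN_TOKENS for tok in TOKEN_LIKE_RE.findall(text)):
        return True
    return text.count(TOOL_OPEN) != text.count(TOOL_CLOSE)


def has_invalid_tool_call(text):
    """True if any tool-call payload is not a JSON object with a 'name'."""
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            return True
        if not isinstance(call, dict) or "name" not in call:
            return True
    return False


def has_erroneous_whitespace(text):
    """Heuristic (assumed): trailing spaces/tabs or 3+ blank lines in a row."""
    return text.endswith((" ", "\t")) or "\n\n\n\n" in text


def keep_sample(text):
    """Keep a training sample only if it passes all three checks."""
    return not (
        has_malformed_special_tokens(text)
        or has_invalid_tool_call(text)
        or has_erroneous_whitespace(text)
    )


if __name__ == "__main__":
    samples = [
        '<|tool_call|>{"name": "ls", "arguments": {}}<|end_tool_call|>',
        '<|tool_call|>{"name": "ls"<|end_tool_call|>',  # broken JSON
        '<|tool_call|>{"name": "ls"}',                  # unclosed call
    ]
    print([keep_sample(s) for s in samples])
```

Dropping whole samples, rather than trying to repair them, is the simplest reading of "sample-level" filtering. It means the model never trains on a malformed tool call, at the cost of discarding some otherwise useful data.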

Source: Hugging Face

Key points

Cohere released North Mini Code, a 30B-parameter Mixture-of-Experts model with 3B active parameters.
North Mini Code outperforms Qwen3.5, Gemma 4, Devstral Small 2, and even larger models like Nemotron 3 Super and Mistral Small 4 on coding benchmarks.
The model is trained using multiple scaffolds rather than optimizing for a single one, enabling it to serve as a reliable foundation for coding agents.
North Mini Code is a decoder-only, Transformer-based sparse Mixture-of-Experts model whose attention layers interleave sliding-window and global attention in a 3:1 ratio.
Cohere trained North Mini Code using two-stage cascaded supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR), focusing on agentic coding.
The first-stage SFT data covers programming, reasoning, and instruction following across a variety of domains, with code datasets making up 70% of trainable tokens.

WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.