Cohere today announced the release of North Mini Code, a 30B-parameter Mixture-of-Experts model with 3B active parameters, designed for agentic software engineering tasks. The model is available on Hugging Face under the Apache 2.0 license. North Mini Code is the first model in Cohere’s new model family and is optimized for complex software engineering workflows, terminal-based agentic tasks, and high-quality code generation. It scores 33.4 on the Artificial Analysis Coding Index, outperforming several leading open-source models, including Qwen3.5, Gemma 4, and Devstral Small 2.

The model was trained with multiple scaffolds rather than a single one, which is intended to make it a reliable foundation for coding agents such as OpenCode. Architecturally, North Mini Code is a decoder-only Transformer with sparse Mixture-of-Experts layers. It uses an efficient attention implementation that interleaves sliding-window and global attention in a 3:1 ratio. The feed-forward block is an MoE block with 128 experts, of which 8 are activated per token. Each expert is an FFN with SwiGLU activation, and the router applies a sigmoid activation function to its logits before the top-k selection.
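The routing and attention layout described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not Cohere's implementation: the expert count (128), the number of active experts (8), the sigmoid-before-top-k ordering, and the 3:1 ratio come from the article, while the function names, the position of the global layer within each group of four, and the renormalization of the selected scores are assumptions made for the example.

```python
import math

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # experts activated per token, from the article


def layer_pattern(num_layers, sliding_per_global=3):
    """Label each layer 'sliding' or 'global' in a 3:1 interleave.

    Assumption: the global layer is the last one in each group of four.
    """
    period = sliding_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding"
            for i in range(num_layers)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def route(logits, top_k=TOP_K):
    """Apply a sigmoid to the router logits, then keep the top-k experts.

    Returns {expert_index: weight} for one token.
    Assumption: the selected scores are renormalized to sum to 1.
    """
    scores = [sigmoid(z) for z in logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}
```

Because each expert's sigmoid score depends only on its own logit, the experts do not compete through a shared softmax normalization before selection; whether and how the selected scores are rescaled afterwards is not stated in the article.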

Cohere trained North Mini Code with two-stage cascaded supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR), focusing on agentic coding. The first-stage SFT data covers programming, reasoning, and instruction following across a variety of domains, with code datasets making up 70% of trainable tokens. The second-stage SFT uses a 4.5-billion-token data mixture of agentic and reasoning-driven samples, in which code data forms 61% of trainable tokens. The training data is also filtered at the sample level to remove samples with invalid tool calls, erroneous whitespace, or malformed special tokens.
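To make the sample-level filtering concrete, here is a minimal sketch of what such a filter could look like. Everything specific in it is hypothetical: the article does not give the tool-call format, the special tokens, or the whitespace rules, so the `<tool_call>` markers, the JSON payload with a `name` field, and the trailing-whitespace and blank-line heuristics are assumptions chosen for illustration only.

```python
import json
import re

# Hypothetical special tokens; the real tokenizer's markers are not given in the article.
TOOL_OPEN, TOOL_CLOSE = "<tool_call>", "</tool_call>"
TOOL_SPAN = re.compile(re.escape(TOOL_OPEN) + r"(.*?)" + re.escape(TOOL_CLOSE), re.DOTALL)


def has_valid_tool_calls(text):
    """Every tool call must be properly delimited and carry a JSON object with a name."""
    # Malformed special tokens: no stray marker may remain once matched spans are removed.
    remainder = TOOL_SPAN.sub("", text)
    if TOOL_OPEN in remainder or TOOL_CLOSE in remainder:
        return False
    for payload in TOOL_SPAN.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            return False
        # Assumed schema: a JSON object with at least a "name" key.
        if not isinstance(call, dict) or "name" not in call:
            return False
    return True


def has_clean_whitespace(text):
    """Assumed heuristic: no trailing spaces or tabs, no run of four or more newlines."""
    if re.search(r"[ \t]+$", text, re.MULTILINE):
        return False
    return re.search(r"\n{4,}", text) is None


def keep_sample(text):
    return has_valid_tool_calls(text) and has_clean_whitespace(text)


def filter_samples(samples):
    """Drop whole samples that fail any check (sample-level, not token-level)."""
    return [s for s in samples if keep_sample(s)]
```

The filter discards an entire sample when any check fails rather than trying to repair it, which matches the "sample-level" wording in the article; the individual checks would need to be replaced with the model's actual tool-call schema and tokenizer conventions.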

Source: huggingface