Model Release

Mistral's Leanstral 1.5 Excels in Formal Math and Code Verification

Mistral AI's open-source Leanstral 1.5 model achieved 100% on miniF2F and solved 587 of 672 problems on PutnamBench.

Mistral AI has released Leanstral 1.5, an open-source model built for formal verification in the Lean 4 programming language. The model is available under the Apache 2.0 license and is designed to formally verify mathematical proofs and software correctness. Mistral claims the model performs exceptionally well on formal math benchmarks, achieving 100% on miniF2F, whose problems range from high school level up to math olympiad difficulty.
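For readers unfamiliar with Lean 4, here is a minimal sketch of what a machine-checked statement looks like. It is a toy illustration, not taken from Mistral's materials or from any of the benchmarks; the name `add_comm_example` is made up for this example, while `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- A statement plus a proof that Lean's kernel checks mechanically.
-- `add_comm_example` is a hypothetical name chosen for illustration.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete fact that Lean verifies by direct computation.
example : 2 + 2 = 4 := rfl
```

If a proof is wrong or incomplete, Lean rejects the file. That is what makes benchmark results in this setting checkable: a problem counts as solved only when the proof compiles.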

On PutnamBench, a benchmark with 672 problems from the Putnam math competition, Leanstral 1.5 solved 587 of them (roughly 87%). It also achieved top results on the algebra benchmarks FATE-H and FATE-X, scoring 87% and 34%, respectively. Mistral states that Leanstral 1.5 outperforms other open-source models on PutnamBench, FATE-H, and FATE-X. Only the closed-source Aleph Prover surpasses it on PutnamBench.

In addition to its math capabilities, the model also performs well in code verification. In a hands-on test, it scanned 57 open-source repositories and identified five previously unknown bugs, including an overflow bug in the Rust library varinteger.
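The report gives no details of the varinteger bug, so the following is not a reconstruction of it. It is only a generic sketch, under assumed definitions, of how an overflow property can be stated and checked in Lean 4. The function `wrapAdd8` and the theorem name are hypothetical; `Nat.mod_eq_of_lt` is a core library lemma.

```lean
-- Hypothetical model of 8-bit wrapping addition (not code from varinteger).
def wrapAdd8 (a b : Nat) : Nat := (a + b) % 256

-- A concrete counterexample: the wrapped sum differs from the true sum.
example : wrapAdd8 200 100 ≠ 200 + 100 := by decide

-- Under an explicit no-overflow precondition, the two results agree.
theorem wrapAdd8_exact (a b : Nat) (h : a + b < 256) :
    wrapAdd8 a b = a + b := by
  unfold wrapAdd8
  exact Nat.mod_eq_of_lt h
```

The precondition `h` is the point of the sketch. A verifier has to either prove that such a bound holds at every call site or produce an input that violates it, and an input of that kind is what an overflow bug report amounts to.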

Training involved mid-training, supervised fine-tuning, and reinforcement learning. The model is available through Hugging Face and a free API. Source: The Decoder

Key points

Mistral AI's Leanstral 1.5 model achieved 100% on miniF2F, a formal math benchmark.
Leanstral 1.5 solved 587 of 672 problems on PutnamBench.
Leanstral 1.5 achieved top results on the FATE-H and FATE-X benchmarks, scoring 87% and 34%, respectively.
Leanstral 1.5 outperforms other open-source models on PutnamBench, FATE-H, and FATE-X.
Leanstral 1.5 identified five previously unknown bugs in 57 open-source repositories.
The model is available through Hugging Face and a free API.

Source: The Decoder

WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers large language models and their impact on society.