Mistral AI has released Leanstral 1.5, an open-source model built for formal verification in the Lean 4 programming language. The model is available under the Apache 2.0 license and is designed to formally verify mathematical proofs and software correctness. According to Mistral, the model scores 100% on miniF2F, a formal math benchmark whose problems range from high school level up to math olympiad difficulty.
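For readers unfamiliar with Lean 4, the following toy sketch shows what a machine-checked statement looks like. It is an illustration only: it is far simpler than any miniF2F problem, it is not taken from Mistral's materials, and the theorem name is hypothetical.

```lean
-- Toy statement: addition on natural numbers is commutative.
-- `add_comm_toy` is a made-up name used only for this illustration.
theorem add_comm_toy (a b : Nat) : a + b = b + a := by
  -- Close the goal with the core library lemma `Nat.add_comm`.
  exact Nat.add_comm a b

-- A concrete arithmetic fact, checked by evaluation.
example : 2 + 2 = 4 := by decide
```

Lean accepts a file like this only if every proof checks, which is what makes a benchmark score on formal problems mechanically verifiable.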
On PutnamBench, a benchmark with 672 problems from the Putnam math competition, Leanstral 1.5 solved 587 problems (about 87%). On the algebra benchmarks FATE-H and FATE-X, it scored 87% and 34% respectively. Mistral states that Leanstral 1.5 outperforms other open-source models on PutnamBench, FATE-H, and FATE-X, and that only the closed-source Aleph Prover surpasses it on PutnamBench. Beyond math, the model also performs well in code verification. In a hands-on test, it scanned 57 open-source repositories and identified five previously unknown bugs, including an overflow bug in the Rust library varinteger.
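As a rough idea of how an overflow can be expressed as a checkable property, here is a minimal Lean 4 sketch. It is an assumption-laden toy model, not the actual varinteger bug and not Mistral's method: the function `add8` is hypothetical and models 8-bit addition as natural-number addition modulo 256.

```lean
-- Hypothetical model of 8-bit unsigned addition: add, then wrap at 256.
def add8 (a b : Nat) : Nat := (a + b) % 256

-- Wraparound: in this model, 255 + 1 evaluates to 0.
example : add8 255 1 = 0 := rfl

-- The naive claim "the sum is at least the first operand" fails here,
-- which is the kind of counterexample a verifier can surface.
example : ¬ (add8 255 1 ≥ 255) := by decide
```

Stating the intended property explicitly is what lets a proof assistant either confirm it or expose the input that breaks it.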
Training combined mid-training, supervised fine-tuning, and reinforcement learning. The model is available through Hugging Face and a free API. Source: The Decoder