Mistral AI has released OCR 4, a new model that reads text from documents such as PDFs, Word files, and PowerPoint presentations. Unlike earlier versions, OCR 4 does not just extract raw text but also identifies where each element sits on the page and what role it plays, such as title, table, equation, or signature. This block classification helps break documents into meaningful sections automatically, which is useful for feeding them into search systems or letting AI agents process them. The model also outputs confidence scores that estimate how certain it is about each word or page it reads. Mistral says the model beats all tested competitors on both benchmarks it reports. OCR 4 supports 170 languages and, according to the company, works well even with less common ones. In a blind test with over 600 documents, independent reviewers preferred OCR 4's results 72 percent of the time over competing models, the company says. The model is available through the API, Mistral Studio, and Microsoft Foundry. It costs $4 per 1,000 pages, or $2 per 1,000 pages in batch mode.
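To put the quoted prices in perspective, here is a minimal Python sketch that estimates the cost of a job from its page count. The function name `estimate_cost` and the mode labels are made up for illustration, and the calculation assumes simple pro-rata billing with no minimum charge or rounding, which the article does not confirm.

```python
# Prices quoted in the article, in US dollars per 1,000 pages.
PRICE_PER_1000_PAGES = {"standard": 4.0, "batch": 2.0}


def estimate_cost(pages: int, mode: str = "standard") -> float:
    """Estimate the cost in USD of processing `pages` pages.

    Assumption: billing is strictly proportional to the page count.
    Actual billing rules (minimums, rounding) are not covered here.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in PRICE_PER_1000_PAGES:
        raise ValueError(f"unknown mode: {mode!r}")
    return PRICE_PER_1000_PAGES[mode] * pages / 1000


if __name__ == "__main__":
    archive_pages = 250_000  # hypothetical archive size
    print(estimate_cost(archive_pages))           # standard mode
    print(estimate_cost(archive_pages, "batch"))  # batch mode
```

Under these assumptions, a 250,000-page archive would come to about $1,000 in standard mode and about $500 in batch mode.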

Mistral positions OCR 4 as a significant improvement over its predecessors, aimed at businesses and developers who work with a wide range of document types and languages. Block classification and confidence scores are meant to make its output easier to use in downstream applications. The 72 percent figure is Mistral's own report of how often reviewers preferred OCR 4's output in the blind test; it is a preference rate, not an accuracy score.
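As an illustration of how block roles and confidence scores might be used downstream, the following Python sketch sends low-confidence blocks to manual review and passes the rest on. Everything here is hypothetical: the `Block` class, its fields, the per-block confidence value, the 0.9 threshold, and the sample data are assumptions for the example and do not reflect Mistral's actual response format, which the article describes only as giving scores per word or page.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for one classified element of a page."""
    kind: str          # role, e.g. "title", "table", "equation", "signature"
    text: str
    confidence: float  # assumed to lie between 0 and 1


def triage(blocks: list[Block], threshold: float = 0.9):
    """Split blocks into (accepted, needs_review) by confidence.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    accepted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return accepted, needs_review


if __name__ == "__main__":
    # Made-up sample data, not real model output.
    page = [
        Block("title", "Quarterly Report", 0.99),
        Block("table", "Revenue | 2023 | 2024", 0.95),
        Block("signature", "J. Smith", 0.62),
    ]
    accepted, needs_review = triage(page)
    print([b.kind for b in accepted])
    print([b.kind for b in needs_review])
```

A pipeline built this way could index the accepted blocks for search by their role while queueing the rest for a human check.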

Source: thedecoder