Mistral AI announced Mistral OCR 4 on June 23, 2026. The model returns bounding boxes, block classification, and inline confidence scores alongside the extracted text. It supports 170 languages across 10 language groups and is designed for self-hosted deployments, where it serves as the ingestion component for enterprise search, RAG, and domain-specific retrieval pipelines. OCR 4 is a small, focused model, and the release includes performance benchmarks and guidance on when to use the API versus Document AI.
According to the release, OCR 4 outperforms leading OCR and document-AI systems, with independent annotators preferring its output in 72% of cases. Its structured outputs (bounding boxes, typed-block classification, and inline confidence scores) enable source-grounded citations, redactions, and human-in-the-loop verification. The model is integrated with Mistral Search Toolkit, an open-source search framework, and supports common enterprise formats such as PDF, DOC, PPT, and OpenDocument.
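As a rough illustration of how per-block confidence scores and bounding boxes could drive human-in-the-loop verification and source-grounded citations, here is a minimal Python sketch. The `Block` class, its field names, the 0-to-1 confidence scale, and the 0.9 threshold are all assumptions made for illustration; they are not Mistral's actual response schema or API.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for one OCR output block (not Mistral's schema)."""
    text: str
    block_type: str                          # assumed labels, e.g. "paragraph", "table"
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) page coordinates
    page: int
    confidence: float                        # assumed to lie in [0.0, 1.0]


def split_for_review(blocks, threshold=0.9):
    """Partition blocks into (accepted, needs_review) by confidence score."""
    accepted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return accepted, needs_review


def citation(block):
    """Build a citation string pointing at the block's page and bounding box."""
    x0, y0, x1, y1 = block.bbox
    return f"p.{block.page} [{x0:g},{y0:g},{x1:g},{y1:g}]"


blocks = [
    Block("Invoice total: 42", "paragraph", (10, 20, 200, 40), 1, 0.98),
    Block("S1gnature", "paragraph", (10, 300, 120, 330), 2, 0.61),
]
accepted, needs_review = split_for_review(blocks)
```

Here the first block would be accepted automatically, while the second falls below the threshold and would be queued for a human reviewer. The right threshold depends on the documents and on how costly an extraction error is.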
The release also covers pricing. OCR 4 through the API costs $4 per 1,000 pages, and a 50% discount for batch processing brings that down to $2 per 1,000 pages. Document AI is priced at $5 per 1,000 pages. The model also achieves a top score of 93.07 on OmniDocBench, though the benchmarks have known limitations in scoring certain outputs.
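The per-page prices above can be turned into a quick cost estimate. The Python sketch below assumes billing is strictly proportional to page count, with no minimums, tiers, or rounding, which the summary does not confirm. It applies the batch discount only to the OCR API, since that is the only case the summary states. The function names are illustrative.

```python
from decimal import Decimal

# Prices per 1,000 pages, as stated in the release summary.
OCR_API_PER_1K = Decimal("4")
DOCUMENT_AI_PER_1K = Decimal("5")
BATCH_DISCOUNT = Decimal("0.5")  # 50% off for batch processing via the API


def ocr_api_cost(pages, batch=False):
    """Estimated OCR API cost in USD, assuming linear per-page billing."""
    rate = OCR_API_PER_1K * (1 - BATCH_DISCOUNT) if batch else OCR_API_PER_1K
    return rate * Decimal(pages) / Decimal(1000)


def document_ai_cost(pages):
    """Estimated Document AI cost in USD, assuming linear per-page billing."""
    return DOCUMENT_AI_PER_1K * Decimal(pages) / Decimal(1000)


pages = 250_000
standard = ocr_api_cost(pages)
batched = ocr_api_cost(pages, batch=True)
doc_ai = document_ai_cost(pages)
```

Under these assumptions, a 250,000-page backlog comes to $1,000 through the standard API, $500 with batch processing, and $1,250 through Document AI.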
Source: mistral