Mistral OCR 4
by Mistral AI
Structure-aware document AI that returns bounding boxes, typed blocks, and per-word confidence scores.
Mistral OCR 4 is a document-understanding model from Mistral AI that extracts structured content (bounding boxes, typed-block labels, and per-word confidence scores) from PDFs and office documents in 170 languages, and it can run fully self-hosted in a single container. It is aimed at enterprise teams whose RAG, agentic, and enterprise-search pipelines need citation-ready, verifiable document extraction.
At a Glance
- Category: AI Models & APIs
- Pricing: Usage-based, contact for pricing
- Target Market: CTOs, Enterprise Developers, Data Scientists, ML Engineers, AI Product Teams
- Founded: 2023
- Headquarters: Paris, France
Key Features
- ✓Structure-aware extraction
Returns bounding boxes, typed-block labels (titles, tables, equations, signatures), and structured markdown rather than plain text.
- ✓Per-word confidence scores
Provides inline confidence scores per page and per word to support human verification and auditable pipelines.
- ✓170-language support
Handles 170 languages across 10 language groups and accepts PDF, DOC, PPT, and OpenDocument files.
- ✓Single-container self-hosting
Runs in a single container for fully self-hosted, private deployment to meet data-residency and privacy requirements.
- ✓Benchmark-leading accuracy
Mistral AI reports a top OlmOCRBench score of 85.20, an OmniDocBench score of 93.07, and a 72% average human-preference win rate over competitors.
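To illustrate how per-word confidence scores could support human verification, here is a minimal sketch that flags low-confidence words for review. The `Word` record, its field names, the 0.0 to 1.0 score range, and the 0.85 threshold are all assumptions made for this example; they are not taken from Mistral OCR 4's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical per-word OCR record (not the real response schema)."""
    text: str
    confidence: float  # assumed to lie in the range 0.0-1.0
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def flag_for_review(words, threshold=0.85):
    """Return the words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


# Illustrative data only; these values are made up for the example.
sample = [
    Word("Invoice", 0.99, 1, (72.0, 90.0, 140.0, 104.0)),
    Word("T0tal", 0.62, 1, (72.0, 400.0, 110.0, 414.0)),
    Word("1,250.00", 0.91, 1, (420.0, 400.0, 480.0, 414.0)),
]

for word in flag_for_review(sample):
    print(f"page {word.page}: {word.text!r} ({word.confidence:.2f}) at {word.bbox}")
```

In an auditable pipeline, the flagged words and their bounding boxes could be routed to a reviewer while the rest of the page passes through automatically. The right threshold would depend on the document type and the cost of an error.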
Capabilities
Use Cases
- •RAG document ingestion
Convert documents into citation-ready structured text for retrieval-augmented generation and enterprise search.
- •Agentic document workflows
Power invoice processing, form filling, and compliance checks with confidence-scored extraction.
- •Private, on-prem extraction
Deploy in a single container to process sensitive documents without leaving the enterprise.
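As a rough sketch of what "citation-ready" ingestion might look like in a RAG pipeline, the example below turns typed blocks into chunks that carry their source, page, and bounding box. The `Block` record, its field names, and the chunk ID format are hypothetical and chosen only for illustration; the actual output format of Mistral OCR 4 may differ.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical typed block (not the real response schema)."""
    kind: str  # e.g. "title", "table", "equation", "signature"
    markdown: str
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def to_chunks(blocks, source):
    """Convert blocks into retrieval chunks that keep citation metadata."""
    chunks = []
    for index, block in enumerate(blocks):
        if not block.markdown.strip():
            continue  # skip empty blocks; they add nothing to retrieval
        chunks.append({
            "id": f"{source}#p{block.page}-b{index}",
            "text": block.markdown,
            "citation": {
                "source": source,
                "page": block.page,
                "bbox": block.bbox,
                "kind": block.kind,
            },
        })
    return chunks


# Illustrative data only.
blocks = [
    Block("title", "# Master Services Agreement", 1, (72.0, 72.0, 540.0, 100.0)),
    Block("table", "", 1, (72.0, 120.0, 540.0, 300.0)),
    Block("signature", "Signed: J. Doe", 4, (72.0, 650.0, 300.0, 700.0)),
]

for chunk in to_chunks(blocks, "contract.pdf"):
    print(chunk["id"], chunk["citation"]["kind"])
```

Keeping the page and bounding box alongside each chunk lets a downstream answer point back to the exact region of the original document.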
Ideal For
Best For
- ✓Citation-ready document extraction for RAG pipelines
- ✓Self-hosted, private document AI for regulated data
- ✓Invoice, contract, and compliance document processing at scale
Market Analysis
Pros
- ✓Structure-aware, citation-ready output built for RAG and agents
- ✓Self-hosting option for regulated and private data
- ✓Strong reported benchmark and human-preference results
Cons
- ✗Benchmark and win-rate figures are largely vendor-reported
- ✗Focused on extraction rather than end-to-end workflow
Pricing
API
From $4 per 1,000 pages
- ✓Structured OCR output
- ✓50% Batch API discount ($2 per 1,000 pages)
Document AI
$5 per 1,000 pages
- ✓Structured outputs for custom and no-code pipelines
Self-hosted
Contact for pricing
- ✓Single-container private deployment for enterprise customers
API is $4 per 1,000 pages ($2 with the 50% Batch API discount); Document AI is $5 per 1,000 pages; self-hosted enterprise deployment is available on request.
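The listed rates can be turned into a quick cost estimate. The sketch below assumes pricing scales linearly with page count, with no minimums, volume tiers, or rounding rules; those details are not stated here and may differ in practice. The tier keys (`api`, `api_batch`, `document_ai`) are labels invented for this example.

```python
# Listed rates in USD per 1,000 pages, as stated in the pricing above.
RATES_PER_1000_PAGES = {
    "api": 4.00,
    "api_batch": 2.00,  # API rate with the 50% Batch API discount
    "document_ai": 5.00,
}


def estimate_cost(pages: int, tier: str) -> float:
    """Estimate cost in USD, assuming simple linear per-page pricing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * RATES_PER_1000_PAGES[tier]


for tier in RATES_PER_1000_PAGES:
    print(f"{tier}: ${estimate_cost(250_000, tier):,.2f} for 250,000 pages")
```

Self-hosted deployment is excluded because its price is available only on request.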