Mistral OCR 4

by Mistral AI

AI Models & APIs · Enterprise Search & Knowledge · Automation & Workflows · Data & Analytics

Structure-aware document AI that returns bounding boxes, typed blocks, and per-word confidence scores.

Usage-based · Contact for pricing · Added July 4, 2026 · Updated July 4, 2026

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

beri.net

Subscribe at beri.net/subscribe for twice-weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Mistral OCR 4 is a document-understanding model from Mistral AI that extracts structured content (bounding boxes, typed-block labels, and per-word confidence scores) from PDFs and office documents across 170 languages, and can run fully self-hosted in a single container. It is built for enterprise teams building RAG, agentic, and enterprise-search pipelines that need citation-ready, verifiable document extraction.

At a Glance

Category
AI Models & APIs
Pricing
Usage-based, from $4 per 1,000 pages; contact for self-hosted pricing
Target Market
CTOs, Enterprise Developers, Data Scientists, ML Engineers, AI Product Teams
Founded
2023
Headquarters
Paris, France

Key Features

  • Structure-aware extraction

    Returns bounding boxes, typed-block labels (titles, tables, equations, signatures), and structured markdown rather than plain text.

  • Per-word confidence scores

    Provides inline confidence scores per page and per word to support human verification and auditable pipelines.

  • 170-language support

    Handles 170 languages across 10 language groups and accepts PDF, DOC, PPT, and OpenDocument files.

  • Single-container self-hosting

    Runs in a single container for fully self-hosted, private deployment to meet data-residency and privacy requirements.

  • Benchmark-leading accuracy

    Mistral reports a top OlmOCRBench score of 85.20, a score of 93.07 on OmniDocBench, and a 72% average human-preference win rate over competitors.
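
The per-word confidence scores listed above lend themselves to a simple human-review queue. The sketch below is a minimal illustration and not Mistral's API: the `Word` record, its field names, the bounding-box layout, and the 0.90 threshold are all assumptions invented for this example, and the real response schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical per-word OCR record; field names are assumptions."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def words_needing_review(words: list[Word], threshold: float = 0.90) -> list[Word]:
    """Return the words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


def page_confidence(words: list[Word]) -> dict[int, float]:
    """Average word confidence per page, as a rough page-level signal."""
    by_page: dict[int, list[float]] = {}
    for w in words:
        by_page.setdefault(w.page, []).append(w.confidence)
    return {page: sum(vals) / len(vals) for page, vals in by_page.items()}


# Made-up sample data standing in for one line of an invoice.
sample = [
    Word("Invoice", 0.99, 1, (72.0, 80.0, 140.0, 96.0)),
    Word("N0.", 0.61, 1, (150.0, 80.0, 175.0, 96.0)),
    Word("4471", 0.95, 1, (180.0, 80.0, 215.0, 96.0)),
]
flagged = words_needing_review(sample)
print([w.text for w in flagged])
```

Because each flagged word keeps its page and bounding box, a reviewer could be shown the exact region of the page to check. The right threshold depends on the documents and on how costly an error is.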

Capabilities

text generation
image generation
video generation
code generation
workflow automation
api access
audio generation
fine tuning
agent orchestration

Use Cases

  • RAG document ingestion

    Convert documents into citation-ready structured text for retrieval-augmented generation and enterprise search.

  • Agentic document workflows

    Power invoice processing, form filling, and compliance checks with confidence-scored extraction.

  • Private, on-prem extraction

    Deploy in a single container to process sensitive documents without the data leaving the enterprise network.
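
For the RAG ingestion use case, "citation-ready" roughly means that every retrieved chunk can point back to a page and region of the source document. The following is a minimal sketch of that idea under stated assumptions: the `Block` record, its fields, and the chunk ID format are hypothetical and are not taken from Mistral's output schema.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical typed block; the real output schema may differ."""
    kind: str  # e.g. "title", "table", "equation", "signature"
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def to_chunks(doc_id: str, blocks: list[Block]) -> list[dict]:
    """Turn blocks into retrieval chunks that carry a citation to the source."""
    chunks = []
    for i, b in enumerate(blocks):
        if not b.text.strip():
            continue  # skip blocks with no usable text
        chunks.append({
            "id": f"{doc_id}#p{b.page}-b{i}",
            "text": b.text,
            "citation": {"doc": doc_id, "page": b.page, "bbox": b.bbox, "kind": b.kind},
        })
    return chunks


# Made-up sample blocks for illustration only.
blocks = [
    Block("title", "Master Services Agreement", 1, (72.0, 60.0, 400.0, 90.0)),
    Block("text", "   ", 1, (72.0, 100.0, 400.0, 110.0)),
    Block("table", "| Fee | 100 |", 2, (72.0, 200.0, 400.0, 260.0)),
]
chunks = to_chunks("msa.pdf", blocks)
print([c["id"] for c in chunks])
```

Keeping the citation next to the text lets a downstream answer quote the page and region it relied on, which is what makes the extraction verifiable.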

Ideal For

Best For

  • Citation-ready document extraction for RAG pipelines
  • Self-hosted, private document AI for regulated data
  • Invoice, contract, and compliance document processing at scale

Integrations

SDK Available
SDK: Python

Deployment

On-Premise

Market Analysis

Enterprise-grade · State-of-the-art document AI

Pros

  • Structure-aware, citation-ready output built for RAG and agents
  • Self-hosting option for regulated and private data
  • Strong reported benchmark and human-preference results

Cons

  • Benchmark and win-rate figures are largely vendor-reported
  • Focused on extraction rather than end-to-end workflow

Pricing

API

From $4 per 1,000 pages

  • Structured OCR output
  • 50% Batch API discount ($2 per 1,000 pages)

Document AI

$5 per 1,000 pages

  • Structured outputs for custom and no-code pipelines

Self-hosted

Contact for pricing

  • Single-container private deployment for enterprise customers

API is $4 per 1,000 pages ($2 with the 50% Batch API discount); Document AI is $5 per 1,000 pages; self-hosted enterprise deployment is available on request.
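
As a rough worked example of the listed rates, the sketch below estimates spend for a given page volume. It assumes strictly linear per-page billing with no minimums, tiers, or rounding, which the listing does not confirm (the API rate is quoted as "from $4"). The function and constant names are made up for this illustration.

```python
# Rates in USD per 1,000 pages, as quoted in the pricing section above.
API_RATE = 4.00
BATCH_API_RATE = 2.00  # API rate with the 50% Batch API discount
DOCUMENT_AI_RATE = 5.00


def estimate_cost(pages: int, rate_per_thousand: float) -> float:
    """Estimated cost in USD, assuming linear per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * rate_per_thousand


pages = 250_000  # hypothetical monthly volume
for label, rate in [
    ("API", API_RATE),
    ("Batch API", BATCH_API_RATE),
    ("Document AI", DOCUMENT_AI_RATE),
]:
    print(f"{label}: ${estimate_cost(pages, rate):,.2f}")
```

At 250,000 pages this works out to $1,000 on the API, $500 through the Batch API, and $1,250 on Document AI under the linear-billing assumption.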
