Announcing Chandra OCR 2, a 4B-parameter OCR model that scores 85.9% on the olmOCR benchmark (state of the art) and 77.8% on our internal 43-language multilingual benchmark. It is smaller than Chandra 1 (9B) and more accurate in every category.
Highlights
- 4B parameters (down from 9B), 2x throughput improvement
- 85.9% olmOCR overall (up from 83.1%)
- 77.8% multilingual average across 43 languages (up from 69.4%), 72.7% across 90 languages
- 2 pages/sec on an H100 with 96 concurrent requests
- 15+ layout block types with bounding boxes
- Structured output for diagrams (Mermaid), charts, and images
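For capacity planning, the quoted throughput converts directly into pages per hour and per day. The sketch below is illustrative arithmetic only: it assumes the 2 pages/sec figure is sustained, and the per-page latency estimate (via Little's law) further assumes that each of the 96 concurrent requests carries exactly one page, which the announcement does not state.

```python
# Capacity arithmetic from the quoted figure: 2 pages/sec on an H100
# with 96 concurrent requests. Sustained throughput is an assumption.
PAGES_PER_SEC = 2.0
CONCURRENT_REQUESTS = 96


def pages_in(seconds: float, rate: float = PAGES_PER_SEC) -> float:
    """Pages processed in `seconds` at a constant rate."""
    return seconds * rate


def hours_for(pages: float, rate: float = PAGES_PER_SEC) -> float:
    """Wall-clock hours needed to process `pages` at a constant rate."""
    return pages / rate / 3600


def mean_latency_sec(in_flight: int = CONCURRENT_REQUESTS,
                     rate: float = PAGES_PER_SEC) -> float:
    """Little's law (W = L / lambda), assuming one page per request."""
    return in_flight / rate


if __name__ == "__main__":
    print(f"pages/hour: {pages_in(3600):,.0f}")
    print(f"pages/day:  {pages_in(86_400):,.0f}")
    print(f"hours for 1M pages: {hours_for(1_000_000):.1f}")
    print(f"mean latency per page: {mean_latency_sec():.0f} s")
```

At that rate one GPU handles 7,200 pages per hour, so a one-million-page backlog would take roughly 139 hours.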
olmOCR Benchmark
| Category | Chandra 1 | Chandra 2 | Change (pts) |
|---|---|---|---|
| ArXiv | 82.2% | 90.2% | +8.0 |
| Old Scans Math | 80.3% | 89.3% | +9.0 |
| Tables | 88.0% | 89.9% | +1.9 |
| Multi column | 81.2% | 83.5% | +2.3 |
| Overall | 83.1% | 85.9% | +2.8 |
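The change column can be recomputed from the two score columns, and the same numbers can be read as a reduction in remaining error. The sketch below does both. "Error" here is simply 100 minus the score, which is an interpretive assumption rather than a metric the benchmark reports.

```python
# Scores copied from the table above: (Chandra 1, Chandra 2), in percent.
SCORES = {
    "ArXiv": (82.2, 90.2),
    "Old Scans Math": (80.3, 89.3),
    "Tables": (88.0, 89.9),
    "Multi column": (81.2, 83.5),
    "Overall": (83.1, 85.9),
}


def change_points(old: float, new: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_error_reduction(old: float, new: float) -> float:
    """Fraction of the remaining error (100 - score) that was removed."""
    return (new - old) / (100.0 - old)


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        print(f"{name:<15} {change_points(old, new):+.1f} pts  "
              f"{relative_error_reduction(old, new):.0%} of error removed")
```

On ArXiv, for example, the 8.0-point gain removes about 45% of Chandra 1's remaining error (8.0 / 17.8).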
Multilingual
43-language averages: Chandra 2 77.8%, Chandra 1 69.4%, Gemini 2.5 Flash 67.6%, GPT-5 Mini 60.5%.
90-language averages: Chandra 2 72.7%, Gemini 2.5 Flash 60.8%.
The largest improvements are in South Asian scripts (Bengali +27.2, Kannada +42.6, Malayalam +46.2, Tamil +26.9, Telugu +39.1). Full 90-language results are in FULL_BENCHMARKS.md.
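To put the multilingual figures side by side, the sketch below ranks the per-script gains, averages them, and computes Chandra 2's lead over each model on the 43-language average. It uses only the numbers quoted in this section; the helper names are illustrative.

```python
from statistics import mean

# Per-script gains quoted above, in percentage points.
GAINS = {
    "Bengali": 27.2, "Kannada": 42.6, "Malayalam": 46.2,
    "Tamil": 26.9, "Telugu": 39.1,
}

# 43-language averages quoted above, in percent.
AVG_43 = {
    "Chandra 2": 77.8, "Chandra 1": 69.4,
    "Gemini 2.5 Flash": 67.6, "GPT-5 Mini": 60.5,
}


def ranked(scores: dict) -> list:
    """Items sorted from highest to lowest value."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def lead_over(scores: dict, model: str) -> dict:
    """Point difference between `model` and every other entry."""
    return {other: round(scores[model] - value, 1)
            for other, value in scores.items() if other != model}


if __name__ == "__main__":
    for script, gain in ranked(GAINS):
        print(f"{script:<10} +{gain:.1f}")
    print(f"mean gain: +{mean(GAINS.values()):.1f}")
    print(lead_over(AVG_43, "Chandra 2"))
```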
New Capabilities
- Layout blocks: `text`, `section-header`, `caption`, `footnote`, `table`, `form`, `list-group`, `image`, `figure`, `diagram`, `equation-block`, `code-block`, `chemical-block`, `bibliography`, `table-of-contents`, `page-header`, `page-footer`, `complex-block`
- Mermaid diagrams: Flowcharts and process diagrams converted to Mermaid format
- Chart extraction: Structured data (values, categories, axis labels) from bar/line/pie charts
- Image captioning: Captions generated from visual content and surrounding context
- Chemistry detection: Molecular structure descriptions from chemical diagrams
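Downstream code that consumes layout blocks may want to reject labels it does not recognize. The sketch below is one way to do that with the label names listed above. The `Block` class and its `(x0, y0, x1, y1)` bounding-box convention are hypothetical and are not Chandra's actual output schema; check the repository for the real format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple

# Label names copied from the layout block list above.
LAYOUT_LABELS = frozenset({
    "text", "section-header", "caption", "footnote", "table", "form",
    "list-group", "image", "figure", "diagram", "equation-block",
    "code-block", "chemical-block", "bibliography", "table-of-contents",
    "page-header", "page-footer", "complex-block",
})


@dataclass(frozen=True)
class Block:
    """Hypothetical container; not Chandra's actual output schema."""
    label: str
    bbox: Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)

    def __post_init__(self) -> None:
        if self.label not in LAYOUT_LABELS:
            raise ValueError(f"unknown layout label: {self.label!r}")
        x0, y0, x1, y1 = self.bbox
        if x1 < x0 or y1 < y0:
            raise ValueError(f"malformed bounding box: {self.bbox!r}")


def count_by_label(blocks: Iterable[Block]) -> Counter:
    """Number of blocks per layout label."""
    return Counter(block.label for block in blocks)
```

For example, `count_by_label([Block("text", (0, 0, 10, 10)), Block("table", (0, 12, 10, 30))])` tallies one `text` block and one `table` block, while `Block("sidebar", (0, 0, 1, 1))` raises `ValueError`.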
Install
```bash
pip install chandra-ocr

# With vLLM (recommended)
chandra_vllm
chandra input.pdf ./output

# With HuggingFace (quotes keep shells such as zsh from expanding the brackets)
pip install "chandra-ocr[hf]"
chandra input.pdf ./output --method hf
```
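To convert a folder of PDFs, the CLI can be driven from Python. This is a minimal sketch that relies only on the invocation shown above (`chandra <input> <output>` and the `--method` flag). The helper names, the one-subfolder-per-PDF output layout, and the injectable `run` parameter are illustrative assumptions; the CLI may offer its own batch options.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def chandra_command(pdf: Path, out_dir: Path,
                    method: Optional[str] = None) -> List[str]:
    """Build the argument list for one `chandra` invocation."""
    cmd = ["chandra", str(pdf), str(out_dir)]
    if method is not None:
        cmd += ["--method", method]  # e.g. "hf", as shown above
    return cmd


def convert_all(input_dir: Path, out_dir: Path,
                method: Optional[str] = None,
                run: Callable = subprocess.run) -> int:
    """Run `chandra` once per PDF in `input_dir`; return the file count.

    Each PDF gets its own subfolder under `out_dir` (an assumption made
    here to keep outputs separate, not a requirement of the tool).
    """
    pdfs = sorted(input_dir.glob("*.pdf"))
    for pdf in pdfs:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        run(chandra_command(pdf, out_dir / pdf.stem, method), check=True)
    return len(pdfs)


if __name__ == "__main__":
    n = convert_all(Path("./pdfs"), Path("./output"), method="hf")
    print(f"converted {n} file(s)")
```

Passing `run` as a parameter makes the loop easy to dry-run: supply a function that records or prints each command instead of executing it.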
Links
- GitHub: https://github.com/datalab-to/chandra
- HuggingFace: https://huggingface.co/datalab-to/chandra-ocr-2
- Blog post: https://www.datalab.to/blog/chandra-2