Surya 2 (v0.20.0)
Surya 2 is a ground-up rework: a single 650M-param model now handles OCR, layout, and table recognition, served by vllm (NVIDIA GPU) or llama.cpp (CPU / Apple Silicon). Text detection and OCR-error detection remain separate lightweight torch models.
⚠️ This is a major release with breaking API and output-schema changes. See Upgrading from v1 below.
Highlights
- State of the art for its size - 83.3% on olmOCR-bench, best in class under 3B params.
- Multilingual - 87.2% average across a 91-language internal benchmark.
- Fast - 5 pages/s throughput on RTX 5090.
Breaking changes — upgrading from v1
```python
# v2
from PIL import Image

from surya.inference import SuryaInferenceManager
from surya.recognition import RecognitionPredictor

image = Image.open("page.png")  # hypothetical path to a page image

manager = SuryaInferenceManager()  # auto-spawns vllm or llama-server
rec = RecognitionPredictor(manager)
predictions = rec([image])
```

- `SuryaInferenceManager` replaces `FoundationPredictor`, and is shared across `LayoutPredictor`, `RecognitionPredictor`, and `TableRecPredictor`.
- Output schemas changed: `text_lines` → `blocks` (each with `html`); layout dropped `top_k` and added `count`; table-rec cells dropped `is_header`/`colspan`/`rowspan`.
- New runtime requirement: layout / OCR / table-rec need an inference backend, either Docker + the NVIDIA Container Toolkit (GPU) or `brew install llama.cpp` (CPU / Apple Silicon). Detection still runs on torch alone.
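Code that read `text_lines` in v1 now receives `blocks`, each carrying `html`. If a downstream step still expects plain text, a minimal sketch like the one below can strip the markup with the standard library. It assumes each block is available as a dict with an `"html"` key (the real prediction objects may expose it as an attribute instead), and `html_to_text` / `blocks_to_text` are hypothetical helpers, not part of Surya.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries, so their text is not glued to neighbours.
BLOCK_TAGS = {
    "p", "div", "br", "li", "ul", "ol", "table", "tr", "td", "th",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    # Collapse the whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


def blocks_to_text(blocks: list) -> list:
    # Hypothetical: assumes each v2 block is a dict with an "html" key.
    return [html_to_text(block["html"]) for block in blocks]
```

For example, `blocks_to_text([{"html": "<p>Hello <b>world</b></p>"}])` returns `["Hello world"]`. Stripping tags discards the heading and table structure that `html` now carries, so this is best treated as a stopgap for consumers that cannot handle HTML yet.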
Installation
```shell
pip install surya-ocr
```

Then make a backend available (see Breaking changes above). Full usage, output schemas, and tuning notes are in the README.
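Before running layout, OCR, or table recognition, it can help to confirm that one of the two backends is reachable. The sketch below is a hypothetical preflight check, not Surya's own selection logic. It assumes the GPU path is vllm served through Docker and uses the presence of `nvidia-smi` as a rough stand-in for "has an NVIDIA GPU"; for the CPU / Apple Silicon path it looks for `llama-server`, which `brew install llama.cpp` provides. The task labels and `pick_backend` are illustrative names only.

```python
import shutil
from typing import Callable, Optional

# Hypothetical task labels. Per the release notes, OCR, layout and table
# recognition need an inference backend; detection runs on torch alone.
BACKEND_TASKS = {"ocr", "layout", "table_rec"}
TORCH_ONLY_TASKS = {"detection", "ocr_error"}


def pick_backend(
    tasks: set,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return "vllm", "llama.cpp", or None when no backend is needed.

    Heuristic only: it looks for binaries on PATH and does not verify the
    GPU, the NVIDIA Container Toolkit, or that a server actually starts.
    """
    unknown = set(tasks) - BACKEND_TASKS - TORCH_ONLY_TASKS
    if unknown:
        raise ValueError(f"unknown tasks: {sorted(unknown)}")
    if not set(tasks) & BACKEND_TASKS:
        return None  # detection-only pipelines need nothing beyond torch
    if which("nvidia-smi") and which("docker"):
        return "vllm"  # NVIDIA GPU path (Docker + NVIDIA Container Toolkit)
    if which("llama-server"):
        return "llama.cpp"  # CPU / Apple Silicon path
    raise RuntimeError(
        "no inference backend found: install Docker + the NVIDIA Container "
        "Toolkit (GPU), or run `brew install llama.cpp` (CPU / Apple Silicon)"
    )
```

Passing `which` as a parameter keeps the check testable without touching the real `PATH`; in a script you would call something like `pick_backend({"ocr", "detection"})` before constructing `SuryaInferenceManager`.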
Notes
- Try it on the Datalab playground without installing anything.
Full Changelog: v0.17.1...v0.20.0