Surya 2 (v0.20.0)

Surya 2 is a ground-up rework: a single 650M-parameter model now handles OCR, layout, and table recognition, served by vLLM (NVIDIA GPU) or llama.cpp (CPU / Apple Silicon). Text detection and OCR-error detection remain separate lightweight torch models.

⚠️ This is a major release with breaking API and output-schema changes. See the upgrade notes under Breaking changes below.

Highlights

State of the art for its size - 83.3% on olmOCR-bench, best in class under 3B params.
Multilingual - 87.2% average across a 91-language internal benchmark.
Fast - 5 pages/s throughput on an RTX 5090.

Breaking changes — upgrading from v1

# v2
from surya.inference import SuryaInferenceManager
from surya.recognition import RecognitionPredictor

manager = SuryaInferenceManager()        # auto-spawns vllm or llama-server
rec = RecognitionPredictor(manager)
predictions = rec([image])

SuryaInferenceManager replaces FoundationPredictor, and is shared across LayoutPredictor, RecognitionPredictor, and TableRecPredictor.
Output schemas changed: text_lines → blocks (each with html); layout dropped top_k and added count; table-rec cells dropped is_header / colspan / rowspan.
New runtime requirement: layout / OCR / table-rec need an inference backend - Docker + the NVIDIA Container Toolkit (GPU), or brew install llama.cpp (CPU / Apple Silicon). Detection still runs on torch alone.
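For code that used to read plain text from text_lines, the move to blocks with html means the text now has to be pulled out of markup. The following is a minimal sketch of one way to do that with the standard library only. It assumes a page prediction has already been converted to a plain dict shaped like {"blocks": [{"html": "..."}]}; only the blocks and html names come from these notes, while the dict shape and the helper names block_text and page_text are hypothetical. See the README for the actual schema.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def block_text(html):
    """Return the visible text of one block's HTML, whitespace-normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def page_text(page):
    """Return one plain-text string per block.

    ASSUMPTION: `page` is a dict like {"blocks": [{"html": "..."}]}.
    Missing keys are treated as empty rather than raising.
    """
    return [block_text(block.get("html", "")) for block in page.get("blocks", [])]
```

Reading fields with .get keeps the helper tolerant of schema differences, which also suits table cells now that is_header, colspan, and rowspan are no longer present.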

Installation

pip install surya-ocr

Then make a backend available (see Breaking changes above). Full usage, output schemas, and tuning notes are in the README.
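Before constructing the inference manager, it can help to check whether either backend's executable is on PATH. The sketch below is only a rough preflight check, not how Surya itself selects a backend: it assumes the GPU path is reachable through a docker executable and the CPU / Apple Silicon path through llama-server, and it does not verify the NVIDIA Container Toolkit or the presence of a GPU. The function name available_backends is hypothetical.

```python
import shutil


def available_backends(which=shutil.which):
    """Return labels for the backends whose executables are on PATH.

    ASSUMPTION: "docker" stands in for the vllm route and "llama-server"
    for the llama.cpp route. This does not check GPU or toolkit setup.
    `which` is injectable so the check can be exercised without either tool.
    """
    found = []
    if which("docker"):
        found.append("vllm")
    if which("llama-server"):
        found.append("llama.cpp")
    return found


if __name__ == "__main__":
    backends = available_backends()
    if backends:
        print("Backends found on PATH:", ", ".join(backends))
    else:
        print("No backend found; detection will still run on torch alone.")
```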

Notes

Try it without installing anything on the Datalab playground.

Full Changelog: v0.17.1...v0.20.0

datalab-to/surya v0.20.0 Surya OCR 2 on GitHub
