github datalab-to/surya v0.20.0
Surya OCR 2

12 hours ago

Surya 2 (v0.20.0)

Surya 2 is a ground-up rework: a single 650M-param model now handles OCR, layout, and table recognition, served by vllm (NVIDIA GPU) or llama.cpp (CPU / Apple Silicon). Text detection and OCR-error detection remain separate lightweight torch models.

⚠️ This is a major release with breaking API and output-schema changes. See Upgrading from v1 below.

Highlights

  • State of the art for its size - 83.3% on olmOCR-bench, best in class under 3B params.
  • Multilingual - 87.2% average across a 91-language internal benchmark.
  • Fast - 5 pages/s throughput on RTX 5090.

Breaking changes — upgrading from v1

# v2
from PIL import Image

from surya.inference import SuryaInferenceManager
from surya.recognition import RecognitionPredictor

image = Image.open("page.png")           # placeholder path for a page image
manager = SuryaInferenceManager()        # auto-spawns vllm or llama-server
rec = RecognitionPredictor(manager)      # the manager is shared across predictors
predictions = rec([image])

  • SuryaInferenceManager replaces FoundationPredictor, and is shared across LayoutPredictor, RecognitionPredictor, and TableRecPredictor.
  • Output schemas changed: text_lines → blocks (each with html); layout dropped top_k and added count; table-rec cells dropped is_header / colspan / rowspan.
  • New runtime requirement: layout / OCR / table-rec need an inference backend - Docker + the NVIDIA Container Toolkit (GPU), or brew install llama.cpp (CPU / Apple Silicon). Detection still runs on torch alone.
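
Because v2 recognition output carries blocks with HTML instead of plain text lines, code that previously joined line text needs a conversion step. The following is a minimal sketch using only the Python standard library. `html_to_text` is a hypothetical helper, not part of surya, and it assumes only that each block's markup is available as an HTML string; check the README for the actual schema.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Character references (e.g. &amp;) are already decoded here,
        # because HTMLParser defaults to convert_charrefs=True.
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from an HTML fragment and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())
```

With a prediction in hand, this might be applied as `[html_to_text(b.html) for b in prediction.blocks]`, where `prediction.blocks` and `b.html` are assumed attribute names based on the schema note above.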

Installation

pip install surya-ocr

Then make a backend available (see Breaking changes above). Full usage, output schemas, and tuning notes are in the README.
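
Before constructing a SuryaInferenceManager, it can help to check which backend the machine is likely to support. The sketch below is not a surya API: `pick_backend` is a hypothetical helper that only looks for executables on PATH. It assumes `docker` plus `nvidia-smi` as a rough proxy for the GPU route, and `llama-server` (the binary named in the v2 snippet above) for the llama.cpp route.

```python
import shutil
from typing import Callable, Optional


def pick_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Guess which inference backend this machine could run.

    Hypothetical helper, not part of surya. It only checks that the
    relevant executables are on PATH; it does not verify that the NVIDIA
    Container Toolkit is configured or that a GPU is actually usable.
    """
    # GPU route: Docker plus an NVIDIA driver (nvidia-smi used as a proxy).
    if which("docker") and which("nvidia-smi"):
        return "vllm"
    # CPU / Apple Silicon route: llama.cpp provides the llama-server binary.
    if which("llama-server"):
        return "llama.cpp"
    return "none"
```

Calling `pick_backend()` with no arguments uses `shutil.which`; the `which` parameter exists only so the lookup can be replaced in tests. A result of `"none"` means neither route was found and only text detection, which runs on torch alone, would be expected to work.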

Notes

Full Changelog: v0.17.1...v0.20.0
