This first release adds pretrained models for end-to-end OCR and document manipulation utilities.

Release handled by @fg-mindee & @charlesmindee

Note: doctr 0.1.0 requires TensorFlow 2.3.0 or newer.

Highlights

Easy & high-performing document reading

Since document processing is at the core of this project, reading documents efficiently is a priority. This release supports PDF and image files.

PDF reading is a wrapper around the PyMuPDF back-end for fast file reading:

from doctr.documents import read_pdf
# from path
doc = read_pdf("path/to/your/doc.pdf")
# from stream
with open("path/to/your/doc.pdf", 'rb') as f:
    doc = read_pdf(f.read())

Image reading uses the OpenCV back-end:

from doctr.documents import read_img
page = read_img("path/to/your/img.jpg")
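
The PDF example above takes either a file path or the raw bytes of a file. As a rough sketch of how such a dual-input reader can be organized (this is not doctr's implementation, and `load_bytes` is a hypothetical helper name), the input can be normalized to bytes before it is handed to the decoding back-end:

```python
from pathlib import Path
from typing import Union


def load_bytes(source: Union[str, Path, bytes, bytearray]) -> bytes:
    """Return the raw content of `source` (hypothetical helper, not part of doctr).

    `source` is either the file content itself or a path to a file on disk.
    """
    if isinstance(source, (bytes, bytearray)):
        # Already in memory: nothing to read from disk
        return bytes(source)
    path = Path(source)
    if not path.is_file():
        raise FileNotFoundError(f"unable to access {path}")
    return path.read_bytes()
```

With this kind of normalization, the decoding step only ever has to deal with one input type, whichever way the caller supplied the file.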

Pretrained End-to-End OCR predictors

Whether you need text detection, text recognition or end-to-end OCR, this release provides pretrained models and high-level predictors that handle preprocessing, model inference and post-processing for you behind a simple, Pythonic interface.
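
To make the idea of a predictor concrete, here is a minimal sketch of an object that chains the three stages. It only illustrates the pattern under simplified assumptions: the `Predictor` class and its argument names are hypothetical and do not reflect doctr's actual classes.

```python
from typing import Any, Callable, List, Sequence


class Predictor:
    """Chain preprocessing, model inference and post-processing (illustrative only)."""

    def __init__(
        self,
        pre_processor: Callable[[Sequence[Any]], List[Any]],
        model: Callable[[Any], Any],
        post_processor: Callable[[Any], List[Any]],
    ) -> None:
        self.pre_processor = pre_processor
        self.model = model
        self.post_processor = post_processor

    def __call__(self, inputs: Sequence[Any]) -> List[Any]:
        # 1. Turn raw inputs into model-ready batches
        batches = self.pre_processor(inputs)
        outputs: List[Any] = []
        for batch in batches:
            # 2. Run the model on the batch, 3. decode its raw output
            outputs.extend(self.post_processor(self.model(batch)))
        return outputs
```

The caller passes raw inputs and gets decoded results back, without having to know how batching or decoding is done.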

Text detection

Currently, only DBNet-based architectures are supported; more will come in upcoming releases!

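from doctr.documents import read_pdf
from doctr.models import db_resnet50_predictor

# Detection predictor (DBNet with a ResNet-50 backbone) with pretrained weights
model = db_resnet50_predictor(pretrained=True)
# Read the PDF and run text detection on it
doc = read_pdf("path/to/your/doc.pdf")
result = model(doc)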

Text recognition

Two architectures are implemented for recognition: CRNN and SAR.

from doctr.models import crnn_vgg16_bn_predictor

# Recognition predictor (CRNN with a VGG16-BN backbone) with pretrained weights
model = crnn_vgg16_bn_predictor(pretrained=True)

End-to-End OCR

By combining a detection model and a recognition model into a two-stage architecture, OCR predictors offer the easiest way to analyze your documents:

from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn

model = ocr_db_crnn(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])
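
The two-stage idea can be sketched in a few lines: detect text boxes, crop them, then recognize the text in each crop. This is a simplified illustration, not doctr's implementation. The names `two_stage_ocr`, `crop`, `detector` and `recognizer` are hypothetical, and pages are modeled as plain nested lists so the example runs without any dependency.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels
Page = List[List[int]]           # toy grayscale image: rows of pixel values


def crop(page: Page, box: Box) -> Page:
    """Extract the region of `page` covered by `box`."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in page[ymin:ymax]]


def two_stage_ocr(
    pages: Sequence[Page],
    detector: Callable[[Page], List[Box]],
    recognizer: Callable[[List[Page]], List[str]],
) -> List[List[Tuple[Box, str]]]:
    """Run detection then recognition, returning (box, text) pairs per page."""
    results = []
    for page in pages:
        # Stage 1: localize text regions on the page
        boxes = detector(page)
        # Stage 2: read the text inside each cropped region
        crops = [crop(page, box) for box in boxes]
        words = recognizer(crops) if crops else []
        results.append(list(zip(boxes, words)))
    return results
```

Because the two stages only communicate through boxes and crops, either model can be swapped for another architecture without touching the rest of the pipeline.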

New features

Documents

Document reading and manipulation

Added PDF (#8, #18, #25, #83) and image (#30, #79) reading utilities
Added document structured elements for export (#16, #26, #61, #102)

Models

Deep learning model building and inference

Added model export methods (#10)
Added preprocessing module (#20, #25, #36, #50, #55, #77)
Added text detection model and post-processing (#24, #32, #36, #43, #49, #51, #84): DBNet
Added image cropping function (#33, #44)
Added model param loading function (#49, #60)
Added text recognition post-processing (#35, #36, #37, #38, #43, #45, #49, #51, #63, #65, #74, #78, #84, #101, #107, #108, #111, #112): SAR & CRNN
Added task-specific predictors (#39, #52, #58, #62, #85, #98, #102)
Added VGG16 (#36) and ResNet31 (#70) backbones

Utils

Utility features relevant to the library use cases.

Added page interactive prediction visualization (#54, #82)
Added custom types (#87)
Added abstract auto-repr object (#102)
Added metric module (#110)

Test

Verification of the package's health before release

Added pytest unit tests (#7, #59, #75, #76, #80, #92, #104)

Documentation

Online resources for potential users

Updated README (#9, #48, #67, #68, #95)
Added CONTRIBUTING (#7, #29, #48, #67)
Added sphinx built documentation (#12, #36, #55, #86, #90, #91, #93, #96, #99, #106)

Others

Other tools and implementations

Added python package setup (#7, #21, #67)
Added CI verifications (#7, #67, #69, #73)
Added dockerized environment with library installed (#17, #19)
Added issue template (#34)
Added environment collection script (#81)
Added analysis script (#85, #95, #103)

mindee/doctr v0.1.0: Pretrained models for seamless end-to-end OCR on GitHub