This first release adds pretrained models for end-to-end OCR and document manipulation utilities.
Release handled by @fg-mindee & @charlesmindee
Note: doctr 0.1.0 requires TensorFlow 2.3.0 or newer.
Highlights
Easy & high-performing document reading
Since document processing is at the core of this project, being able to read documents efficiently is a priority. This release supports PDF and image-based files.
PDF reading is a wrapper around the PyMuPDF back-end for fast file reading:
from doctr.documents import read_pdf
# from path
doc = read_pdf("path/to/your/doc.pdf")
# from stream
with open("path/to/your/doc.pdf", 'rb') as f:
    doc = read_pdf(f.read())
Image reading uses the OpenCV back-end:
from doctr.documents import read_img
page = read_img("path/to/your/img.jpg")
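When a folder mixes PDFs and images, the two readers above can be selected by file extension. The following is a minimal sketch, not part of docTR: `reader_name_for`, `read_any` and the `IMAGE_EXTENSIONS` set are hypothetical names introduced here, and wrapping a single image in a list to mimic a one-page document is an assumption about how you may want to normalize the outputs.

```python
from pathlib import Path

# Assumption: image extensions to accept; adjust to what your OpenCV build can decode.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def reader_name_for(path):
    """Return the name of the doctr.documents reader matching the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "read_pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "read_img"
    raise ValueError(f"Unsupported file type: {suffix!r}")


def read_any(path):
    """Hypothetical helper: read a PDF or an image and return a list of pages."""
    # Imported lazily so that reader_name_for works without docTR installed.
    from doctr import documents

    reader = getattr(documents, reader_name_for(path))
    result = reader(str(path))
    # Assumption: a single image is treated as a one-page document.
    return result if reader.__name__ == "read_pdf" else [result]
```

Keeping the extension check separate from the actual reading makes it possible to validate a list of paths up front, before any file is opened.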
Pretrained End-to-End OCR predictors
Whether you need text detection, text recognition, or end-to-end OCR, this release provides pretrained models and high-level predictors that handle all preprocessing, model inference, and post-processing for you behind a simple Pythonic interface.
Text detection
Currently, only DBNet-based architectures are supported; more will come in upcoming releases!
from doctr.documents import read_pdf
from doctr.models import db_resnet50_predictor
model = db_resnet50_predictor(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model(doc)
Text recognition
Two architectures are implemented for recognition: CRNN and SAR.
from doctr.models import crnn_vgg16_bn_predictor
model = crnn_vgg16_bn_predictor(pretrained=True)
End-to-End OCR
OCR predictors combine a detection model and a recognition model into a two-stage architecture, giving you the easiest way to analyze your document:
from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn
model = ocr_db_crnn(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])
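To make the two-stage idea concrete, here is a conceptual sketch of how a detection stage and a recognition stage can be chained. It is not docTR's implementation: `two_stage_ocr`, `crop`, the box format `(xmin, ymin, xmax, ymax)` and the toy `fake_detect` / `fake_recognize` functions are all assumptions made for illustration, and images are plain nested lists so the example runs without any dependency.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (xmin, ymin, xmax, ymax) in pixels
Image = List[List[int]]  # toy grayscale image stored as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Extract the region covered by a box from an image."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image[ymin:ymax]]


def two_stage_ocr(
    pages: List[Image],
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[Image]], List[str]],
) -> List[List[Tuple[Box, str]]]:
    """Run detection on each page, then recognition on every detected crop."""
    results = []
    for page in pages:
        boxes = detect(page)  # stage 1: locate text regions
        crops = [crop(page, box) for box in boxes]
        words = recognize(crops)  # stage 2: read each region
        results.append(list(zip(boxes, words)))
    return results


# Toy stand-ins for real models, used only to exercise the pipeline.
def fake_detect(page: Image) -> List[Box]:
    return [(0, 0, 2, 1), (1, 1, 3, 2)]


def fake_recognize(crops: List[Image]) -> List[str]:
    return ["".join(str(v) for row in c for v in row) for c in crops]


page = [[1, 2, 3], [4, 5, 6]]
print(two_stage_ocr([page], fake_detect, fake_recognize))
```

Passing the two stages as callables keeps them interchangeable, which mirrors how a detection architecture and a recognition architecture can be paired freely in a two-stage design.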
New features
Documents
Document reading and manipulation
- Added PDF (#8, #18, #25, #83) and image (#30, #79) reading utilities
- Added document structured elements for export (#16, #26, #61, #102)
Models
Deep learning model building and inference
- Added model export methods (#10)
- Added preprocessing module (#20, #25, #36, #50, #55, #77)
- Added text detection model and post-processing (#24, #32, #36, #43, #49, #51, #84): DBNet
- Added image cropping function (#33, #44)
- Added model param loading function (#49, #60)
- Added text recognition models and post-processing (#35, #36, #37, #38, #43, #45, #49, #51, #63, #65, #74, #78, #84, #101, #107, #108, #111, #112): SAR & CRNN
- Added task-specific predictors (#39, #52, #58, #62, #85, #98, #102)
- Added VGG16 (#36) and ResNet31 (#70) backbones
Utils
Utility features relevant to the library use cases.
- Added page interactive prediction visualization (#54, #82)
- Added custom types (#87)
- Added abstract auto-repr object (#102)
- Added metric module (#110)
Test
Verifications of the package's well-being before release
Documentation
Online resources for potential users
- Updated README (#9, #48, #67, #68, #95)
- Added CONTRIBUTING (#7, #29, #48, #67)
- Added sphinx built documentation (#12, #36, #55, #86, #90, #91, #93, #96, #99, #106)
Others
Other tools and implementations