mindee/doctr v0.2.0: Increasingly more robust models and a fully functional API template


This release improves model performance and considerably extends the library's features (including a minimal API template, new datasets, and newly trained models).

Release handled by @fg-mindee & @charlesmindee

Note: doctr 0.2.0 requires TensorFlow 2.4.0 or higher.

Highlights

New pretrained weights

Enjoy our newly trained detection and recognition models, with improved robustness and performance!
Check the full benchmark in the documentation for further details.

Improved line & block detection

This release comes with a large improvement to line detection. While it is only performed in post-processing for now, we considered many cases to make sure you get consistent and helpful results:

(Before/after comparison images)

File reading from any source

You can now read images or PDFs from files, binary streams, or even URLs. We completely revamped our document reading pipeline with the new DocumentFile class methods:

from doctr.documents import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
# Web page
webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()

If your PDF is a native file (web pages are converted into such PDFs) rather than a scanned document, you can also read the information inside:

from doctr.documents import DocumentFile
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Retrieve bounding box and text information
words = pdf_doc.get_words()

Reference scripts for training

By adding multithreaded dataloaders and transformations to DocTR, we can now provide reference training scripts so you can train models on your own!

Text detection script (additional details available in README)

python references/detection/train.py /path/to/dataset db_resnet50 -b 8 --input-size 512 --epochs 20

Text recognition script (additional details available in README)

python references/recognition/train.py /path/to/dataset crnn_vgg16_bn -b 8 --epochs 20

Minimal API

If you enjoy DocTR, you might want to integrate it into your API. For your convenience, we added a minimal API template with routes for text detection, text recognition, or plain OCR!

Run it as follows in a Docker container:

PORT=8050 docker-compose up -d --build

Your API is now running locally on port 8050! Navigate to http://localhost:8050/redoc to check the interactive API documentation.

Or start making your first request!

import requests
import io

# Read the image bytes and send them to the text recognition route
with open('/path/to/your/image.jpeg', 'rb') as f:
    data = f.read()
response = requests.post("http://localhost:8050/recognition", files={'file': io.BytesIO(data)})

Breaking changes

Support dropped for TF < 2.4.0

In order to ensure that all compression features are fully functional in DocTR, support for TensorFlow < 2.4.0 has been dropped.

Less confusing predictor inputs

OCRPredictor used to take a list of documents as input; it now takes a list of pages.

0.1.1:

>>> predictor = ...
>>> page = np.zeros((h, w, 3), dtype=np.uint8)
>>> out = predictor([[page]])

0.2.0:

>>> predictor = ...
>>> page = np.zeros((h, w, 3), dtype=np.uint8)
>>> out = predictor([page])

Model calls

To gain more flexibility on the training side, the model call method was changed to yield a dictionary with multiple entries:

0.1.1:

>>> from doctr.models import db_resnet50, DBPostProcessor
>>> model = db_resnet50(pretrained=True)
>>> postprocessor = DBPostProcessor()
>>> prob_map = model(input_t, training=False)
>>> boxes = postprocessor(prob_map)

0.2.0:

>>> from doctr.models import db_resnet50
>>> model = db_resnet50(pretrained=True)
>>> out = model(input_t, training=False)
>>> boxes = out['boxes']

New features

Datasets

Easy-to-use datasets for OCR

  • Added support for SROIE (#165) and CORD (#197)
  • Added recognition dataloader (#163)
  • Added sequence encoding function (#184)
  • Added DataLoader as a dataset wrapper for parallel, high-performance data reading (#198, #201)
  • Added support for OCRDataset (#244)

Documents

  • Added class methods for flexible file reading (#172)
  • Added visualization method to Document and Page (#174)
  • Added support for webpage conversion to document (#221, #222)
  • Added block detection in documents (#224)

Models

Deep learning model building and inference

  • Added pretrained weights for crnn_resnet31 recognition model (#160)
  • Added target building (#162) & loss computation (#171) methods in DBNet
  • Added loss computation for SAR (#185)
  • Added LinkNet detection architecture (#191, #200, #202)

Utils

Utility features relevant to the library use cases.

  • Added reset method to metric objects (#175)
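The reset method lets a single metric object accumulate results over one evaluation run and then be cleared for the next, instead of being re-instantiated. A small sketch in the spirit of the library's metrics (class and method names are illustrative, not the actual doctr API):

```python
# Hypothetical running metric with a reset method: update() accumulates
# counts over batches, result() reports the current score, reset() clears
# the state so the same object can be reused across epochs.

class ExactMatch:
    """Running exact-match accuracy over (prediction, target) pairs."""

    def __init__(self):
        self.reset()

    def update(self, preds, targets):
        # Accumulate over a batch of string predictions
        for pred, target in zip(preds, targets):
            self.matches += int(pred == target)
        self.total += len(preds)

    def result(self):
        return self.matches / self.total if self.total else 0.0

    def reset(self):
        # Clear accumulated counts for the next evaluation run
        self.matches = 0
        self.total = 0

metric = ExactMatch()
metric.update(["cat", "dog"], ["cat", "fox"])
print(metric.result())  # 0.5
metric.reset()
print(metric.result())  # 0.0
```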

Transforms

Data transformation operations

  • Added Compose, Resize, Normalize & LambdaTransformation (#205)
  • Added color transformations (#206, #207, #211)
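Compose chains transformations into a single callable, and LambdaTransformation wraps any function into that interface. A self-contained sketch of the pattern (doctr's actual transforms operate on TensorFlow tensors; the pipeline below works on plain lists purely for illustration):

```python
# Hedged sketch of a Compose / LambdaTransformation pipeline.

class Compose:
    """Apply a list of transformations in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

class LambdaTransformation:
    """Wrap an arbitrary function as a transformation."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)

# Scale pixel values from [0, 255] to [0, 1], then center them around 0
pipeline = Compose([
    LambdaTransformation(lambda px: [v / 255.0 for v in px]),
    LambdaTransformation(lambda px: [v - 0.5 for v in px]),
])
print(pipeline([0, 255]))  # [-0.5, 0.5]
```

Because every transform exposes the same call signature, pipelines can be rearranged or extended (e.g. with color transformations) without touching the training loop.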

Tests

Verification of package health before release

  • Added unittest for preprocessors and DB target building (#162) & loss computation (#171)
  • Added unittests for dataloaders (#163, #198, #201)
  • Added unittests for file reading in all input forms and document format (#172, #221, #222, #240)
  • Added unittests for element display (#174)
  • Added unittests for metric reset (#175) & IoU computation (#176)
  • Added unittests for sequence encoding (#184)
  • Added unittests for loss computation of recognition models (#185, #186)
  • Added unittests for CORD datasets (#197)
  • Added unittests for transformations (#205, #206, #207)
  • Added unittests for OCRDataset (#244)

Documentation

Online resources for potential users

  • Added performances of crnn_resnet31 (#160)
  • Added instructions to use Docker in README (#174)
  • Added references to implemented papers in the README (#191)
  • Added DataLoader section (#198) as well as transforms (#205, #211) in the documentation
  • Added FPS in model benchmark section of the documentation (#209)
  • Added vocab descriptions in documentation (#238)
  • Added explanations to export models as SavedModel (#246)

References

  • Added reference training script for recognition (#164) & detection (#178)
  • Added checkpoint resuming (#210), test-only (#228), backbone freezing (#231), sample display (#249) options
  • Added data augmentations (#211)

Others

Other tools and implementations

  • Added localization and end-to-end visualization in demo app (#188)
  • Added minimal API implementation using FastAPI (#242, #245, #247)
  • Added CI workflow to run API unittests (#242)

Bug fixes

Datasets

  • Fixed inplace modifications of boxes in detection dataset (#212)
  • Fixed box dtype in detection datasets (#226)

Models

  • Fixed edge case of DBNet target computation (#163, #217)
  • Fixed edge case of resizing with zero-sized inputs (#177)
  • Fixed CTC loss computation in CRNN (#195)
  • Fixed NaN from a rare zero division by adding eps (#228)
  • Fixed LinkNet loss computation (#250)

Utils

  • Fixed IoU computation with zero-sized inputs (#176, #177)
  • Fixed localization metric update (#227)

Documentation

  • Fixed usage instructions in README (#174)

References

  • Fixed resizing (#194) and validation transforms (#214) in recognition script
  • Fixed dataset args in training scripts (#195)
  • Fixed tensor scaling in recognition training (#241)
  • Fixed validation loop (#250)

Others

  • Fixed pypi publishing CI job (#159)
  • Fixed typo in evaluation script (#177)
  • Fixed demo app inference of detection model (#219)

Improvements

Datasets

  • Refactored dataloaders (#193)

Models

  • Refactored task predictors (#169)
  • Harmonized preprocessor call method (#162)
  • Switched input type of OCRPredictor from list of docs to list of pages (#170)
  • Added a backbone submodule with all corresponding models (#187) and refactored recognition models
  • Added improved pretrained weights of DBNet (#196)
  • Moved box score computation to core detection postprocessor (#203)
  • Refactored loss computation for detection models (#208)
  • Improved line detection in DocumentBuilder (#220)
  • Made detection postprocessing more robust to image size (#230)
  • Moved post processors inside the model to have a more flexible call (#248, #250)

Documents

  • Improved error for image reading when the file cannot be found (#229)
  • Increased default DPI for PDF to image rendering (#240)
  • Improved speed of PDF to image conversion (#251)

Utils

  • Made page display size dynamic while preserving aspect ratio (#173)
  • Improved visualization size resolution (#174)

Documentation

  • Added hyperlinks for license and CONTRIBUTING in the README (#169)
  • Enlarged column width in documentation (#169)
  • Added visualization script GIF in README (#173)
  • Revamped README and documentation (#182)
  • Rearranged model benchmark tables in documentation (#196, #199)
  • Improved documentation landing page (#239)

Tests

  • Added more thorough test cases for vision datasets (#165)
  • Refactored loader unittests (#193)
  • Added unittest for edge case in metric computation (#227)

References

  • Added preprocessing in training scripts (#180)
  • Added recognition loss computation (#189)
  • Added resize transformations (#205) in scripts
  • Added proper console metric logging (#208, #210)
  • Added dataset information console print (#228)

Others

  • Added version index override possibility for setup (#159)
  • Enabled TF gpu growth in demo & scripts (#179)
  • Added support for images and model selection in demo app (#183)
  • Improved PDF resizing in demo app & eval script (#237)
  • Dropped support for TF < 2.4 (#243)
