This release improves model performance and considerably extends library features (including a minimal API template, new datasets, and newly trained models).
Release handled by @fg-mindee & @charlesmindee
Note: doctr 0.2.0 requires TensorFlow 2.4.0 or higher.
Highlights
New pretrained weights
Enjoy our newly trained detection and recognition models with improved robustness and performance!
Check the full benchmark in the documentation for further details.
Improved Line & block detection
This release greatly improves line detection. It is currently done only in post-processing, but we considered many cases to make sure you get a consistent and helpful result:
[Figure: line detection results before and after this release]
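To give an intuition for what line detection in post-processing involves, here is a minimal sketch that groups word boxes into lines by vertical proximity. It is an illustrative heuristic only, not DocTR's actual implementation: the function name `group_words_into_lines`, the `(xmin, ymin, xmax, ymax)` box format and the `y_tol` threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical box format: (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def group_words_into_lines(boxes: List[Box], y_tol: float = 0.5) -> List[List[Box]]:
    """Group word boxes into lines (illustrative heuristic, not DocTR's algorithm).

    A box joins the current line when its vertical center lies within
    `y_tol` times the line's mean box height of the line's mean center.
    """
    lines: List[List[Box]] = []
    # Process boxes from top to bottom, using their vertical centers
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center = (box[1] + box[3]) / 2
        if lines:
            current = lines[-1]
            mean_center = sum((b[1] + b[3]) / 2 for b in current) / len(current)
            mean_height = sum(b[3] - b[1] for b in current) / len(current)
            if abs(center - mean_center) <= y_tol * mean_height:
                current.append(box)
                continue
        # Otherwise the box starts a new line
        lines.append([box])
    # Order the words of each line from left to right
    return [sorted(line, key=lambda b: b[0]) for line in lines]
```

A real implementation has to handle more cases (rotated text, columns, boxes of very different heights), which is why the release mentions considering many cases.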
File reading from any source
You can now read images or PDFs from files, binary streams, or even URLs. We completely revamped our document reading pipeline with the new DocumentFile class methods:
from doctr.documents import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
# Web page
webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()
If your PDF is a source file rather than a scanned version (web pages are converted into such PDFs), you can also read the information it contains:
from doctr.documents import DocumentFile
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Retrieve bounding box and text information
words = pdf_doc.get_words()
Reference scripts for training
By adding multithreading dataloaders and transformations in DocTR, we can now provide you with reference training scripts to train models on your own!
Text detection script (additional details available in README)
python references/detection/train.py /path/to/dataset db_resnet50 -b 8 --input-size 512 --epochs 20
Text recognition script (additional details available in README)
python references/recognition/train.py /path/to/dataset crnn_vgg16_bn -b 8 --input-size 32 --epochs 20
Minimal API
If you enjoy DocTR, you might want to integrate it in your API. For your convenience, we added a minimal API template with routes for text detection, text recognition or plain OCR!
Run it as follows in a docker container:
PORT=8050 docker-compose up -d --build
Your API is now running locally on port 8050! Navigate to http://localhost:8050/redoc to check your documentation.
Or start making your first request!
import requests
import io
# Read the image as raw bytes
with open('/path/to/your/image.jpeg', 'rb') as f:
    data = f.read()
# Send it to the recognition route as a multipart file upload
response = requests.post("http://localhost:8050/recognition", files={'file': io.BytesIO(data)})
Breaking changes
Support dropped for TF < 2.4.0
In order to ensure that all compression features are fully functional in DocTR, support for TensorFlow < 2.4.0 has been dropped.
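If you want to fail early on an unsupported installation, a small version guard can help. The sketch below is not part of DocTR; `parse_version` and `meets_minimum` are hypothetical helpers written for this example, and they only compare the numeric major, minor and patch components.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.4.0' (or '2.4.0-rc1') into a tuple of three ints, e.g. (2, 4, 0)."""
    parts = []
    for chunk in version.split(".")[:3]:
        digits = ""
        for char in chunk:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits or 0))
    # Pad short versions such as '2.4' so that comparisons stay consistent
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "2.4.0") -> bool:
    """Check whether `version` is at least `minimum` (numeric components only)."""
    return parse_version(version) >= parse_version(minimum)
```

With TensorFlow installed, this could be called as `meets_minimum(tf.__version__)` before importing the library.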
Less confusing predictor's inputs
OCRPredictor used to take a list of documents as input; it now takes a list of pages.
| 0.1.1 | 0.2.0 |
|---|---|
| `>>> predictor = ...`<br>`>>> page = np.zeros((h, w, 3), dtype=np.uint8)`<br>`>>> out = predictor([[page]])` | `>>> predictor = ...`<br>`>>> page = np.zeros((h, w, 3), dtype=np.uint8)`<br>`>>> out = predictor([page])` |
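If your existing code still builds a list of documents (each being a list of pages), one way to migrate is to flatten it before calling the predictor and regroup the per-page results afterwards. The helpers below are a minimal sketch; `flatten_documents` and `regroup` are hypothetical names and are not provided by DocTR.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def flatten_documents(documents: Sequence[Sequence[T]]) -> List[T]:
    """Flatten a list of documents (each a list of pages) into one list of pages."""
    return [page for document in documents for page in document]


def regroup(items: Sequence[T], page_counts: Sequence[int]) -> List[List[T]]:
    """Split a flat list of per-page items back into per-document lists."""
    grouped: List[List[T]] = []
    start = 0
    for count in page_counts:
        grouped.append(list(items[start:start + count]))
        start += count
    return grouped
```

For instance, `predictor(flatten_documents(docs))` would receive the list of pages expected in 0.2.0, and `[len(doc) for doc in docs]` gives the counts needed to regroup per-page results.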
Model calls
To gain more flexibility on the training side, the model call method now returns a dictionary with multiple entries:
| 0.1.1 | 0.2.0 |
|---|---|
| `>>> from doctr.models import db_resnet50, DBPostProcessor`<br>`>>> model = db_resnet50(pretrained=True)`<br>`>>> postprocessor = DBPostProcessor()`<br>`>>> prob_map = model(input_t, training=False)`<br>`>>> boxes = postprocessor(prob_map)` | `>>> from doctr.models import db_resnet50`<br>`>>> model = db_resnet50(pretrained=True)`<br>`>>> out = model(input_t, training=False)`<br>`>>> boxes = out['boxes']` |
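The benefit of a dictionary output is that a single call can expose different entries depending on what the caller provides. The toy function below only illustrates that pattern and is not DocTR code: `toy_detection_call` is a hypothetical stand-in, the `'boxes'` key mirrors the table above, and the `'loss'` key and the `target` argument are assumptions made for the example.

```python
from typing import Dict, List, Optional


def toy_detection_call(
    prob_map: List[float],
    target: Optional[List[float]] = None,
    threshold: float = 0.5,
) -> Dict[str, object]:
    """Toy stand-in for a model call that returns a dictionary of entries."""
    out: Dict[str, object] = {}
    # Inference entry: indices whose probability clears the threshold
    out["boxes"] = [idx for idx, prob in enumerate(prob_map) if prob >= threshold]
    # Training entry (assumed key): only computed when a target is provided
    if target is not None:
        squared_errors = [(p - t) ** 2 for p, t in zip(prob_map, target)]
        out["loss"] = sum(squared_errors) / max(len(squared_errors), 1)
    return out
```

At inference time the caller reads `out['boxes']`, while a training loop can pass a target and read the loss from the same call.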
New features
Datasets
Easy-to-use datasets for OCR
- Added support of SROIE (#165) and CORD (#197)
- Added recognition dataloader (#163)
- Added sequence encoding function (#184)
- Added DataLoader as a dataset wrapper for parallel high performance data reading (#198, #201)
- Added support of OCRDataset (#244)
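As an illustration of what a sequence encoding function typically does for recognition targets, the sketch below maps each character to its index in a vocabulary and pads the result to a fixed length. It is not the library's implementation: the name `encode_sequence_sketch`, its parameters and the padding value of -1 are assumptions made for this example.

```python
from typing import List


def encode_sequence_sketch(text: str, vocab: str, target_size: int, pad: int = -1) -> List[int]:
    """Encode `text` as vocabulary indices, truncated or padded to `target_size`."""
    char_to_index = {char: idx for idx, char in enumerate(vocab)}
    # A KeyError is raised if a character is missing from the vocabulary
    encoded = [char_to_index[char] for char in text][:target_size]
    # Fill the remaining positions with the padding value
    return encoded + [pad] * (target_size - len(encoded))
```

Fixed-length targets like these can then be stacked into a single batch for training.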
Documents
- Added class methods for flexible file reading (#172)
- Added visualization method to Document and Page (#174)
- Added support for webpage conversion to document (#221, #222)
- Added block detection in documents (#224)
Models
Deep learning model building and inference
- Added pretrained weights for crnn_resnet31 recognition model (#160)
- Added target building (#162) & loss computation (#171) methods in DBNet
- Added loss computation for SAR (#185)
- Added LinkNet detection architecture (#191, #200, #202)
Utils
Utility features relevant to the library use cases.
- Added reset method to metric objects (#175)
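A reset method lets you reuse one metric object across evaluation runs instead of creating a new one each time. The class below is a minimal sketch of that pattern, not DocTR's metric: `ExactMatchSketch` and its `update` / `summary` / `reset` methods are hypothetical and written only for this example.

```python
from typing import List


class ExactMatchSketch:
    """Toy exact-match metric with update / summary / reset (illustrative only)."""

    def __init__(self) -> None:
        self.reset()

    def update(self, gt: List[str], pred: List[str]) -> None:
        """Accumulate the number of predictions that exactly match the ground truth."""
        if len(gt) != len(pred):
            raise ValueError("gt and pred must have the same length")
        self.matches += sum(g == p for g, p in zip(gt, pred))
        self.total += len(gt)

    def summary(self) -> float:
        """Return the match rate; 0.0 on an empty state is a choice made for this sketch."""
        return self.matches / self.total if self.total else 0.0

    def reset(self) -> None:
        """Clear the accumulated state so the object can be reused."""
        self.matches = 0
        self.total = 0
```

Calling `reset()` between validation epochs prevents results from one epoch leaking into the next.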
Transforms
Data transformations operations
- Added Compose, Resize, Normalize & LambdaTransformation (#205)
- Added color transformations (#206, #207, #211)
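The idea behind a composition transform is simply to chain callables so that a whole preprocessing pipeline behaves like a single function. Here is a minimal sketch of that pattern in plain Python; `ComposeSketch` is a hypothetical class written for this example and makes no claim about the signature of the library's own Compose.

```python
from typing import Any, Callable, Sequence


class ComposeSketch:
    """Apply a sequence of callables one after the other (illustrative only)."""

    def __init__(self, transforms: Sequence[Callable[[Any], Any]]) -> None:
        self.transforms = list(transforms)

    def __call__(self, x: Any) -> Any:
        # Feed the output of each transform into the next one
        for transform in self.transforms:
            x = transform(x)
        return x


# Example pipeline on a flat list of pixel values: scale to [0, 1], then center
pipeline = ComposeSketch([
    lambda pixels: [p / 255 for p in pixels],
    lambda pixels: [p - 0.5 for p in pixels],
])
```

Because the pipeline is itself a callable, it can be passed anywhere a single transformation is expected.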
Test
Verifications of the package well-being before release
- Added unittests for preprocessors and DB target building (#162) & loss computation (#171)
- Added unittests for dataloaders (#163, #198, #201)
- Added unittests for file reading in all input forms and document format (#172, #221, #222, #240)
- Added unittests for element display (#174)
- Added unittests for metric reset (#175) & IoU computation (#176)
- Added unittests for sequence encoding (#184)
- Added unittests for loss computation of recognition models (#185, #186)
- Added unittests for CORD datasets (#197)
- Added unittests for transformations (#205, #206, #207)
- Added unittests for OCRDataset (#244)
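For readers unfamiliar with the IoU (intersection over union) computation mentioned in the list above, here is a minimal sketch for two axis-aligned boxes together with the kind of check a unit test could perform. The function name `box_iou_sketch` and the `(xmin, ymin, xmax, ymax)` box format are assumptions for this example, not the library's API.

```python
from typing import Tuple

# Hypothetical box format: (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def box_iou_sketch(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle
    left, top = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    right, bottom = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # A negative extent means the boxes do not overlap on that axis
    intersection = max(right - left, 0) * max(bottom - top, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    # Guard against a zero division when both boxes are empty
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping on a 1x1 square: IoU = 1 / (4 + 4 - 1)
assert abs(box_iou_sketch((0, 0, 2, 2), (1, 1, 3, 3)) - 1 / 7) < 1e-12
```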
Documentation
Online resources for potential users
- Added performance of crnn_resnet31 (#160)
- Added instructions to use Docker in README (#174)
- Added references to implemented papers in the README (#191)
- Added DataLoader section (#198) as well as transforms (#205, #211) in the documentation
- Added FPS in model benchmark section of the documentation (#209)
- Added vocab descriptions in documentation (#238)
- Added explanations to export models as SavedModel (#246)
References
- Added reference training script for recognition (#164) & detection (#178)
- Added checkpoint resuming (#210), test-only (#228), backbone freezing (#231), sample display (#249) options
- Added data augmentations (#211)
Others
Other tools and implementations
- Added localization and end-to-end visualization in demo app (#188)
- Added minimal API implementation using FastAPI (#242, #245, #247)
- Added CI workflow to run API unittests (#242)
Bug fixes
Datasets
- Fixed inplace modifications of boxes in detection dataset (#212)
- Fixed box dtype in detection datasets (#226)
Models
- Fixed edge case of DBNet target computation (#163, #217)
- Fixed edge case of resizing with zero-sized inputs (#177)
- Fixed CTC loss computation in CRNN (#195)
- Fixed NaN from a rare zero division by adding eps (#228)
- Fixed LinkNet loss computation (#250)
Utils
Documentation
- Fixed usage instructions in README (#174)
References
- Fixed resizing (#194) and validation transforms (#214) in recognition script
- Fixed dataset args in training scripts (#195)
- Fixed tensor scaling in recognition training (#241)
- Fixed validation loop (#250)
Others
- Fixed pypi publishing CI job (#159)
- Fixed typo in evaluation script (#177)
- Fixed demo app inference of detection model (#219)
Improvements
Datasets
- Refactored dataloaders (#193)
Models
- Refactored task predictors (#169)
- Harmonized preprocessor call method (#162)
- Switched input type of OCRPredictor from list of docs to list of pages (#170)
- Added a backbone submodule with all corresponding models (#187) and refactored recognition models
- Added improved pretrained weights of DBNet (#196)
- Moved box score computation to core detection postprocessor (#203)
- Refactored loss computation for detection models (#208)
- Improved line detection in DocumentBuilder (#220)
- Made detection postprocessing more robust to image size (#230)
- Moved post processors inside the model to have a more flexible call (#248, #250)
Documents
- Improved error for image reading when the file cannot be found (#229)
- Increased default DPI for PDF to image rendering (#240)
- Improved speed of PDF to image conversion (#251)
Utils
- Made page display size dynamic while preserving aspect ratio (#173)
- Improved visualization size resolution (#174)
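Making the display size dynamic while preserving the aspect ratio boils down to scaling both sides of the page by the same factor. The sketch below shows one way to compute such a size; `display_size_sketch` and its `max_side` parameter are hypothetical and do not reflect the library's actual visualization code.

```python
from typing import Tuple


def display_size_sketch(height: int, width: int, max_side: float = 10.0) -> Tuple[float, float]:
    """Return a (width, height) display size whose longest side equals `max_side`.

    Both sides are scaled by the same factor, so the aspect ratio is preserved.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    longest = max(height, width)
    return (width * max_side / longest, height * max_side / longest)
```

A portrait page of 1000 x 500 pixels (height x width) would thus be shown at half the width of its height, whatever `max_side` is chosen.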
Documentation
- Added hyperlinks for license and CONTRIBUTING in the README (#169)
- Enlarged column width in documentation (#169)
- Added visualization script GIF in README (#173)
- Revamped README and documentation (#182)
- Rearranged model benchmark tables in documentation (#196, #199)
- Improved documentation landing page (#239)
Tests
- Added more thorough test cases for vision datasets (#165)
- Refactored loader unittests (#193)
- Added unittest for edge case in metric computation (#227)
References
- Added preprocessing in training scripts (#180)
- Added recognition loss computation (#189)
- Added resize transformations (#205) in scripts
- Added proper console metric logging (#208, #210)
- Added dataset information console print (#228)

