This release stabilizes support for the PyTorch backend while extending the range of features (new task, better pretrained models, speed-ups).
Brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.3.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
Improved pretrained parameters for your favorite models 🚀
With each release, we hope to bring you improved models and more comprehensive evaluation results. As part of the 0.3.1 release, we provide you with:
- improved params for `crnn_vgg16_bn` & `sar_resnet31`
- evaluation results on a new private dataset (US tax forms)
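To pick up those refreshed checkpoints, a minimal sketch using doctr's usual `pretrained=True` constructor flag:

```python
from doctr.models import crnn_vgg16_bn, sar_resnet31

# pretrained=True downloads the latest released checkpoints,
# which carry the improved parameters shipped with 0.3.1
reco_model = crnn_vgg16_bn(pretrained=True)
sar_model = sar_resnet31(pretrained=True)
```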
Lighter backbones for faster architectures ⚡
Unsurprisingly, just like many other libraries, DocTR's future will involve striking a balance between speed and raw performance. To make this choice available to you, we added support for MobileNet V3 and pretrained it for character classification with both PyTorch & TensorFlow.
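For instance, a character classifier with a MobileNet V3 backbone can be instantiated in a couple of lines. The sketch below is hypothetical: the exact module path and constructor name may differ in 0.3.1, so check the API reference:

```python
# Hypothetical import path: recent doctr versions expose MobileNet V3 constructors
# under doctr.models; the 0.3.1 location may differ
from doctr.models import mobilenet_v3_small

# pretrained=True is assumed to fetch the character-classification checkpoint
backbone = mobilenet_v3_small(pretrained=True)
```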
Speeding up preprocessors & datasets 🚆
Whether you are a user looking for inference speed, or a dedicated model trainer looking for optimal data loading, you will be thrilled to know that we have greatly improved our data loading/processing by leveraging multi-threading!
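The underlying helper is also available on its own; a quick sketch mirroring the call shown in the breaking changes below:

```python
from doctr.utils.multithreading import multithread_exec

# Maps the callable over the iterable using a thread pool and returns the results
results = multithread_exec(lambda x: x ** 2, [1, 4, 8])
print(list(results))  # [1, 16, 64]
```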
Better demo app 🎨
We value the accessibility of this project and thus commit to improving tools for entry-level users. Deploying a demo from a Python library is not every developer's area of expertise, so this release improves the existing demo:
Page selection was added for multi-page documents, predictions are now used to produce a synthesized version of the input document, and you get a JSON export! We're looking forward to your feedback 🤗
[beta] Character classification
As DocTR moves on to more complex tasks, paving the way for a consistent training procedure becomes necessary. Pretraining has shown its potential in many deep learning tasks, and we want to explore opportunities to make training for OCR even more accessible.
So this release takes a big step forward by adding an on-the-fly character generator and training scripts, which let you train a character classifier without any pre-existing data 😯
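A minimal sketch of what this enables (the constructor arguments shown here are assumptions; refer to the `CharacterGenerator` documentation entry for the exact signature):

```python
from doctr.datasets import CharacterGenerator

# Assumed arguments: a vocabulary of characters to render and a virtual dataset size
ds = CharacterGenerator(vocab="0123456789abcdef", num_samples=1000)
img, target = ds[0]  # a synthetically rendered character image and its label index
```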
Breaking changes
Default dtype of TF datasets
In order to harmonize data processing between frameworks, the default data type of datasets has been switched to float32 for the TensorFlow backend:
| 0.3.0 | 0.3.1 |
|---|---|
| <pre>>>> from doctr.datasets import FUNSD<br>>>> ds = FUNSD()<br>>>> img, target = ds[0]<br>>>> print(img.dtype)<br>&lt;dtype: 'uint8'&gt;<br>>>> print(img.numpy().min(), img.numpy().max())<br>0 255</pre> | <pre>>>> from doctr.datasets import FUNSD<br>>>> ds = FUNSD()<br>>>> img, target = ds[0]<br>>>> print(img.dtype)<br>&lt;dtype: 'float32'&gt;<br>>>> print(img.numpy().min(), img.numpy().max())<br>0.0 1.0</pre> |
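If parts of your pipeline still expect 8-bit images, a plain TensorFlow cast restores the previous range (nothing doctr-specific here):

```python
import tensorflow as tf
from doctr.datasets import FUNSD

ds = FUNSD()
img, target = ds[0]
# Scale the float32 tensor in [0, 1] back to the former uint8 range [0, 255]
img_uint8 = tf.cast(img * 255, tf.uint8)
```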
I/O module
Whether it is for exporting predictions or loading input data, the library lets you play around with inputs and outputs using minimal code. Since its usage is constantly expanding, the `doctr.documents` module has been relocated to `doctr.io`:
| 0.3.0 | 0.3.1 |
|---|---|
| <pre>>>> from doctr.documents import DocumentFile<br>>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()</pre> | <pre>>>> from doctr.io import DocumentFile<br>>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()</pre> |
It now also includes an image submodule for easy tensor <--> numpy conversion for all supported data types.
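The exact helper names are best taken from the `doctr.io` API reference; as a point of reference, the framework-native round trip these helpers wrap looks like this (plain TensorFlow/NumPy, not doctr-specific):

```python
import numpy as np
import tensorflow as tf

# numpy -> tensor and back; doctr.io.image provides equivalents for both backends
np_img = np.zeros((32, 128, 3), dtype=np.float32)
tf_img = tf.convert_to_tensor(np_img)
round_trip = tf_img.numpy()
```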
Multithreading relocated
As multithreading is increasingly used to boost performance across the library, it has been moved from the TF-only dataset utilities to `doctr.utils.multithreading`:
| 0.3.0 | 0.3.1 |
|---|---|
| <pre>>>> from doctr.datasets.multithreading import multithread_exec<br>>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])</pre> | <pre>>>> from doctr.utils.multithreading import multithread_exec<br>>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])</pre> |
New features
Datasets
Resources to access data in efficient ways
- Added support of FP16 (#367)
- Added option to merge subsets for recognition datasets (#376)
- Added dynamic sequence encoding (#393)
- Added support of new label format datasets (#407)
- Added character generator dataset for image classification (#412, #418)
IO
Features to manipulate input & outputs
- Added `Element` creation from dictionary (#386)
- Added byte decoding function for PyTorch and TF (#390)
- Added extra tensor conversion functions (#412)
Models
Deep learning model building and inference
- Added `crnn_resnet31` as a recognition model (#361)
- Added a uniform, comprehensive preprocessing mechanism for both frameworks (#370)
- Added support of FP16 (#382)
- Added MobileNet V3 for TensorFlow as a backbone (#372, #410, #420)
- Added superior pretrained params for `crnn_vgg16_bn` in TF (#395)
- Added pretrained params for `master` in TF (#396)
- Added mobilenet backbone availability to detection & recognition models (#398, #399)
- Added pretrained params for mobilenets on character classification (#415, #421, #424)
- Added superior pretrained params for `sar_resnet31` in TF (#395)
Utils
Utility features relevant to the library use cases.
Transforms
Data transformations operations
- Added `rotate` function (#358) and its corresponding augmentation module (#363)
- Added cropping function (#366)
- Added support of FP16 (#388)
Test
Verifications of the package well-being before release
- Added unittests for rotation functions (#358)
- Added test cases for recognition zoo (#361)
- Added unittests for `RandomRotate` (#363)
- Added test cases for recognition dataset merging (#376)
- Added unittests for Element creation from dicts (#386)
- Added test case for mobilenet backbone (#372)
- Added unittests for datasets with new format (#407)
- Added test cases for the character generator (#412)
Documentation
Online resources for potential users
- Added entry for `RandomRotate` (#363)
- Added entry for `CharacterGenerator` (#412)
- Added evaluation on US tax forms in the documentation (#419)
References
Reference training scripts
- Added PyTorch training reference scripts (#359, #394)
- Added LR scheduler to TensorFlow script (#360, #374, #397) & PyTorch scripts (#381)
- Added possibility to use multi-folder datasets (#377)
- Added character classification training script (#414, #420)
Others
Other tools and implementations
- Added page selection and result JSON display in demo (#369)
- Added an entry for MASTER in model selection of the demo (#400)
Bug fixes
Datasets
- Fixed image shape resolution in custom datasets (#354)
- Fixed box clipping to avoid rounding errors in datasets (#355)
Models
- Fixed GPU compatibility of detection models (#359) & recognition models (#361)
- Fixed recognition model loss computation in PyTorch (#379)
- Fixed loss computation of CRNN in PyTorch (#434)
- Fixed loss computation of MASTER in PyTorch (#440)
Transforms
- Fixed Resize transformation when aspect ratio matches the target (#357)
- Fixed box rotation (#378)
- Fixed image expansion while rotating (#438)
Documentation
- Fixed installation instructions (#437)
References
- Fixed missing import in utils (#389)
- Fixed GPU support for PyTorch in the recognition script (#427)
- Fixed gradient clipping in PyTorch scripts (#432)
Others
- Fixed trigger of script testing CI job (#351)
- Constrained `PIL` version due to issues with version 8.3 (#362)
- Added missing mypy config ignore (#365)
- Fixed PDF page rendering for demo & analysis script (#368)
- Constrained `weasyprint` version due to issues with version 53.0 (#404)
- Constrained `matplotlib` version due to issues with version 3.4.3 (#413)
Improvements
Datasets
- Improved typing of `doctr.datasets` (#354)
- Improved PyTorch data loading (#362)
- Switched default dtype of TF datasets to `tf.float32` instead of `tf.uint8` (#367, #375)
- Optimized sequence encoding through multithreading (#393)
IO
- Relocated `doctr.documents` to `doctr.io` (#390)
Models
- Updated bottleneck channel multiplier in MASTER (#350)
- Added dropout in MASTER (#349)
- Renamed MASTER attribute for consistency across models (#356)
- Added target validation for detection models (#355)
- Optimized preprocessing by leveraging multithreading (#370)
- Added dynamic dtype resolution and more specific error messages (#382)
- Harmonized parameter loading in PyTorch (#425)
- Enabled backbone pretraining in complex architectures (#435)
- Made head & FPN sizing dynamic using the feature extractor in detection models (#435)
Utils
- Moved multithreading to `doctr.utils` (#371)
- Improved format validation for visualization features (#392)
- Moved `doctr.models._utils.rotate_page` to `doctr.utils.geometry.rotate_image` (#371)
Documentation
- Updated pypi badge & documentation changelog (#346)
- Added export example in documentation (#348)
- Reflected relocation of `doctr.documents` to `doctr.io` in documentation and README (#390)
- Updated recognition benchmark (#395, #441)
- Updated model entries (#435)
- Updated authors & maintainer references in `setup.py` and in README (#444)
Tests
- Added test case for same aspect ratio resizing (#357)
- Extended testing of datasets (#354)
- Added test cases for FP16 support of datasets (#367), models (#382), transforms (#388)
- Moved multithreading unittests (#371)
- Extended test cases of preprocessors (#370)
- Reflected relocation of `doctr.documents` to `doctr.io` (#390)
- Removed unused imports (#391)
- Extended test cases for visualization (#392)
- Updated unittests of sequence encoding (#393)
- Added extra test cases for rotation validation (#438)
References
- Added optimal selection of workers for all scripts (#362)
- Reflected the switch to `tf.float32` by default for datasets (#367)
- Removed legacy script arg (#380)
- Removed unused imports (#391)
- Improved device selection for PyTorch training script (#427)
- Improved metric logging when the value is undefined (#432)
Others
- Updated package version (#346)
