This release stabilizes support for the PyTorch backend while extending the range of features (new task, better pretrained models, speed-ups).
Brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.3.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
Improved pretrained parameters for your favorite models 🚀
With each release, we hope to bring you improved models and more comprehensive evaluation results. As part of the 0.3.1 release, we provide you with:
- improved params for `crnn_vgg16_bn` & `sar_resnet31`
- evaluation results on a new private dataset (US tax forms)
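To pick up those refreshed checkpoints, a minimal sketch using doctr's usual `pretrained=True` constructor flag:

```python
from doctr.models import crnn_vgg16_bn, sar_resnet31

# pretrained=True downloads the latest released checkpoints,
# which carry the improved parameters shipped with 0.3.1
reco_model = crnn_vgg16_bn(pretrained=True)
sar_model = sar_resnet31(pretrained=True)
```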
Lighter backbones for faster architectures ⚡
Unsurprisingly, just like many other libraries, DocTR's future will involve striking a balance between speed and raw performance. To make this choice available to you, we added support for MobileNet V3 and pretrained it for character classification with both PyTorch & TensorFlow.
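For instance, a character classifier with a MobileNet V3 backbone can be instantiated in a couple of lines. The sketch below is hypothetical: the exact module path and constructor name may differ in 0.3.1, so check the API reference:

```python
# Hypothetical import path: recent doctr versions expose MobileNet V3 constructors
# under doctr.models; the 0.3.1 location may differ
from doctr.models import mobilenet_v3_small

# pretrained=True is assumed to fetch the character-classification checkpoint
backbone = mobilenet_v3_small(pretrained=True)
```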
Speeding up preprocessors & datasets 🚆
Whether you are a user looking for inference speed, or a dedicated model trainer looking for optimal data loading, you will be thrilled to know that we have greatly improved our data loading/processing by leveraging multi-threading!
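The underlying helper is also available on its own; a quick sketch mirroring the call shown in the breaking changes below:

```python
from doctr.utils.multithreading import multithread_exec

# Maps the callable over the iterable using a thread pool and returns the results
results = multithread_exec(lambda x: x ** 2, [1, 4, 8])
print(list(results))  # [1, 16, 64]
```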
Better demo app 🎨
We value the accessibility of this project and thus commit to improving tools for entry-level users. Deploying a demo from a Python library is not every developer's area of expertise, so this release improves the existing demo:
Page selection was added for multi-page documents, predictions are now used to produce a synthesized version of the input document, and you get a JSON export! We're looking forward to your feedback 🤗
[beta] Character classification
As DocTR moves on to more complex tasks, paving the way for a consistent training procedure becomes necessary. Pretraining has shown its potential in many deep learning tasks, and we want to explore opportunities to make training for OCR even more accessible.
So this release takes a big step forward by adding an on-the-fly character generator and training scripts, which let you train a character classifier without any pre-existing data 😯
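A minimal sketch of what this enables (the constructor arguments shown here are assumptions; refer to the `CharacterGenerator` documentation entry for the exact signature):

```python
from doctr.datasets import CharacterGenerator

# Assumed arguments: a vocabulary of characters to render and a virtual dataset size
ds = CharacterGenerator(vocab="0123456789abcdef", num_samples=1000)
img, target = ds[0]  # a synthetically rendered character image and its label index
```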
Breaking changes
Default dtype of TF datasets
In order to harmonize data processing between frameworks, the default data type of datasets has been switched to float32 for the TensorFlow backend:
| 0.3.0 | 0.3.1 |
|---|---|
| <pre>>>> from doctr.datasets import FUNSD<br>>>> ds = FUNSD()<br>>>> img, target = ds[0]<br>>>> print(img.dtype)<br>&lt;dtype: 'uint8'&gt;<br>>>> print(img.numpy().min(), img.numpy().max())<br>0 255</pre> | <pre>>>> from doctr.datasets import FUNSD<br>>>> ds = FUNSD()<br>>>> img, target = ds[0]<br>>>> print(img.dtype)<br>&lt;dtype: 'float32'&gt;<br>>>> print(img.numpy().min(), img.numpy().max())<br>0.0 1.0</pre> |
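If parts of your pipeline still expect 8-bit images, a plain TensorFlow cast restores the previous range (nothing doctr-specific here):

```python
import tensorflow as tf
from doctr.datasets import FUNSD

ds = FUNSD()
img, target = ds[0]
# Scale the float32 tensor in [0, 1] back to the former uint8 range [0, 255]
img_uint8 = tf.cast(img * 255, tf.uint8)
```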
I/O module
Whether it is for exporting predictions or loading input data, the library lets you play around with inputs and outputs using minimal code. Since its usage is constantly expanding, the `doctr.documents` module has been relocated to `doctr.io`:
| 0.3.0 | 0.3.1 |
|---|---|
| <pre>>>> from doctr.documents import DocumentFile<br>>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()</pre> | <pre>>>> from doctr.io import DocumentFile<br>>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()</pre> |
It now also includes an image submodule for easy tensor <--> numpy conversion for all supported data types.
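The exact helper names are best taken from the `doctr.io` API reference; as a point of reference, the framework-native round trip these helpers wrap looks like this (plain TensorFlow/NumPy, not doctr-specific):

```python
import numpy as np
import tensorflow as tf

# numpy -> tensor and back; doctr.io.image provides equivalents for both backends
np_img = np.zeros((32, 128, 3), dtype=np.float32)
tf_img = tf.convert_to_tensor(np_img)
round_trip = tf_img.numpy()
```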
Multithreading relocated
As multithreading is increasingly used to boost performance across the library, it has been moved from the TF-only dataset utilities to `doctr.utils.multithreading`:
| 0.3.0 | 0.3.1 |
|---|---|
| <pre>>>> from doctr.datasets.multithreading import multithread_exec<br>>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])</pre> | <pre>>>> from doctr.utils.multithreading import multithread_exec<br>>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])</pre> |
New features
Datasets
Resources to access data in efficient ways
- Added support of FP16 (#367)
- Added option to merge subsets for recognition datasets (#376)
- Added dynamic sequence encoding (#393)
- Added support of new label format datasets (#407)
- Added character generator dataset for image classification (#412, #418)
IO
Features to manipulate input & outputs
- Added `Element` creation from dictionary (#386)
- Added byte decoding function for PyTorch and TF (#390)
- Added extra tensor conversion functions (#412)
Models
Deep learning model building and inference
- Added `crnn_resnet31` as a recognition model (#361)
- Added a uniform, comprehensive preprocessing mechanism for both frameworks (#370)
- Added support of FP16 (#382)
- Added MobileNet V3 for TensorFlow as a backbone (#372, #410, #420)
- Added superior pretrained params for `crnn_vgg16_bn` in TF (#395)
- Added pretrained params for `master` in TF (#396)
- Added mobilenet backbone availability to detection & recognition models (#398, #399)
- Added pretrained params for mobilenets on character classification (#415, #421, #424)
- Added superior pretrained params for `sar_resnet31` in TF (#395)
Utils
Utility features relevant to the library use cases.
Transforms
Data transformations operations
- Added `rotate` function (#358) and its corresponding augmentation module (#363)
- Added cropping function (#366)
- Added support of FP16 (#388)
Test
Verifications of the package well-being before release
- Added unittests for rotation functions (#358)
- Added test cases for recognition zoo (#361)
- Added unittests for `RandomRotate` (#363)
- Added test cases for recognition dataset merging (#376)
- Added unittests for Element creation from dicts (#386)
- Added test case for mobilenet backbone (#372)
- Added unittests for datasets with new format (#407)
- Added test cases for the character generator (#412)
Documentation
Online resources for potential users
- Added entry for `RandomRotate` (#363)
- Added entry for `CharacterGenerator` (#412)
- Added evaluation on US tax forms in the documentation (#419)
References
Reference training scripts
- Added PyTorch training reference scripts (#359, #394)
- Added LR scheduler to TensorFlow script (#360, #374, #397) & PyTorch scripts (#381)
- Added possibility to use multi-folder datasets (#377)
- Added character classification training script (#414, #420)
Others
Other tools and implementations
- Added page selection and result JSON display in demo (#369)
- Added an entry for MASTER in model selection of the demo (#400)
Bug fixes
Datasets
- Fixed image shape resolution in custom datasets (#354)
- Fixed box clipping to avoid rounding errors in datasets (#355)
Models
- Fixed GPU compatibility of detection models (#359) & recognition models (#361)
- Fixed recognition model loss computation in PyTorch (#379)
- Fixed loss computation of CRNN in PyTorch (#434)
- Fixed loss computation of MASTER in PyTorch (#440)
Transforms
- Fixed Resize transformation when aspect ratio matches the target (#357)
- Fixed box rotation (#378)
- Fixed image expansion while rotating (#438)
Documentation
- Fixed installation instructions (#437)
References
- Fixed missing import in utils (#389)
- Fixed GPU support for PyTorch in the recognition script (#427)
- Fixed gradient clipping in PyTorch scripts (#432)
Others
- Fixed trigger of script testing CI job (#351)
- Constrained `PIL` version due to issues with version 8.3 (#362)
- Added missing mypy config ignore (#365)
- Fixed PDF page rendering for demo & analysis script (#368)
- Constrained `weasyprint` version due to issues with version 53.0 (#404)
- Constrained `matplotlib` version due to issues with version 3.4.3 (#413)
Improvements
Datasets
- Improved typing of `doctr.datasets` (#354)
- Improved PyTorch data loading (#362)
- Switched default dtype of TF datasets to `tf.float32` instead of `tf.uint8` (#367, #375)
- Optimized sequence encoding through multithreading (#393)
IO
- Relocated `doctr.documents` to `doctr.io` (#390)
Models
- Updated bottleneck channel multiplier in MASTER (#350)
- Added dropout in MASTER (#349)
- Renamed MASTER attribute for consistency across models (#356)
- Added target validation for detection models (#355)
- Optimized preprocessing by leveraging multithreading (#370)
- Added dynamic dtype resolution and more specific error messages (#382)
- Harmonized parameter loading in PyTorch (#425)
- Enabled backbone pretraining in complex architectures (#435)
- Made head & FPN sizing dynamic using the feature extractor in detection models (#435)
Utils
- Moved multithreading to `doctr.utils` (#371)
- Improved format validation for visualization features (#392)
- Moved `doctr.models._utils.rotate_page` to `doctr.utils.geometry.rotate_image` (#371)
Documentation
- Updated pypi badge & documentation changelog (#346)
- Added export example in documentation (#348)
- Reflected relocation of `doctr.documents` to `doctr.io` in documentation and README (#390)
- Updated recognition benchmark (#395, #441)
- Updated model entries (#435)
- Updated authors & maintainer references in `setup.py` and in README (#444)
Tests
- Added test case for same aspect ratio resizing (#357)
- Extended testing of datasets (#354)
- Added test cases for FP16 support of datasets (#367), models (#382), transforms (#388)
- Moved multithreading unittests (#371)
- Extended test cases of preprocessors (#370)
- Reflected relocation of `doctr.documents` to `doctr.io` (#390)
- Removed unused imports (#391)
- Extended test cases for visualization (#392)
- Updated unittests of sequence encoding (#393)
- Added extra test cases for rotation validation (#438)
References
- Added optimal selection of workers for all scripts (#362)
- Reflected the switch to `tf.float32` by default for datasets (#367)
- Removed legacy script arg (#380)
- Removed unused imports (#391)
- Improved device selection for PyTorch training script (#427)
- Improved metric logging when the value is undefined (#432)
Others
- Updated package version (#346)
