This release adds support for PyTorch backend & rotated text elements.

Release brought to you by @fg-mindee & @charlesmindee

Note: doctr 0.3.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

[beta] Welcome PyTorch 🎉

This release comes with exciting news: we added support of PyTorch for the whole library!

If you have both TensorFlow & PyTorch installed, simply switch the DocTR backend by using the USE_TORCH and USE_TF environment variables.

export USE_TORCH='1'

DocTR then takes care of the rest, so you can work with PyTorch as usual:

import torch
from doctr.models import db_resnet50
model = db_resnet50(pretrained=True).eval()
with torch.no_grad():
    out = model(torch.rand(1, 3, 1024, 1024))
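
The backend flag can also be set from Python, as long as this happens before the library is imported. The snippet below is a minimal sketch of that idea: `resolve_backend` is a hypothetical helper written for illustration, and both the accepted truthy values and the rule applied when both or neither flag is set are assumptions, not DocTR's actual selection logic.

```python
import os

# Assumption: a flag counts as enabled for these (case-insensitive) values.
TRUTHY = {"1", "ON", "YES", "TRUE"}


def resolve_backend(env):
    """Hypothetical helper: map USE_TORCH / USE_TF flags to a backend name."""
    use_torch = env.get("USE_TORCH", "").upper() in TRUTHY
    use_tf = env.get("USE_TF", "").upper() in TRUTHY
    if use_torch and not use_tf:
        return "torch"
    if use_tf and not use_torch:
        return "tf"
    # Both or neither flag set: leave the decision to the library.
    return "auto"


# Set the flag before any `import doctr` statement runs in this process.
os.environ["USE_TORCH"] = "1"
os.environ.pop("USE_TF", None)
print(resolve_backend(os.environ))
```

Setting the variable after the library has already been imported would likely have no effect, which is why the assignment comes first.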

More pretrained models to come in the next releases!

Support of rotated boxes

Users might be tempted to filter text recognition predictions, which was not easy previously without a prediction's confidence. We harmonized our recognition models to provide the sequence prediction probability.
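
As an illustration of filtering on that probability, here is a minimal sketch that keeps only words above a confidence threshold. It assumes an exported page is a nested dictionary of blocks, lines and words, where each word carries `value` and `confidence` keys. `filter_words` and the sample data are hypothetical and written by hand for this example.

```python
def filter_words(page_export, min_confidence=0.5):
    """Return (value, confidence) pairs for words at or above the threshold."""
    kept = []
    for block in page_export.get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                if word["confidence"] >= min_confidence:
                    kept.append((word["value"], word["confidence"]))
    return kept


# Hand-written sample standing in for one page of an exported result.
sample_page = {
    "blocks": [
        {"lines": [
            {"words": [
                {"value": "Invoice", "confidence": 0.98},
                {"value": "n0", "confidence": 0.31},
                {"value": "2021", "confidence": 0.87},
            ]}
        ]}
    ]
}

print(filter_words(sample_page, min_confidence=0.5))
```

The threshold is application-specific; 0.5 is only a placeholder.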

Page reconstruction

Following up on some feedback about the lack of clarity for visualization of dense predictions, we added a page reconstruction feature.

import matplotlib.pyplot as plt
from doctr.utils.visualization import synthesize_page
from doctr.documents import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Analyze
result = model(doc)

# Reconstruct the first page
reconstructed_page = synthesize_page(result.export()[0])
plt.imshow(reconstructed_page); plt.show()

Using the predictions from our models, we try to synthesize the document with only its textual information!
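
To give an intuition of what such a reconstruction involves, the sketch below maps relative word boxes to pixel coordinates on a blank page of a given size. It is not the implementation of `synthesize_page`: the helper names and the `((xmin, ymin), (xmax, ymax))` relative geometry layout are assumptions made for this example.

```python
def to_pixel_box(geometry, page_height, page_width):
    """Convert a relative ((xmin, ymin), (xmax, ymax)) box to integer pixels."""
    (xmin, ymin), (xmax, ymax) = geometry
    return (
        int(round(xmin * page_width)),
        int(round(ymin * page_height)),
        int(round(xmax * page_width)),
        int(round(ymax * page_height)),
    )


def layout_words(words, page_height, page_width):
    """Pair each word's text with the pixel box where it would be drawn."""
    return [
        (w["value"], to_pixel_box(w["geometry"], page_height, page_width))
        for w in words
    ]


# Hand-written sample words with relative coordinates in [0, 1].
words = [
    {"value": "Hello", "geometry": ((0.1, 0.1), (0.3, 0.15))},
    {"value": "world", "geometry": ((0.35, 0.1), (0.5, 0.15))},
]

for text, box in layout_words(words, page_height=1000, page_width=800):
    print(text, box)
```

A renderer would then draw each string inside its pixel box on an empty canvas.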

Breaking changes

Renamed LinkNet

While the paper doesn't introduce different versions of the LinkNet architecture, we want to keep the possibility to add more. In order to stabilize the interface early on, we renamed linknet to linknet16.

0.2.1	0.3.0
`>>> from doctr.models import linknet`	`>>> from doctr.models import linknet16`
`>>> model = linknet(pretrained=True)`	`>>> model = linknet16(pretrained=True)`

New features

Datasets

Resources to access data in efficient ways

Added option to yield rotated bounding boxes as target (#281)
Added support of PyTorch for all datasets (#319)

Documents

Features to manipulate document information

Added support of rotated bboxes (#281)
Added entry for MASTER (#300)
Updated LinkNet entry (#313)
Added code of conduct (#325)

Models

Deep learning model building and inference

Added rotated cropping feature & inference mode (#281)
Added spatial masked loss support for LinkNet (#296)
Added page orientation estimation feature (#293)
Added box target rotation feature (#297)
Added support of MASTER recognition model & transformer (#300, #342)
Added Focal loss support to linknet (#304, #311)
Added PyTorch support for DBNet (#310, #313, #316), LinkNet (#317), conv_sequence & parameter loading (#323), resnet31 (#327), vgg16_bn (#328), CRNN (#318), SAR (#333), MASTER (#329, #335, #340, #342)
Added cleaner verified file downloading function (#319)
Added upfront page orientation estimation (#324) by @Rob192
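
For readers unfamiliar with the focal loss mentioned above, here is a minimal pure-Python sketch of its binary form, `-alpha_t * (1 - p_t)**gamma * log(p_t)`. The function names are hypothetical, and the defaults (`gamma=2.0`, `alpha=0.25`) are values commonly used with focal loss, not settings taken from DocTR's LinkNet implementation.

```python
import math


def binary_focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for a single prediction: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    # Clamp the probability to avoid log(0).
    prob = min(max(prob, eps), 1.0 - eps)
    # p_t is the probability assigned to the true class.
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, targets, **kwargs):
    """Average the per-element focal loss over a flat list of predictions."""
    losses = [binary_focal_loss(p, t, **kwargs) for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)


# A confident correct prediction is down-weighted far more than a wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With `gamma=0`, the modulating factor disappears and the expression reduces to an alpha-weighted binary cross-entropy.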

Utils

Utility features relevant to the library use cases.

Added Mask IoU computation (#290)
Added straight <--> rotated bbox conversion and metric computation support (#281)
Added page synthesis feature (#320)
Added IoA and NMS (#332)
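
The straight <--> rotated conversion listed above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: rotated boxes are taken to be `(x, y, w, h, alpha)` with a center point, a size and an angle in degrees, straight boxes are `(xmin, ymin, xmax, ymax)`, and both function names are hypothetical.

```python
import math


def straight_to_rotated(box):
    """Express an axis-aligned box as a rotated box with a zero angle."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin, 0.0)


def rotated_to_straight(rbox):
    """Return the smallest axis-aligned box enclosing a rotated box."""
    x, y, w, h, alpha = rbox
    theta = math.radians(alpha)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    # Half extents of the rotated rectangle projected on each axis.
    half_w = (w * cos_t + h * sin_t) / 2
    half_h = (w * sin_t + h * cos_t) / 2
    return (x - half_w, y - half_h, x + half_w, y + half_h)


box = (10, 20, 50, 40)
rbox = straight_to_rotated(box)
print(rbox)
print(rotated_to_straight(rbox))
```

Note that converting a rotated box to a straight one loses the angle, so the round trip only recovers the original box when the angle is zero.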

Transforms

Data transformation operations

Added support of custom Resize in PyTorch (#313), ColorInversion (#322)

Test

Verifications of the package well-being before release

Added unittest for mask IoU computation (#290)
Added unittests for rotated bbox support (#281, #297)
Added unittests for page orientation estimation (#293, #324)
Added unittests for MASTER (#300, #309)
Added test case for the focal loss of LinkNet (#304)
Added unittests for PyTorch integration (#310, #313, #317, #319, #322, #323, #327, #318, #329, #335, #340, #342)
Added unittests for IoA & NMS (#332)

Documentation

Online resources for potential users

Added instructions to install DocTR with PyTorch or TF (#306)
Added specific instructions to run checks in CONTRIBUTING (#321)

References

Reference training scripts

Added support of rotated bounding box targets (#281)

Others

Other tools and implementations

Added support of rotated bounding box target & inference mode (#281)
Added framework availability check (#306, #314, #315)
Added CI job for pytorch unittests (#310)
Added CI jobs to build DocTR with multiple Python versions, environments and frameworks (#314, #315)
Updated demo to add page reconstruction (#320)
Added PyTorch & torchvision to environment collection script (#345) & updated the bug template

Bug fixes

Documentation

Fixed entry of datasets (#344)

Tests

Fixed ColorInversion unittest (#298, #339)

References

Fixed missing import of wandb in the detection script (#288)
Fixed edge case of recognition model output unpacking in the recognition training script (#291)
Fixed model output unpacking in the detection script (#301)
Fixed wandb config for training scripts (#302)

Others

Fixed edge case of recognition model output unpacking in the evaluation script (#291)
Fixed mypy config and related typing annotations (#308, #312, #314, #336)

Improvements

Datasets

Improved constructors of OCRDataset and CORD (#289, #299)
Silenced numpy dtype warnings (#336)

Documents

Updated README badge & documentation versioning (#287)
Harmonized benchmark table formatting of figures (#281)
Updated demo illustration in README (#326)

Documentation

Updated documentation font and mentioned PyTorch support in README & docs (#344)

Tests

Updated unittest image (#337)
Cleaned up unittest folder separation (#338)

References

Reordered script option to save time for test-only (#294)

Others

Updated package version (#287)
Removed unused imports (#295, #307, #336)
Updated API requirements for security and cleaned Dockerfile (#303)
Improved setuptools classifiers and installation process (#306)

🙏 Thanks to our contributors 🙏
@Rob192

mindee/doctr v0.3.0 v0.3.0: Support for PyTorch in beta and rotated text on GitHub
