github mindee/doctr v0.3.0
v0.3.0: Support for PyTorch in beta and rotated text


This release adds support for PyTorch backend & rotated text elements.

Release brought to you by @fg-mindee & @charlesmindee

Note: doctr 0.3.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

[beta] Welcome PyTorch 🎉

This release comes with exciting news: we added PyTorch support across the whole library!

If you have both TensorFlow & PyTorch installed, simply switch the DocTR backend using the USE_TORCH and USE_TF environment variables.

export USE_TORCH='1'

DocTR then takes care of the rest, so you can use it with PyTorch directly:

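import torch
from doctr.models import db_resnet50

# Load the DBNet text detection model (ResNet-50 backbone) with pretrained weights, in eval mode
model = db_resnet50(pretrained=True).eval()
# Run inference on a random batch of shape (N=1, C=3, H=1024, W=1024) without tracking gradients
with torch.no_grad():
    out = model(torch.rand(1, 3, 1024, 1024))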
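The actual backend selection logic lives inside DocTR. As a rough mental model only, here is a minimal, hypothetical sketch of how an environment-variable switch combined with a framework availability check (#306) could resolve a backend. The names `is_available` and `resolve_backend`, the accepted "truthy" values and the TensorFlow-first fallback are all assumptions, not DocTR's code.

```python
import importlib.util
import os
from typing import Iterable, Mapping, Optional

# Hypothetical set of values treated as "enabled" for USE_TORCH / USE_TF
TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def is_available(package: str) -> bool:
    """Return True if `package` can be imported, without actually importing it."""
    return importlib.util.find_spec(package) is not None


def resolve_backend(env: Optional[Mapping[str, str]] = None,
                    installed: Optional[Iterable[str]] = None) -> str:
    """Return 'torch' or 'tf' from the environment variables and the installed frameworks."""
    env = os.environ if env is None else env
    if installed is None:
        installed = {name for name in ("torch", "tensorflow") if is_available(name)}
    installed = set(installed)
    use_torch = env.get("USE_TORCH", "0").upper() in TRUE_VALUES
    use_tf = env.get("USE_TF", "0").upper() in TRUE_VALUES
    # An explicit request wins, as long as the requested framework is installed
    if use_torch and "torch" in installed:
        return "torch"
    if use_tf and "tensorflow" in installed:
        return "tf"
    # Otherwise fall back to whatever is installed (TensorFlow first, an arbitrary choice here)
    if "tensorflow" in installed:
        return "tf"
    if "torch" in installed:
        return "torch"
    raise ImportError("DocTR needs either TensorFlow or PyTorch to be installed")


# Both frameworks installed, PyTorch explicitly requested
print(resolve_backend({"USE_TORCH": "1"}, installed={"torch", "tensorflow"}))  # torch
```

Passing `installed` explicitly keeps the function testable on a machine that has neither framework.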

More pretrained models to come in the next releases!

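Support for rotated boxes

Rotated bounding boxes are now supported across the library: datasets can yield them as targets, models can crop and run inference on rotated text, and utilities can convert between straight and rotated boxes and compute metrics on them (#281).

Users may also want to filter text recognition predictions, which was previously difficult without a confidence score for each prediction. We harmonized our recognition models so that they all provide the sequence prediction probability.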
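Once each recognized sequence comes with a probability, filtering reduces to a threshold. The sketch below assumes predictions are available as (text, probability) pairs; that pair format, the sample values, the helper name `filter_by_confidence` and the 0.5 threshold are illustrative assumptions, not DocTR's exact output structure.

```python
from typing import List, Tuple

# Hypothetical recognition output: one (text, sequence probability) pair per word crop
Prediction = Tuple[str, float]


def filter_by_confidence(preds: List[Prediction], min_conf: float = 0.5) -> List[Prediction]:
    """Keep only predictions whose sequence probability reaches `min_conf`."""
    if not 0.0 <= min_conf <= 1.0:
        raise ValueError("min_conf must be a probability in [0, 1]")
    return [(text, conf) for text, conf in preds if conf >= min_conf]


preds = [("invoice", 0.98), ("t0tal", 0.41), ("2021", 0.87)]
kept = filter_by_confidence(preds, min_conf=0.5)
print(kept)  # [('invoice', 0.98), ('2021', 0.87)]
```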

[Figure: rotated bounding boxes]
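Converting between rotated and straight boxes (#281) mostly comes down to computing the corners of the rotated rectangle. Below is a minimal sketch assuming a rotated box is given as (x_center, y_center, width, height, angle_in_degrees); that tuple layout and the function names are assumptions, not DocTR's documented format. The rotation uses the standard math convention, so in image coordinates (y pointing down) a positive angle appears clockwise.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_to_polygon(x: float, y: float, w: float, h: float, angle: float) -> List[Point]:
    """Corners of a (w, h) box centered at (x, y) and rotated by `angle` degrees."""
    theta = math.radians(angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the corner offset around the origin, then translate it to the center
        corners.append((x + dx * cos_t - dy * sin_t, y + dx * sin_t + dy * cos_t))
    return corners


def polygon_to_straight(points: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) enclosing the polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def straight_to_rotated(xmin: float, ymin: float, xmax: float, ymax: float):
    """Axis-aligned box -> (x_center, y_center, w, h, angle) with a zero angle."""
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin, 0.0)


corners = rotated_to_polygon(50, 20, 40, 10, 30)
print(polygon_to_straight(corners))
```

Going from rotated to straight loses information (the enclosing box is larger than the text for non-zero angles), which is why keeping rotated targets matters for slanted text.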

Page reconstruction

Following feedback that visualizations of dense predictions lacked clarity, we added a page reconstruction feature.

import matplotlib.pyplot as plt
from doctr.utils.visualization import synthesize_page
from doctr.documents import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# Load a PDF and convert its pages to images
doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Analyze all pages
result = model(doc)

# Reconstruct the first page (the document export holds its pages under the "pages" key)
reconstructed_page = synthesize_page(result.export()["pages"][0])
plt.imshow(reconstructed_page)
plt.show()

[Figure: original image and its page reconstruction]

Using the predictions from our models, we try to synthesize the document with only its textual information!
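Reconstruction relies on each exported word carrying both its text and its location on the page. The sketch below only shows that first step, mapping words to pixel boxes where they could be redrawn. It assumes the exported page is a dict with `dimensions` as (height, width) and words nested under `blocks`, then `lines`, then `words`, each word having a `value` and a relative `geometry` ((xmin, ymin), (xmax, ymax)); check that layout against the DocTR version you use. `place_words` is a hypothetical helper, not part of DocTR, and actual text rendering is left out.

```python
from typing import Any, Dict, List, Tuple


def place_words(page: Dict[str, Any]) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """List (text, absolute pixel box) for every word of an exported page (assumed layout)."""
    height, width = page["dimensions"]
    placed = []
    for block in page["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                (xmin, ymin), (xmax, ymax) = word["geometry"]
                # Relative coordinates in [0, 1] are scaled back to pixels
                box = (round(xmin * width), round(ymin * height),
                       round(xmax * width), round(ymax * height))
                placed.append((word["value"], box))
    return placed


# Hand-written example page, not real model output
page = {
    "dimensions": (1000, 800),  # (height, width) in pixels
    "blocks": [
        {"lines": [{"words": [
            {"value": "Hello", "confidence": 0.99, "geometry": ((0.1, 0.05), (0.3, 0.1))},
            {"value": "world", "confidence": 0.97, "geometry": ((0.35, 0.05), (0.55, 0.1))},
        ]}]}
    ],
}
for text, box in place_words(page):
    print(text, box)
```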

Breaking changes

Renamed LinkNet

While the LinkNet paper doesn't introduce different versions of the architecture, we want to keep the possibility of adding more. To stabilize the interface early on, we renamed linknet to linknet16.

0.2.1:
>>> from doctr.models import linknet
>>> model = linknet(pretrained=True)

0.3.0:
>>> from doctr.models import linknet16
>>> model = linknet16(pretrained=True)

New features

Datasets

Resources to access data in efficient ways

  • Added option to yield rotated bounding boxes as target (#281)
  • Added PyTorch support for all datasets (#319)

Documents

Features to manipulate document information

  • Added support for rotated bboxes (#281)
  • Added entry for MASTER (#300)
  • Updated LinkNet entry (#313)
  • Added code of conduct (#325)

Models

Deep learning model building and inference

  • Added rotated cropping feature & inference mode (#281)
  • Added spatial masked loss support for LinkNet (#296)
  • Added page orientation estimation feature (#293)
  • Added box target rotation feature (#297)
  • Added support for the MASTER recognition model & transformer (#300, #342)
  • Added focal loss support to LinkNet (#304, #311)
  • Added PyTorch support for DBNet (#310, #313, #316), LinkNet (#317), conv_sequence & parameter loading (#323), resnet31 (#327), vgg16_bn (#328), CRNN (#318), SAR (#333), MASTER (#329, #335, #340, #342)
  • Added cleaner verified file downloading function (#319)
  • Added upfront page orientation estimation (#324) by @Rob192
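Focal loss rescales binary cross-entropy by a factor (1 - p_t)^γ, where p_t is the probability assigned to the true class, so that pixels the model already classifies well contribute less and training focuses on hard ones. The numpy sketch below shows that standard formulation, with an optional boolean mask to ignore pixels in the spirit of the spatial masked loss (#296). It is a hypothetical illustration: the function name is made up, and the defaults γ = 2 and α = 0.25 are common choices for focal loss, not necessarily the values used in DocTR.

```python
from typing import Optional

import numpy as np


def binary_focal_loss(probs: np.ndarray, targets: np.ndarray,
                      mask: Optional[np.ndarray] = None,
                      gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Mean binary focal loss over a probability map, restricted to `mask` if given."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    # p_t is the probability assigned to the true class of each pixel
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t) ** gamma down-weights pixels that are already well classified
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    if mask is None:
        mask = np.ones_like(targets, dtype=bool)
    if not mask.any():
        return 0.0
    # Only pixels selected by the mask contribute to the average
    return float(loss[mask].mean())


probs = np.array([[0.9, 0.2], [0.6, 0.1]])
targets = np.array([[1, 0], [1, 0]])
print(binary_focal_loss(probs, targets))
```

With γ = 0 and α = 0.5 the expression reduces to half the plain binary cross-entropy, which is a handy sanity check.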

Utils

Utility features relevant to the library use cases.

  • Added Mask IoU computation (#290)
  • Added straight <--> rotated bbox conversion and metric computation support (#281)
  • Added page synthesis feature (#320)
  • Added IoA and NMS (#332)
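Mask IoU, IoA (intersection over the area of one of the two boxes, useful to test containment) and NMS are standard geometric utilities. As a rough, framework-free sketch, here is one way to compute them with numpy for boolean masks and axis-aligned (xmin, ymin, xmax, ymax) boxes; the function names and signatures are hypothetical and do not mirror DocTR's own utilities.

```python
from typing import List

import numpy as np


def mask_iou(masks_a: np.ndarray, masks_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boolean masks of shape (N, H, W) and (M, H, W) -> (N, M)."""
    a = masks_a.reshape(len(masks_a), -1).astype(np.float64)
    b = masks_b.reshape(len(masks_b), -1).astype(np.float64)
    inter = a @ b.T  # number of pixels set in both masks
    union = a.sum(1)[:, None] + b.sum(1)[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1), 0.0)


def box_intersection(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise intersection areas between (N, 4) and (M, 4) boxes -> (N, M)."""
    left = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    top = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    right = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    bottom = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    return np.clip(right - left, 0, None) * np.clip(bottom - top, 0, None)


def box_area(boxes: np.ndarray) -> np.ndarray:
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])


def box_ioa(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Intersection divided by the area of each box in `boxes_a` -> (N, M)."""
    return box_intersection(boxes_a, boxes_b) / np.maximum(box_area(boxes_a)[:, None], 1e-12)


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it by more than iou_thresh."""
    order = np.argsort(scores)[::-1]
    keep: List[int] = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        inter = box_intersection(boxes[i:i + 1], boxes[rest])[0]
        union = box_area(boxes[i:i + 1])[0] + box_area(boxes[rest]) - inter
        iou = inter / np.maximum(union, 1e-12)
        order = rest[iou <= iou_thresh]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, iou_thresh=0.5))
```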

Transforms

Data transformations operations

  • Added support for custom Resize in PyTorch (#313) and ColorInversion (#322)
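To give an idea of what a color inversion augmentation does, here is a plain numpy sketch that inverts a float image in [0, 1] with a given probability, turning dark text on a light background into light text on a dark one. DocTR's ColorInversion (#322) may behave differently (its exact parameters are not described here); `invert_colors` and `random_color_inversion` are hypothetical helpers.

```python
from typing import Optional

import numpy as np


def invert_colors(img: np.ndarray) -> np.ndarray:
    """Invert a float image whose values lie in [0, 1]."""
    if img.min() < 0.0 or img.max() > 1.0:
        raise ValueError("expected pixel values in [0, 1]")
    return 1.0 - img


def random_color_inversion(img: np.ndarray, p: float = 0.5,
                           rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Invert `img` with probability `p`, otherwise return it unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    return invert_colors(img) if rng.random() < p else img


img = np.array([[0.0, 0.25], [1.0, 0.5]])
print(random_color_inversion(img, p=1.0))
```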

Test

Verifications of the package well-being before release

Documentation

Online resources for potential users

  • Added instructions to install DocTR with PyTorch or TF (#306)
  • Added specific instructions to run checks in CONTRIBUTING (#321)

References

Reference training scripts

  • Added support for rotated bounding box targets (#281)

Others

Other tools and implementations

  • Added support for rotated bounding box targets & inference mode (#281)
  • Added framework availability check (#306, #314, #315)
  • Added CI job for PyTorch unit tests (#310)
  • Added CI jobs to build DocTR with multiple Python versions, environments and frameworks (#314, #315)
  • Updated demo to add page reconstruction (#320)
  • Added PyTorch & torchvision to the environment collection script (#345) & updated the bug template

Bug fixes

Documentation

  • Fixed the datasets entry (#344)

Tests

  • Fixed ColorInversion unittest (#298, #339)

References

  • Fixed missing import of wandb in the detection script (#288)
  • Fixed edge case of recognition model output unpacking in the recognition training script (#291)
  • Fixed model output unpacking in the detection script (#301)
  • Fixed wandb config for training scripts (#302)

Others

  • Fixed edge case of recognition model output unpacking in the evaluation script (#291)
  • Fixed mypy config and related typing annotations (#308, #312, #314, #336)

Improvements

Datasets

  • Improved constructors of OCRDataset and CORD (#289, #299)
  • Silenced numpy dtype warnings (#336)

Documents

  • Updated README badge & documentation versioning (#287)
  • Harmonized benchmark table formatting of figures (#281)
  • Updated demo illustration in README (#326)

Documentation

  • Updated documentation font and mentioned PyTorch support in README & docs (#344)

Tests

  • Updated unittest image (#337)
  • Cleaned up unittest folder separation (#338)

References

  • Reordered script options to save time in test-only mode (#294)

Others

  • Updated package version (#287)
  • Removed unused imports (#295, #307, #336)
  • Updated API requirements for security and cleaned Dockerfile (#303)
  • Improved setuptools classifiers and installation process (#306)

🙏 Thanks to our contributors 🙏
@Rob192
