This release adds support for the PyTorch backend & rotated text elements.
Release brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.3.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
[beta] Welcome PyTorch 🎉
This release comes with exciting news: we added support of PyTorch for the whole library!
If you have both TensorFlow & PyTorch installed, simply switch the DocTR backend by using the USE_TORCH and USE_TF environment variables.
    export USE_TORCH='1'

Then DocTR will do the rest for you to play along with PyTorch:
    import torch
    from doctr.models import db_resnet50

    model = db_resnet50(pretrained=True).eval()
    with torch.no_grad():
        out = model(torch.rand(1, 3, 1024, 1024))

More pretrained models to come in the next releases!
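As a rough illustration of the switch described above, the sketch below sets the variable from Python instead of the shell. It assumes the variables are read when doctr is first imported, so they have to be set beforehand. The helper pick_backend and the list of accepted truthy values are hypothetical, written for this note only; they are not DocTR's actual selection logic.

```python
import os

# Hypothetical set of values treated as "enabled" (an assumption, not DocTR's list).
TRUTHY = {"1", "ON", "YES", "TRUE"}

def pick_backend(env=None):
    """Illustrative only: choose a backend name from USE_TORCH / USE_TF."""
    env = os.environ if env is None else env
    use_torch = env.get("USE_TORCH", "").upper() in TRUTHY
    use_tf = env.get("USE_TF", "").upper() in TRUTHY
    if use_torch and not use_tf:
        return "torch"
    if use_tf and not use_torch:
        return "tensorflow"
    # Nothing (or both) requested: leave the choice to the library.
    return "auto"

# Set the variable before the first `import doctr` in the process.
os.environ["USE_TORCH"] = "1"
print(pick_backend())
```

Setting the variable in the shell, as shown earlier, has the same effect and avoids any import-order concerns.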
Support of rotated boxes
Users might be tempted to filter text recognition predictions, which was not easy previously without a confidence score for each prediction. We harmonized our recognition models to provide the sequence prediction probability.
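A minimal sketch of such filtering, assuming each recognized word is available as a dictionary with "value" and "confidence" keys (this layout and the sample values are assumptions made for illustration; filter_words is a hypothetical helper, not part of the library):

```python
def filter_words(words, min_confidence=0.5):
    """Keep only the words whose confidence reaches the threshold."""
    return [w for w in words if w["confidence"] >= min_confidence]

# Made-up predictions, for illustration only.
words = [
    {"value": "Invoice", "confidence": 0.98},
    {"value": "t0ta1", "confidence": 0.31},
    {"value": "2021", "confidence": 0.87},
]

kept = filter_words(words, min_confidence=0.5)
print([w["value"] for w in kept])  # ['Invoice', '2021']
```

The right threshold depends on the documents and on how costly a wrong word is for the downstream task.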
Page reconstruction
Following up on some feedback about the lack of clarity for visualization of dense predictions, we added a page reconstruction feature.
    import matplotlib.pyplot as plt
    from doctr.utils.visualization import synthesize_page
    from doctr.documents import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)
    # PDF
    doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
    # Analyze
    result = model(doc)
    # Reconstruct the first page
    reconstructed_page = synthesize_page(result.export()[0])
    plt.imshow(reconstructed_page); plt.show()

Using the predictions from our models, we try to synthesize the document with only its textual information!
Breaking changes
Renamed LinkNet
While the paper doesn't introduce different versions of the LinkNet architecture, we want to keep the possibility to add more. In order to stabilize the interface early on, we renamed linknet to linknet16.
| 0.2.1 | 0.3.0 |
|---|---|
| >>> from doctr.models import linknet<br>>>> model = linknet(pretrained=True) | >>> from doctr.models import linknet16<br>>>> model = linknet16(pretrained=True) |
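Code that has to run against both versions could look the builder up by name and fall back to the old one. The sketch below is one possible approach, not a recommendation from the maintainers; first_available is a hypothetical helper, demonstrated on the standard math module so that it runs without DocTR installed.

```python
import importlib

def first_available(module_name, candidates):
    """Return the first attribute of `module_name` found among `candidates`."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {candidates} found in {module_name}")

# Standard-library demonstration: the first name is missing, the second exists.
print(first_available("math", ["no_such_name", "sqrt"])(9.0))  # 3.0

# With DocTR installed, the same helper would prefer the new name:
# linknet_builder = first_available("doctr.models", ["linknet16", "linknet"])
# model = linknet_builder(pretrained=True)
```

Once support for 0.2.1 is no longer needed, a plain import of linknet16 is simpler.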
New features
Datasets
Resources to access data in efficient ways
- Added option to yield rotated bounding boxes as target (#281)
- Added support of PyTorch for all datasets (#319)
Documents
Features to manipulate document information
- Added support of rotated bboxes (#281)
- Added entry for MASTER (#300)
- Updated LinkNet entry (#313)
- Added code of conduct (#325)
Models
Deep learning model building and inference
- Added rotated cropping feature & inference mode (#281)
- Added spatial masked loss support for LinkNet (#296)
- Added page orientation estimation feature (#293)
- Added box target rotation feature (#297)
- Added support of MASTER recognition model & transformer (#300, #342)
- Added Focal loss support to LinkNet (#304, #311)
- Added PyTorch support for DBNet (#310, #313, #316), LinkNet (#317), conv_sequence & parameter loading (#323), resnet31 (#327), vgg16_bn (#328), CRNN (#318), SAR (#333), MASTER (#329, #335, #340, #342)
- Added cleaner verified file downloading function (#319)
- Added upfront page orientation estimation (#324) by @Rob192
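Several items above deal with rotated boxes. As a rough sketch of the geometry involved, the snippet below converts between a straight box (xmin, ymin, xmax, ymax) and a rotated box written as (x_center, y_center, width, height, angle in degrees). This box layout is an assumption made for illustration, and both functions are hypothetical helpers, not the library's implementation.

```python
import math

def straight_to_rotated(box):
    """(xmin, ymin, xmax, ymax) -> (x_center, y_center, width, height, angle)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin, 0.0)

def rotated_to_straight(rbox):
    """Smallest axis-aligned box enclosing a rotated box (angle in degrees)."""
    x, y, w, h, angle = rbox
    c = abs(math.cos(math.radians(angle)))
    s = abs(math.sin(math.radians(angle)))
    # Half-extents of the rotated rectangle projected on each axis.
    half_w = (w * c + h * s) / 2
    half_h = (w * s + h * c) / 2
    return (x - half_w, y - half_h, x + half_w, y + half_h)

box = (0.1, 0.2, 0.5, 0.4)
print(rotated_to_straight(straight_to_rotated(box)))
```

Going from rotated to straight loses the angle, so the round trip only returns the original box when the angle is zero (up to floating-point error).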
Utils
Utility features relevant to the library use cases.
- Added Mask IoU computation (#290)
- Added straight <--> rotated bbox conversion and metric computation support (#281)
- Added page synthesis feature (#320)
- Added IoA and NMS (#332)
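For readers unfamiliar with these metrics, here is a plain-Python sketch of IoU, IoA and greedy NMS on straight boxes (xmin, ymin, xmax, ymax). It only illustrates the general techniques: the function names, signatures and the choice of dividing by the first box's area in IoA are assumptions, not the library's code.

```python
def area(box):
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def intersection(a, b):
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))

def iou(a, b):
    """Intersection over union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def ioa(a, b):
    """Intersection over the area of the first box."""
    return intersection(a, b) / area(a) if area(a) > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119, above the 0.5 threshold, so it is suppressed; the third box is disjoint and kept.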
Transforms
Data transformations operations
Test
Verifications of the package well-being before release
- Added unittest for mask IoU computation (#290)
- Added unittests for rotated bbox support (#281, #297)
- Added unittests for page orientation estimation (#293, #324)
- Added unittests for MASTER (#300, #309)
- Added test case for the focal loss of LinkNet (#304)
- Added unittests for PyTorch integration (#310, #313, #317, #319, #322, #323, #327, #318, #329, #335, #340, #342)
- Added unittests for IoA & NMS (#332)
Documentation
Online resources for potential users
- Added instructions to install DocTR with PyTorch or TF (#306)
- Added specific instructions to run checks in CONTRIBUTING (#321)
References
Reference training scripts
- Added support of rotated bounding box targets (#281)
Others
Other tools and implementations
- Added support of rotated bounding box target & inference mode (#281)
- Added framework availability check (#306, #314, #315)
- Added CI job for PyTorch unittests (#310)
- Added CI jobs to build DocTR with multiple Python versions, environments and frameworks (#314, #315)
- Updated demo to add page reconstruction (#320)
- Added PyTorch & torchvision to environment collection script (#345) & updated the bug template
Bug fixes
Documentation
- Fixed entry of datasets (#344)
Tests
References
- Fixed missing import of wandb in the detection script (#288)
- Fixed edge case of recognition model output unpacking in the recognition training script (#291)
- Fixed model output unpacking in the detection script (#301)
- Fixed wandb config for training scripts (#302)
Others
- Fixed edge case of recognition model output unpacking in the evaluation script (#291)
- Fixed mypy config and related typing annotations (#308, #312, #314, #336)
Improvements
Datasets
Documents
- Updated README badge & documentation versioning (#287)
- Harmonized benchmark table formatting of figures (#281)
- Updated demo illustration in README (#326)
Documentation
- Updated documentation font and mentioned PyTorch support in README & docs (#344)
Tests
References
- Reordered script option to save time for test-only (#294)
Others
- Updated package version (#287)
- Removed unused imports (#295, #307, #336)
- Updated API requirements for security and cleaned Dockerfile (#303)
- Improved setuptools classifiers and installation process (#306)
🙏 Thanks to our contributors 🙏
@Rob192


