This patch release fixes issues with the preprocessor and greatly improves text detection models.
Brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.2.1 requires TensorFlow 2.4.0 or higher.
Highlights
Improved text detection
With this iteration, DocTR brings you a new set of pretrained parameters for db_resnet50, trained with a much wider range of data augmentations!
| architecture | FUNSD recall | FUNSD precision | CORD recall | CORD precision |
|---|---|---|---|---|
| db_resnet50 + crnn_vgg16_bn (v0.2.0) | 64.8 | 70.3 | 67.7 | 78.4 |
| db_resnet50 + crnn_vgg16_bn (v0.2.1) | 70.08 | 74.77 | 82.19 | 79.67 |
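To benefit from these new parameters, the detection model can be instantiated through its predictor factory (a minimal sketch; the image path is a placeholder and the usage mirrors the recognition predictor shown below):

```python
from doctr.documents import DocumentFile
from doctr.models import detection_predictor

# Instantiates DBNet with the newly released pretrained parameters
predictor = detection_predictor('db_resnet50', pretrained=True)
doc = DocumentFile.from_images("path/to/your_doc.jpg")
# Returns the boxes detected on each page
print(predictor(doc))
```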
Sequence prediction confidence
Users might want to filter text recognition predictions, which was not easy previously without a prediction confidence. We harmonized our recognition models so that they all provide the sequence prediction probability.
The following snippet:
from doctr.documents import DocumentFile
from doctr.models import recognition_predictor
predictor = recognition_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/reco_sample.jpg")
print(predictor(doc))
will get you a list of tuples (word value, sequence confidence):
[('invite', 0.9302278757095337)]
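This makes confidence-based filtering straightforward. A minimal sketch building on the snippet above (the 0.5 threshold is an arbitrary illustrative value):

```python
from doctr.documents import DocumentFile
from doctr.models import recognition_predictor

predictor = recognition_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/reco_sample.jpg")
# Keep only the words whose sequence confidence exceeds the threshold
confident_words = [word for word, conf in predictor(doc) if conf > 0.5]
print(confident_words)
```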
More comprehensive representation of predictors
For those who play around with a predictor's components, a clear understanding of their composition is valuable. To provide a cleaner interface, we improved the representation of all predictor components.
The following snippet:
from doctr.models import ocr_predictor
print(ocr_predictor())
now yields a much cleaner representation of the predictor's composition:
OCRPredictor(
(det_predictor): DetectionPredictor(
(pre_processor): PreProcessor(
(resize): Resize(output_size=(1024, 1024), method='bilinear')
(normalize): Compose(
(transforms): [
LambdaTransformation(),
Normalize(mean=[0.7979999780654907, 0.7850000262260437, 0.7720000147819519], std=[0.2639999985694885, 0.27489998936653137, 0.28700000047683716]),
]
)
)
(model): DBNet(
(feat_extractor): IntermediateLayerGetter()
(fpn): FeaturePyramidNetwork(channels=128)
(probability_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f645f58e0>
(threshold_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7ce15310>
(postprocessor): DBPostProcessor(box_thresh=0.1, max_candidates=1000)
)
)
(reco_predictor): RecognitionPredictor(
(pre_processor): PreProcessor(
(resize): Resize(output_size=(32, 128), method='bilinear', preserve_aspect_ratio=True, symmetric_pad=False)
(normalize): Compose(
(transforms): [
LambdaTransformation(),
Normalize(mean=[0.5, 0.5, 0.5], std=[1.0, 1.0, 1.0]),
]
)
)
(model): CRNN(
(feat_extractor): <doctr.models.backbones.vgg.VGG object at 0x7f6f7d866040>
(decoder): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7cce2430>
(postprocessor): CTCPostProcessor(vocab_size=118)
)
)
(doc_builder): DocumentBuilder(resolve_lines=False, resolve_blocks=False, paragraph_break=0.035)
)
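Each component shown above is also exposed as an attribute of the same name, which makes it easy to inspect or tweak a specific stage (a minimal sketch, assuming the attribute layout matches the representation):

```python
from doctr.models import ocr_predictor

predictor = ocr_predictor(pretrained=True)
# Attribute names match the keys displayed in the representation above
print(predictor.det_predictor.pre_processor)
print(predictor.reco_predictor.model)
```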
Breaking changes
Metrics' granularity
Renamed ExactMatch to TextMatch, since the metric now provides several levels of flexibility for the evaluation. Additionally, the constructor flags have been deprecated, since the summary now provides all the different types of evaluation.
0.2.0:
>>> from doctr.utils.metrics import ExactMatch
>>> metric = ExactMatch(ignore_case=True)
>>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
>>> print(metric.summary())
0.75

0.2.1:
>>> from doctr.utils.metrics import TextMatch
>>> metric = TextMatch()
>>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
>>> print(metric.summary())
{'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75}
Here, raw is the exact match, caseless the exact match of lower-case counterparts, unidecode the exact match of unidecoded counterparts, and unicase the exact match of unidecoded lower-case counterparts.
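As an illustration of this granularity, a prediction that only differs from the ground truth by an accent is only counted by the unidecode and unicase matches (a minimal sketch; the exact printed values depend on the summary formatting):

```python
from doctr.utils.metrics import TextMatch

metric = TextMatch()
# "cafe" only matches "café" once the accent is stripped (unidecode & unicase)
metric.update(["cafe"], ["café"])
print(metric.summary())
# Expected: {'raw': 0.0, 'caseless': 0.0, 'unidecode': 1.0, 'unicase': 1.0}
```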
New features
Models
Deep learning model building and inference
- Added detection of faces (#258) and bar codes (#260)
- Added new pretrained weights for db_resnet50 (#277)
- Added sequence probability in text recognition (#284)
Utils
Utility features relevant to the library use cases.
- Added granularity on recognition metrics (#274)
- Added visualization option to display artefacts (#273)
Transforms
Data transformations operations
- Added option to switch padding between symmetric and left for resizing while preserving aspect ratio (#277), as sketched below
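The option maps to the arguments already visible in the predictor representation above. A minimal sketch with a TensorFlow input tensor:

```python
import tensorflow as tf
from doctr.transforms import Resize

# Preserve the aspect ratio and pad on both sides rather than only right/bottom
transfo = Resize((32, 128), preserve_aspect_ratio=True, symmetric_pad=True)
out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
print(out.shape)  # (32, 128, 3)
```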
Test
Verifications of the package's well-being before release
- Added unittests for artefact detection (#258, #260)
- Added detailed unittests for granular metrics (#274)
- Extended unittests for resizing (#277)
Documentation
Online resources for potential users
- Added installation instructions for Mac & Windows users (#268)
- Added benchmark of models on private datasets (#269)
- Added changelog to the documentation (#279)
- Added BibTeX citation in README (#279)
- Added parameter count in performance benchmarks (#280)
- Added OCR illustration in README (#283) and documentation (#285)
References
Reference training scripts
- Added support of Weights & biases logging for training scripts (#286)
- Added option to start using pretrained models (#286)
Others
Other tools and implementations
- Added CI job to build for MacOS & Windows (#268)
Bug fixes
Datasets
- Fixed blank image handling in OCRDataset (#270)
Documents
- Fixed channel order for PDF render into images (#276)
Models
- Fixed normalization step in preprocessors (#277)
Utils
- Fixed OCRMetric update edge case (#267)
References
- Fixed resizing in recognition script (#266)
Others
- Fixed demo for multi-page examples (#276)
- Fixed image decoding in API routes (#282)
- Fixed preprocessing in API routes (#282)
Improvements
Models
- Improved DBNet box computation (#272)
- Refactored preprocessors using transforms (#277)
- Improved repr of preprocessors and models (#277)
- Removed ignore_case and ignore_accents from recognition postprocessors (#284)
Documentation
- Updated badges in README & documentation versions (#254)
- Updated landing page of documentation (#279, #285)
- Updated repo folder description in CONTRIBUTING (#282)
- Improved the README's instructions to run the API (#282)
Tests
- Improved unittest of resizing transforms (#266)
- Improved unittests of OCRMetric (#267)
- Improved unittest of PDF rendering (#276)
- Extended unittest of OCRDataset (#278)
- Updated unittests of DocumentBuilder and recognition models (#284)
References
- Updated training scripts (#284)

