This patch release fixes issues with the preprocessor and greatly improves text detection models.
Brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.2.1 requires TensorFlow 2.4.0 or higher.
Highlights
Improved text detection
With this iteration, DocTR brings you a new set of pretrained parameters for db_resnet50, trained with a much wider range of data augmentations!
| architecture | FUNSD recall | FUNSD precision | CORD recall | CORD precision |
|---|---|---|---|---|
| db_resnet50 + crnn_vgg16_bn (v0.2.0) | 64.8 | 70.3 | 67.7 | 78.4 |
| db_resnet50 + crnn_vgg16_bn (v0.2.1) | 70.08 | 74.77 | 82.19 | 79.67 |
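To benefit from these new parameters, the detection model can be instantiated through its predictor factory (a minimal sketch; the image path is a placeholder and the usage mirrors the recognition predictor shown below):

```python
from doctr.documents import DocumentFile
from doctr.models import detection_predictor

# Instantiates DBNet with the newly released pretrained parameters
predictor = detection_predictor('db_resnet50', pretrained=True)
doc = DocumentFile.from_images("path/to/your_doc.jpg")
# Returns the boxes detected on each page
print(predictor(doc))
```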
Sequence prediction confidence
Users might want to filter text recognition predictions, which was not easy previously without a prediction confidence. We harmonized our recognition models so that they all provide the sequence prediction probability.
The following snippet:
from doctr.documents import DocumentFile
from doctr.models import recognition_predictor
predictor = recognition_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/reco_sample.jpg")
print(predictor(doc))
will get you a list of tuples (word value, sequence confidence):
[('invite', 0.9302278757095337)]
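This makes confidence-based filtering straightforward. A minimal sketch building on the snippet above (the 0.5 threshold is an arbitrary illustrative value):

```python
from doctr.documents import DocumentFile
from doctr.models import recognition_predictor

predictor = recognition_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/reco_sample.jpg")
# Keep only the words whose sequence confidence exceeds the threshold
confident_words = [word for word, conf in predictor(doc) if conf > 0.5]
print(confident_words)
```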
More comprehensive representation of predictors
For those who play around with a predictor's components, a clear understanding of their composition is valuable. To provide a cleaner interface, we improved the representation of all predictor components.
The following snippet:
from doctr.models import ocr_predictor
print(ocr_predictor())
now yields a much cleaner representation of the predictor's composition:
OCRPredictor(
(det_predictor): DetectionPredictor(
(pre_processor): PreProcessor(
(resize): Resize(output_size=(1024, 1024), method='bilinear')
(normalize): Compose(
(transforms): [
LambdaTransformation(),
Normalize(mean=[0.7979999780654907, 0.7850000262260437, 0.7720000147819519], std=[0.2639999985694885, 0.27489998936653137, 0.28700000047683716]),
]
)
)
(model): DBNet(
(feat_extractor): IntermediateLayerGetter()
(fpn): FeaturePyramidNetwork(channels=128)
(probability_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f645f58e0>
(threshold_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7ce15310>
(postprocessor): DBPostProcessor(box_thresh=0.1, max_candidates=1000)
)
)
(reco_predictor): RecognitionPredictor(
(pre_processor): PreProcessor(
(resize): Resize(output_size=(32, 128), method='bilinear', preserve_aspect_ratio=True, symmetric_pad=False)
(normalize): Compose(
(transforms): [
LambdaTransformation(),
Normalize(mean=[0.5, 0.5, 0.5], std=[1.0, 1.0, 1.0]),
]
)
)
(model): CRNN(
(feat_extractor): <doctr.models.backbones.vgg.VGG object at 0x7f6f7d866040>
(decoder): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7cce2430>
(postprocessor): CTCPostProcessor(vocab_size=118)
)
)
(doc_builder): DocumentBuilder(resolve_lines=False, resolve_blocks=False, paragraph_break=0.035)
)
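Each component shown above is also exposed as an attribute of the same name, which makes it easy to inspect or tweak a specific stage (a minimal sketch, assuming the attribute layout matches the representation):

```python
from doctr.models import ocr_predictor

predictor = ocr_predictor(pretrained=True)
# Attribute names match the keys displayed in the representation above
print(predictor.det_predictor.pre_processor)
print(predictor.reco_predictor.model)
```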
Breaking changes
Metrics' granularity
Renamed ExactMatch to TextMatch, since the metric now provides several levels of flexibility for the evaluation. Additionally, the constructor flags have been deprecated, since the summary now provides all the different types of evaluation.
0.2.0:
>>> from doctr.utils.metrics import ExactMatch
>>> metric = ExactMatch(ignore_case=True)
>>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
>>> print(metric.summary())
0.75

0.2.1:
>>> from doctr.utils.metrics import TextMatch
>>> metric = TextMatch()
>>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
>>> print(metric.summary())
{'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75}
Here, raw is the exact match, caseless the exact match of lower-case counterparts, unidecode the exact match of unidecoded counterparts, and unicase the exact match of unidecoded lower-case counterparts.
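As an illustration of this granularity, a prediction that only differs from the ground truth by an accent is only counted by the unidecode and unicase matches (a minimal sketch; the exact printed values depend on the summary formatting):

```python
from doctr.utils.metrics import TextMatch

metric = TextMatch()
# "cafe" only matches "café" once the accent is stripped (unidecode & unicase)
metric.update(["cafe"], ["café"])
print(metric.summary())
# Expected: {'raw': 0.0, 'caseless': 0.0, 'unidecode': 1.0, 'unicase': 1.0}
```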
New features
Models
Deep learning model building and inference
- Added detection of faces (#258) and bar codes (#260)
- Added new pretrained weights for db_resnet50 (#277)
- Added sequence probability in text recognition (#284)
Utils
Utility features relevant to the library use cases.
- Added granularity on recognition metrics (#274)
- Added visualization option to display artefacts (#273)
Transforms
Data transformations operations
- Added option to switch padding between symmetric and left for resizing while preserving aspect ratio (#277), as sketched below
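The option maps to the arguments already visible in the predictor representation above. A minimal sketch with a TensorFlow input tensor:

```python
import tensorflow as tf
from doctr.transforms import Resize

# Preserve the aspect ratio and pad on both sides rather than only right/bottom
transfo = Resize((32, 128), preserve_aspect_ratio=True, symmetric_pad=True)
out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
print(out.shape)  # (32, 128, 3)
```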
Test
Verifications of the package's well-being before release
- Added unittests for artefact detection (#258, #260)
- Added detailed unittests for granular metrics (#274)
- Extended unittests for resizing (#277)
Documentation
Online resources for potential users
- Added installation instructions for Mac & Windows users (#268)
- Added benchmark of models on private datasets (#269)
- Added changelog to the documentation (#279)
- Added BibTeX citation in README (#279)
- Added parameter count in performance benchmarks (#280)
- Added OCR illustration in README (#283) and documentation (#285)
References
Reference training scripts
- Added support of Weights & biases logging for training scripts (#286)
- Added option to start using pretrained models (#286)
Others
Other tools and implementations
- Added CI job to build for MacOS & Windows (#268)
Bug fixes
Datasets
- Fixed blank image handling in OCRDataset (#270)
Documents
- Fixed channel order for PDF render into images (#276)
Models
- Fixed normalization step in preprocessors (#277)
Utils
- Fixed OCRMetric update edge case (#267)
References
- Fixed resizing in recognition script (#266)
Others
- Fixed demo for multi-page examples (#276)
- Fixed image decoding in API routes (#282)
- Fixed preprocessing in API routes (#282)
Improvements
Models
- Improved DBNet box computation (#272)
- Refactored preprocessors using transforms (#277)
- Improved repr of preprocessors and models (#277)
- Removed ignore_case and ignore_accents from recognition postprocessors (#284)
Documentation
- Updated badges in README & documentation versions (#254)
- Updated landing page of documentation (#279, #285)
- Updated repo folder description in CONTRIBUTING (#282)
- Improved the README's instructions to run the API (#282)
Tests
- Improved unittest of resizing transforms (#266)
- Improved unittests of OCRMetric (#267)
- Improved unittest of PDF rendering (#276)
- Extended unittest of OCRDataset (#278)
- Updated unittests of DocumentBuilder and recognition models (#284)
References
- Updated training scripts (#284)

