github mindee/doctr v0.2.1
v0.2.1: Greatly improved text detection models and more stable interface


This patch release fixes issues with preprocessor and greatly improves text detection models.

Brought to you by @fg-mindee & @charlesmindee

Note: doctr 0.2.1 requires TensorFlow 2.4.0 or higher.

Highlights

Improved text detection

With this iteration, DocTR brings you a set of newly pretrained parameters for db_resnet50 which was trained using a much wider range of data augmentations!

| Architecture | FUNSD recall | FUNSD precision | CORD recall | CORD precision |
|---|---|---|---|---|
| db_resnet50 + crnn_vgg16_bn (v0.2.0) | 64.8 | 70.3 | 67.7 | 78.4 |
| db_resnet50 + crnn_vgg16_bn (v0.2.1) | 70.08 | 74.77 | 82.19 | 79.67 |
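For reference, the recall and precision figures in the table are box-level match rates. A minimal sketch of how such figures are derived from raw match counts (the counts below are made up purely for illustration):

```python
# Recall and precision from raw detection match counts.
# The example counts are hypothetical, not taken from the benchmark.
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of ground-truth boxes that were detected."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted boxes that match a ground-truth box."""
    return true_positives / (true_positives + false_positives)

# e.g. 701 matched boxes out of 1000 ground truths, with 938 predictions
print(round(100 * recall(701, 299), 2))     # 70.1
print(round(100 * precision(701, 237), 2))  # 74.73
```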

*(OCR sample illustration)*

Sequence prediction confidence

Users may want to filter text recognition predictions, which was previously difficult without a prediction confidence. We harmonized our recognition models to expose the sequence prediction probability.

Using the following image (*reco_sample*):

with this snippet:

```python
from doctr.documents import DocumentFile
from doctr.models import recognition_predictor

predictor = recognition_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/reco_sample.jpg")
print(predictor(doc))
```

you will get a list of `(word value, sequence confidence)` tuples:

```python
[('invite', 0.9302278757095337)]
```
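The confidence makes it straightforward to discard uncertain words. A minimal pure-Python sketch (the 0.5 threshold and the second prediction are arbitrary choices for illustration):

```python
# Filter recognition results by sequence confidence.
# `predictions` mimics the (word, confidence) tuples shown above.
predictions = [("invite", 0.9302278757095337), ("???", 0.31)]

def keep_confident(preds, threshold=0.5):
    """Keep only words whose sequence confidence reaches the threshold."""
    return [word for word, conf in preds if conf >= threshold]

print(keep_confident(predictions))  # ['invite']
```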

More comprehensive representation of predictors

If you play around with the predictor's components, you might value a clear view of their composition. To provide a cleaner interface, we improved the representation of all predictor components.

The following snippet:

```python
from doctr.models import ocr_predictor

print(ocr_predictor())
```

now yields a much cleaner representation of the predictor's composition:

```
OCRPredictor(
  (det_predictor): DetectionPredictor(
    (pre_processor): PreProcessor(
      (resize): Resize(output_size=(1024, 1024), method='bilinear')
      (normalize): Compose(
        (transforms): [
          LambdaTransformation(),
          Normalize(mean=[0.7979999780654907, 0.7850000262260437, 0.7720000147819519], std=[0.2639999985694885, 0.27489998936653137, 0.28700000047683716]),
        ]
      )
    )
    (model): DBNet(
      (feat_extractor): IntermediateLayerGetter()
      (fpn): FeaturePyramidNetwork(channels=128)
      (probability_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f645f58e0>
      (threshold_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7ce15310>
      (postprocessor): DBPostProcessor(box_thresh=0.1, max_candidates=1000)
    )
  )
  (reco_predictor): RecognitionPredictor(
    (pre_processor): PreProcessor(
      (resize): Resize(output_size=(32, 128), method='bilinear', preserve_aspect_ratio=True, symmetric_pad=False)
      (normalize): Compose(
        (transforms): [
          LambdaTransformation(),
          Normalize(mean=[0.5, 0.5, 0.5], std=[1.0, 1.0, 1.0]),
        ]
      )
    )
    (model): CRNN(
      (feat_extractor): <doctr.models.backbones.vgg.VGG object at 0x7f6f7d866040>
      (decoder): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7cce2430>
      (postprocessor): CTCPostProcessor(vocab_size=118)
    )
  )
  (doc_builder): DocumentBuilder(resolve_lines=False, resolve_blocks=False, paragraph_break=0.035)
)
```

Breaking changes

Metrics' granularity

Renamed ExactMatch to TextMatch, since the metric now exposes several levels of matching flexibility. Additionally, the constructor flags have been deprecated, since the summary now provides all evaluation variants.

0.2.0:

```python
>>> from doctr.utils.metrics import ExactMatch
>>> metric = ExactMatch(ignore_case=True)
>>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
>>> print(metric.summary())
0.75
```

0.2.1:

```python
>>> from doctr.utils.metrics import TextMatch
>>> metric = TextMatch()
>>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
>>> print(metric.summary())
{'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75}
```

Here `raw` is the exact match; `caseless` is the exact match of lower-case counterparts; `unidecode` is the exact match of unidecoded (accent-stripped) counterparts; and `unicase` is the exact match of unidecoded lower-case counterparts.
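The four granularity levels can be reproduced with plain Python. The sketch below uses the stdlib `unicodedata` module as a rough stand-in for the unidecode transliteration used by the library, so it is an approximation rather than the library's implementation:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Stand-in for unidecode: drop combining accent marks (stdlib only)."""
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(c)
    )

def text_match_summary(preds, targets):
    """Fraction of (prediction, target) pairs matching at each level."""
    n = len(preds)
    raw = caseless = unidec = unicase = 0
    for p, t in zip(preds, targets):
        raw += p == t
        caseless += p.lower() == t.lower()
        unidec += strip_accents(p) == strip_accents(t)
        unicase += strip_accents(p).lower() == strip_accents(t).lower()
    return {
        "raw": raw / n,
        "caseless": caseless / n,
        "unidecode": unidec / n,
        "unicase": unicase / n,
    }

print(text_match_summary(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"]))
# {'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75}
```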

New features

Models

Deep learning model building and inference

  • Added detection of faces (#258) and bar codes (#260)
  • Added new pretrained weights for db_resnet50 (#277)
  • Added sequence probability in text recognition (#284)

Utils

Utility features relevant to the library use cases.

  • Added granularity on recognition metrics (#274)
  • Added visualization option to display artefacts (#273)

Transforms

Data transformation operations

  • Added option to switch padding between symmetric and left for resizing while preserving aspect ratio (#277)
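To illustrate the difference between the two padding modes, here is a minimal pure-Python sketch of where the padding ends up along one axis (an illustration only, not the library's code; the example sizes are arbitrary):

```python
# Padding placement when resizing to a fixed width while preserving
# aspect ratio: symmetric splits the slack evenly, left-aligned pushes
# all of it to the right of the content.
def pad_amounts(content_width: int, target_width: int, symmetric: bool):
    """Return (left, right) padding in pixels."""
    total = target_width - content_width
    if symmetric:
        left = total // 2
        return left, total - left
    # left-aligned content: all padding goes to the right
    return 0, total

print(pad_amounts(100, 128, symmetric=True))   # (14, 14)
print(pad_amounts(100, 128, symmetric=False))  # (0, 28)
```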

Test

Verifications of the package's well-being before release

  • Added unittests for artefact detection (#258, #260)
  • Added detailed unittests for granular metrics (#274)
  • Extended unittests for resizing (#277)

Documentation

Online resources for potential users

  • Added installation instructions for Mac & Windows users (#268)
  • Added benchmark of models on private datasets (#269)
  • Added changelog to the documentation (#279)
  • Added BibTeX citation in README (#279)
  • Added parameter count in performance benchmarks (#280)
  • Added OCR illustration in README (#283) and documentation (#285)

References

Reference training scripts

  • Added support of Weights & Biases logging for training scripts (#286)
  • Added option to start using pretrained models (#286)

Others

Other tools and implementations

  • Added CI job to build for MacOS & Windows (#268)

Bug fixes

Datasets

  • Fixed blank image handling in OCRDataset (#270)

Documents

  • Fixed channel order for PDF render into images (#276)

Models

  • Fixed normalization step in preprocessors (#277)

Utils

  • Fixed OCRMetric update edge case (#267)

Transforms

  • Fixed Resize when preserving aspect ratio (#266)
  • Fixed RandomSaturation (#277)

Documentation

  • Fixed documentation of OCRDataset (#274)
  • Improved documentation of doctr.documents.elements (#274)

References

  • Fixed resizing in recognition script (#266)

Others

  • Fixed demo for multi-page examples (#276)
  • Fixed image decoding in API routes (#282)
  • Fixed preprocessing in API routes (#282)

Improvements

Datasets

  • Added file existence check in dataset constructors (#277)
  • Refactored dataset methods (#278)

Models

  • Improved DBNet box computation (#272)
  • Refactored preprocessors using transforms (#277)
  • Improved repr of preprocessors and models (#277)
  • Removed ignore_case and ignore_accents from recognition postprocessors (#284)

Documents

  • Updated performance benchmarks (#272, #277)

Documentation

  • Updated badges in README & documentation versions (#254)
  • Updated landing page of documentation (#279, #285)
  • Updated repo folder description in CONTRIBUTING (#282)
  • Improved the README's instructions to run the API (#282)

Tests

  • Improved unittest of resizing transforms (#266)
  • Improved unittests of OCRMetric (#267)
  • Improved unittest of PDF rendering (#276)
  • Extended unittest of OCRDataset (#278)
  • Updated unittest of DocumentBuilder and recognition models (#284)

References

  • Updated training scripts (#284)

Others

  • Updated requirements (#274)
  • Updated evaluation script (#277, #284)
