What's Changed
- Pretraining has been reimplemented to follow the original publication more faithfully, which gives more stable memory consumption and easier hyperparameter selection
- Learning rate warmup and backbone freezing in recognition training with `--warmup` and `--freeze-backbone` (mostly to enable fine-tuning pretrained models)
- Enable `ketos compile` to create precompiled datasets with lines without a corresponding transcription with the `--keep-empty-lines` switch (mostly for pretraining models).
- `--failed-sample-threshold` in training modules, aborting training after a certain number of samples failed to load
- TensorBoard logging with `--logger`/`--log-dir` options
- Change codec construction during training when training and validation dataset alphabets don't match. Previously, code points that only exist in the validation set were copied to the model codec. Now the model codec only contains trained code points.
- Replace `ocr_record` with new smart classes `BaselineOCRRecord` and `BBoxOCRRecord`. These keep track of reading/display order, compute bounding polygons from the whole line bounding polygon, and average confidences when slicing.
- ALTO parsing now deals with any reasonable PointsType (see altoxml/schema#49)
- The fallback line orientation heuristic now takes into account the principal text orientation defined with `--text-direction` instead of assuming horizontal lines (`--text-direction horizontal-lr/-rl`).
- Baseline segmentation now supports padding of input images with `--pad`.
- CLI now allows serialization with custom jinja2 templates through the `--template` option.
- Switch validation metrics computation to torchmetrics.
- Various bugfixes, mostly to deal with shapely shenanigans.
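
The codec construction change listed above can be pictured with plain Python sets. This is only a rough sketch of the described behaviour, not kraken's code: the function names `codec_alphabet_before` and `codec_alphabet_now` and the sample alphabets are hypothetical and do not exist in the kraken API.

```python
# Illustrative only: plain sets stand in for dataset alphabets.
# None of these names are part of the kraken API.

def codec_alphabet_before(train_alphabet, val_alphabet):
    """Older behaviour as described above: code points found only in the
    validation set were also copied into the model codec."""
    return set(train_alphabet) | set(val_alphabet)


def codec_alphabet_now(train_alphabet, val_alphabet):
    """Behaviour as of this release: the model codec only contains code
    points that occur in the training set."""
    return set(train_alphabet)


# Hypothetical alphabets that only partially overlap.
train = set("abcde")
val = set("abcxyz")

old = codec_alphabet_before(train, val)
new = codec_alphabet_now(train, val)

# Code points the older construction added although the model never
# saw a training example for them.
untrained = sorted(old - new)
print(untrained)
```

In this toy case the difference between the two constructions is exactly the set of validation-only code points, which is what the new construction leaves out of the model codec.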
Thanks
- @sixtyfive, @anutkk, @stweil, @colibrisson, @PonteIneptique for their contributions to this release.
Full Changelog: 4.2.0...4.3.0