mittagessen/kraken 7.0.0b2
7.0 beta release (pre-release)

kraken 7.0 introduces major changes to training, inference, model handling, and extensibility.
If you are upgrading from 6.0.x as an average user, start with Breaking Changes and Command Line Behavior.

Installing the Beta

Install the latest available 7.0 pre-release from PyPI:

$ pip install --upgrade --pre kraken

Install this specific beta explicitly:

$ pip install --upgrade "kraken==7.0.0b2"

Breaking Changes

  • Python 3.9 support was dropped. kraken now supports Python 3.10 through 3.13.
  • Device and precision options are now global on both kraken and ketos commands.
  • Training and evaluation manifest option names changed from --training-files/--evaluation-files to --training-data/--evaluation-data.
  • ketos train, ketos segtrain, ketos rotrain, and ketos pretrain now produce checkpoints and convert the best checkpoint to a weights file after training.
  • Segmentation training class filtering/merging CLI options were removed. Class mapping is now defined in YAML experiment files.
  • ketos segtest metrics are now computed against a configurable class mapping, and baseline detection metrics replace the older, less informative pixel accuracy/IoU-only view.
  • ketos compile fixed splits were removed due to a significant performance penalty. Use separate dataset files per split instead.
  • The API for both training and inference has been reworked extensively.
  • safetensors is now the default output format for trained weights.
  • Neural reading order models are only executed when using the new task API.
  • Recognition and segmentation inference accelerators now default to auto, selecting the highest-performance available device.

In practice: most existing workflows keep working after small updates, but training artifacts and API entry points changed enough that scripted pipelines and API consumers will need adaptation.
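
For example, the renamed manifest options in a scripted invocation (a minimal sketch; other flags unchanged):

# 6.0.x
$ ketos train --training-files train.lst --evaluation-files val.lst -f xml

# 7.0
$ ketos train --training-data train.lst --evaluation-data val.lst -f xml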

Bug Fixes

  • Fixed a breaking bug in reading order models that prevented trained model weights from loading.

Features and Improvements

  • A plugin system now allows easy extension of kraken functionality with new segmentation, recognition, and reading order implementations.
  • Persistent configuration through experiment YAML files has been added to ketos.
  • The new recognition API supports batching plus parallelized line extraction/processing, enabling effective GPU inference. Speedups of around 80% were observed on CPU, with even larger gains with GPU acceleration.
  • Character cuts on BaselineOCRRecord are now computed at initialization using a more efficient algorithm. This substantially reduces serialization overhead in the default --subline-segmentation mode.
  • Baseline detection metrics inspired by the Transkribus Evaluation Scheme are now computed during segmentation training. Unlike older pixel-based metrics, these scores correlate more directly with actual line detection quality.
  • The XML parser has been reworked for better robustness against invalid input. When PageXML files contain invalid image dimensions, kraken now attempts to read dimensions from the referenced image file. Reading-order parsing was also fully reimplemented to handle partial explicit orders and multi-level ordering more gracefully.

Plugins

kraken can now use external implementations of layout analysis, text recognition, and reading order determination through Python entry points.

Plugins are distributed as regular Python packages. After installation, kraken discovers them automatically through entry points. Plugin model files are then used exactly like native kraken model files: pass them to --model on the CLI or load them via task classes in Python.

Example workflow with a D-FINE layout analysis plugin model:

# install plugin package
$ pip install git+https://github.com/mittagessen/dfine_kraken.git

# run layout analysis with a plugin model file
$ kraken -i page.tif page.json segment --baseline --model dfine_layout.safetensors

The same model can be loaded programmatically with SegmentationTaskModel.load_model('dfine_layout.safetensors').
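
For instance, a short sketch mirroring the task API described under API below:

from PIL import Image
from kraken.tasks import SegmentationTaskModel
from kraken.configs import SegmentationInferenceConfig

segmenter = SegmentationTaskModel.load_model('dfine_layout.safetensors')
seg = segmenter.predict(im=Image.open('page.tif'), config=SegmentationInferenceConfig())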

Command Line Behavior

Inference

Device and precision are now global options on kraken.
Set them before subcommands:

# CPU inference in full precision
$ kraken -i page.tif page.txt --device cpu --precision 32-true \
  segment -bl ocr -m model.safetensors

# GPU inference with mixed bfloat16 precision
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
  segment -bl ocr -m model.safetensors

Recognition now exposes two throughput controls:

  • -B/--batch-size: number of extracted line images sent per recognition forward pass.
  • --num-line-workers: number of CPU worker processes used to extract/preprocess line images. Use 0 to keep extraction in-process.

# conservative settings for small GPUs or CPU-only runs
$ kraken -i page.tif page.txt segment -bl ocr -m model.safetensors \
  -B 8 --num-line-workers 2

# higher-throughput GPU settings
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
  segment -bl ocr -m model.safetensors -B 64 --num-line-workers 8

Training

Experiment Files

Managing non-trivial training configurations from CLI flags alone was difficult, especially when heavily modifying segmentation class taxonomies. To address this, ketos now supports YAML experiment files.

Pass an experiment file with --config before the command name:

$ ketos --config experiments.yml segtrain

YAML keys correspond to the internal parameter names used by the CLI.

Minimal segmentation training experiment file:

precision: 32-true
device: auto
num_workers: 16
num_threads: 1
segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors
  line_class_mapping:
    - ['*', 3]
    - ['DefaultLine', 3]

Single experiment file containing multiple commands:

precision: 32-true
device: auto
num_workers: 16
num_threads: 1

train:
  training_data:
    - rec_train.lst
  evaluation_data:
    - rec_val.lst
  format_type: xml
  checkpoint_path: rec_checkpoints
  weights_format: safetensors

segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors

Recommendation: move non-trivial setups (class mappings, optimizer/scheduler settings, hardware defaults) into YAML so runs are reproducible and easier to review.
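
As an illustration, a sketch of such a setup; the optimizer/scheduler keys shown here are assumptions modeled on the corresponding ketos train flags:

precision: bf16-mixed
device: auto

train:
  training_data:
    - rec_train.lst
  evaluation_data:
    - rec_val.lst
  format_type: xml
  optimizer: Adam
  lrate: 0.0001
  schedule: cosine
  checkpoint_path: rec_checkpoints
  weights_format: safetensors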

Training Outputs, Checkpoints, and Weights

For ketos train, ketos segtrain, ketos rotrain, and ketos pretrain, training now produces Lightning checkpoints (.ckpt) as the primary artifact instead of writing CoreML weights directly during training.

Checkpoint files include full training state (model weights, optimizer state, scheduler state, epoch/step counters, and serialized training config), enabling exact continuation of interrupted runs.

There are now two distinct continuation modes:

  • --resume restores and continues from the checkpoint's exact previous training state. The checkpoint state is authoritative, even if command-line flags or config files specify different values.
  • --load keeps the previous fine-tune/start-new-run behavior. It loads weights only and starts a fresh run using current CLI/config hyperparameters.

Use --resume when you want to continue the same run.
Use --load when you want to start a new run from existing weights.
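
Illustrative invocations (a sketch; assuming both flags take the checkpoint or weights path as an argument):

# continue the interrupted run, restoring optimizer and scheduler state
$ ketos train --resume checkpoint_12-0.8731.ckpt

# start a fresh fine-tuning run from converted weights
$ ketos train --load model_best.safetensors --training-data train.lst \
  --evaluation-data val.lst -f xml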

In addition to regular checkpoints, kraken now writes an emergency abort checkpoint by default (checkpoint_abort.ckpt) when a training run exits via exception (for example, a crash or a forceful abort). This gives you a recovery point even when a run terminates unexpectedly.

Because checkpoints contain much more than deployable model weights and may execute arbitrary Python code on load, distribute converted weights files rather than raw checkpoints. Conversion strips training-only state and produces a distribution-safe weights artifact.

At the end of training, kraken automatically converts the best checkpoint into a weights file. You can also convert manually with ketos convert.

The default weights format is now safetensors. Compared to legacy CoreML weights, safetensors supports serialization of arbitrary model types, while CoreML is limited to the core model types implemented in kraken.

Use --weights-format coreml only when you explicitly need legacy compatibility.
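
For example, to produce legacy-format weights from a training run (a sketch using the flags shown above):

$ ketos train --training-data train.lst --evaluation-data val.lst -f xml \
  --weights-format coreml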

Testing

Segmentation test output now includes metrics computed on vectorized baselines that correlate with actual segmentation quality, making model selection for line detection much easier. segtest behavior has also changed along with the checkpoint/weights distinction: in previous releases, test data often had to mirror the post-merge/post-filter training mappings, which made evaluation cumbersome without rewriting source labels.

In short: you can now evaluate more datasets directly, with less taxonomy rewriting.

Example segtest invocation:

$ ketos --device cpu segtest -m best_0.9471.safetensors -e test_manifest.lst -f xml

Example output excerpt:

Category  Class Name    Pixel Accuracy  IOU    Object Count
aux       _start_sep    1.000           1.000  N/A
aux       _end_sep      1.000           1.000  N/A
regions   Text_Region   0.992           0.964  184
regions   Foot_Notes    0.973           0.887  36

Class         Precision  Recall  F1
Overall       0.947      0.933   0.940
DefaultLine   0.959      0.946   0.952
Marginalia    0.891      0.874   0.882

Class mappings are now stored in two forms in checkpoints and new weights files:

  • A full mapping with all transformations (merges/filtering) from training taxonomy to model outputs.
  • A canonical one-to-one mapping between label indices and class strings.

By default, evaluation uses the full mapping. Canonical mapping is used when explicitly requested and as a fallback for pre-7.0 model files. Fully custom mappings can also be defined in an experiment file.

Class mapping modes in ketos segtest:

# Use full (many-to-one) training mapping from checkpoint metadata
$ ketos segtest -m model.ckpt -e test.lst --test-class-mapping-mode full

# Use canonical one-to-one model output mapping
$ ketos segtest -m model.safetensors -e test.lst --test-class-mapping-mode canonical

# Provide explicit mapping for the test set taxonomy
$ ketos --config segtest_custom.yml segtest -m model.safetensors -e test.lst \
  --test-class-mapping-mode custom

# segtest_custom.yml
segtest:
  line_class_mapping:
    - ['DefaultLine', 3]
    - ['Running_Title', 3]
    - ['Marginal_Note', 4]
  region_class_mapping:
    - ['Text_Region', 5]
    - ['Foot_Notes', 6]

For easier debugging, ketos segtest now prints an explicit mapping between test-set classes and model classes, including clear indicators for merges, missing labels, and conflicts.

Example class taxonomy diagnostics table:

Class Mapping Diagnostics (model=full, dataset=effective)
Category   Class Name      Model Idx  Dataset Idx  Observed  Effective  Status
baselines  DefaultLine     3          3            812       812        ok
baselines  Running_Title   3          3            57        57         ok
baselines  Rubrication     4          -            14        0          ignored by dataset mapping
regions    Text_Region     5          5            184       184        ok
regions    Illustration    -          7            22        22         missing in model mapping

API

Configuration Classes

In previous versions of kraken, training and inference hyperparameters were defined in dictionaries in the default_specs module. This was error-prone and resulted in verbose code in the command line drivers.

If you maintain Python training/inference scripts, migrate to typed config classes for better defaults, clearer parameter names, and safer checkpoint serialization.

Before (6.0.x) using default_specs dictionaries:

from kraken.lib.default_specs import RECOGNITION_HYPER_PARAMS
from kraken.lib.train import RecognitionModel

hyper_params = RECOGNITION_HYPER_PARAMS.copy()
hyper_params.update({'batch_size': 8, 'lrate': 1e-3})
model = RecognitionModel(hyper_params=hyper_params, training_data=['train.lst'])

After (7.0) using typed configuration classes:

from kraken.configs import (RecognitionInferenceConfig,
                            VGSLRecognitionTrainingConfig,
                            VGSLRecognitionTrainingDataConfig)

infer_cfg = RecognitionInferenceConfig(batch_size=8,
                                       num_line_workers=4,
                                       precision='bf16-mixed')

train_cfg = VGSLRecognitionTrainingConfig(lrate=1e-3,
                                          quit='early',
                                          epochs=24)
data_cfg = VGSLRecognitionTrainingDataConfig(training_data=['train.lst'],
                                             evaluation_data=['val.lst'],
                                             format_type='xml')

Task-based API for Inference

blla.segment(), align.forced_align(), and rpred.rpred()/rpred.mm_rpred() have been replaced by implementation-agnostic task classes that provide better performance and flexibility. The largest gains are in text recognition, where CPU inference improves by roughly 80% through parallelization. Batching additionally enables efficient GPU utilization.

If you call legacy APIs directly, plan a migration to kraken.tasks soon. Legacy interfaces remain available for now but are deprecated.

To migrate an existing segmentation workflow, replace:

from PIL import Image
from kraken.blla import segment
from kraken.lib.vgsl import TorchVGSLModel

model = TorchVGSLModel.load_model('/path/to/segmentation/model.coreml')
im = Image.open('sample.jpg')
seg = segment(im, model=model)

with:

from PIL import Image
from kraken.tasks import SegmentationTaskModel
from kraken.configs import SegmentationInferenceConfig

segmenter = SegmentationTaskModel.load_model('/path/to/segmentation/model.safetensors')
im = Image.open('sample.jpg')
seg = segmenter.predict(im=im, config=SegmentationInferenceConfig())

For recognition, before:

from PIL import Image
from kraken.rpred import rpred
from kraken.lib.models import load_any

net = load_any('/path/to/recognition/model.mlmodel')
for record in rpred(net, im, segmentation=seg):
    print(record)

After:

from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig

recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
config = RecognitionInferenceConfig(batch_size=8, num_line_workers=4)
for record in recognizer.predict(im=im, segmentation=seg, config=config):
    print(record)

Recognition now supports batching (batch_size in RecognitionInferenceConfig) and parallel line extraction (num_line_workers), making GPU acceleration practical.

CUDA example with explicit accelerator/device settings:

from PIL import Image
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig

recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
config = RecognitionInferenceConfig(accelerator='gpu',
                                    device=[0],
                                    precision='bf16-mixed',
                                    batch_size=64,
                                    num_line_workers=8)
for record in recognizer.predict(im=Image.open('page.tif'), segmentation=seg, config=config):
    print(record.prediction)

The new recognition API does not support tag-based multi-model recognition (rpred.mm_rpred()), which was dropped to simplify batched inference.

For forced alignment, before:

from PIL import Image
from kraken.containers import Segmentation, BaselineLine
from kraken.align import forced_align
from kraken.lib.models import load_any

model = load_any('model.mlmodel')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(imagename='image.png', lines=[line])

aligned_segmentation = forced_align(segmentation, model) 
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

After:

from PIL import Image
from kraken.tasks import ForcedAlignmentTaskModel
from kraken.containers import Segmentation, BaselineLine
from kraken.configs import RecognitionInferenceConfig
                                                                                                                    
# Assume `model.mlmodel` is a recognition model
model = ForcedAlignmentTaskModel.load_model('model.mlmodel')
im = Image.open('image.png')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(lines=[line])
config = RecognitionInferenceConfig()
                                                                                                                    
aligned_segmentation = model.predict(im, segmentation, config)
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

The old interfaces remain available but are deprecated and will be removed in kraken 8.

Training Refactor

The training module has been moved from kraken.lib.train to kraken.train (with reading order and pretraining modules in kraken.lib.ro/kraken.lib.pretrain). Training now uses explicit configuration objects and consistently uses LightningDataModule-derived classes.

If you run training programmatically, update imports and constructors and switch hyperparameter dicts to config objects.

Before (6.0.x) style instantiation:

from kraken.lib.train import RecognitionModel, SegmentationModel
from kraken.lib.pretrain.model import RecognitionPretrainModel
from kraken.lib.ro.model import RODataModule, ROModel

rec = RecognitionModel(hyper_params={'batch_size': 8},
                       training_data=['train.lst'],
                       evaluation_data=['val.lst'])
seg = SegmentationModel(hyper_params={'epochs': 50},
                        training_data=['seg_train.lst'],
                        evaluation_data=['seg_val.lst'])
pre = RecognitionPretrainModel(hyper_params={'mask_prob': 0.5})
ro_dm = RODataModule(training_data=['ro_train.lst'], evaluation_data=['ro_val.lst'])
ro = ROModel(feature_dim=128, class_mapping={'default': 1}, hyper_params={'epochs': 3000})

After (7.0) style instantiation:

from kraken.train import (KrakenTrainer,
                          VGSLRecognitionDataModule, VGSLRecognitionModel,
                          BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.lib.pretrain import PretrainDataModule, RecognitionPretrainModel
from kraken.lib.ro import RODataModule, ROModel
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
                            BLLASegmentationTrainingConfig, BLLASegmentationTrainingDataConfig,
                            VGSLPreTrainingConfig, VGSLPreTrainingDataConfig,
                            ROTrainingConfig, ROTrainingDataConfig)

rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(training_data=['train.lst'], evaluation_data=['val.lst'], format_type='xml'))
rec_model = VGSLRecognitionModel(VGSLRecognitionTrainingConfig(epochs=24, quit='early'))

seg_dm = BLLASegmentationDataModule(BLLASegmentationTrainingDataConfig(training_data=['seg_train.lst'], evaluation_data=['seg_val.lst'], format_type='xml'))
seg_model = BLLASegmentationModel(BLLASegmentationTrainingConfig(epochs=50, quit='fixed'))

pre_dm = PretrainDataModule(VGSLPreTrainingDataConfig(training_data=['pretrain_train.lst'], evaluation_data=['pretrain_val.lst'], format_type='path'))
pre_model = RecognitionPretrainModel(VGSLPreTrainingConfig(mask_prob=0.5))

ro_dm = RODataModule(ROTrainingDataConfig(training_data=['ro_train.lst'], evaluation_data=['ro_val.lst'], format_type='xml', level='baselines'))
ro_model = ROModel(ROTrainingConfig(epochs=3000, quit='early'))

The KrakenTrainer class works as before.
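
A minimal fit sketch with the objects constructed above (keyword arguments follow the usual Lightning Trainer signature, as in the test example below):

trainer = KrakenTrainer(accelerator='auto', devices=1, precision='32-true')
trainer.fit(rec_model, rec_dm)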

In addition, separate test routines are now integrated into Lightning modules, allowing straightforward programmatic execution of the test loop for segmentation and recognition.

KrakenTrainer.test() returns typed metric containers:

  • Recognition (RecognitionTestMetrics): character_counts, num_errors, cer, wer, case_insensitive_cer, confusions, scripts, insertions, deletes, substitutions
  • Segmentation (SegmentationTestMetrics): class_pixel_accuracy, mean_accuracy, class_iu, mean_iu, freq_iu, region_iu, bl_precision, bl_recall, bl_f1, bl_detection_per_class

Example: programmatic test loop execution with KrakenTrainer.test():

from kraken.train import (KrakenTrainer,
                          VGSLRecognitionDataModule, VGSLRecognitionModel,
                          BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
                            BLLASegmentationTrainingConfig, BLLASegmentationTestDataConfig)

trainer = KrakenTrainer(accelerator='cpu', devices=1, precision='32-true')

rec_model = VGSLRecognitionModel.load_from_weights('rec_best.safetensors',
                                                   VGSLRecognitionTrainingConfig())
rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(test_data=['rec_test.lst'], format_type='xml'))
rec_metrics = trainer.test(rec_model, rec_dm)

seg_model = BLLASegmentationModel.load_from_weights('seg_best.safetensors',
                                                    BLLASegmentationTrainingConfig())
seg_dm = BLLASegmentationDataModule(BLLASegmentationTestDataConfig(test_data=['seg_test.lst'],
                                                                   format_type='xml',
                                                                   test_class_mapping_mode='canonical'))
seg_metrics = trainer.test(seg_model, seg_dm)
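
The returned containers expose the fields listed above as attributes, for example (assuming scalar values for the aggregate metrics):

print(rec_metrics.cer, rec_metrics.wer)
print(seg_metrics.mean_iu, seg_metrics.bl_f1)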

Plugin System and Model Base Classes

kraken now supports alternative segmentation and recognition implementations through a plugin system based on Python entry points. To be compatible, plugins must implement the interfaces defined by the abstract kraken.models.BaseModel class. kraken.models.SegmentationBaseModel and kraken.models.RecognitionBaseModel provide task-specific base interfaces.

This is primarily relevant if you are extending kraken with custom model types or distributing third-party integrations.

Rough implementation skeletons:

from torch import nn
from kraken.models import SegmentationBaseModel, RecognitionBaseModel

class MySegmentationModel(nn.Module, SegmentationBaseModel):
    # oldest kraken version this plugin model is compatible with
    _kraken_min_version = '7.0.0'
    model_type = ['segmentation']

    def prepare_for_inference(self, config):
        # put the module into inference mode
        self.eval()

    def predict(self, im):
        # produce segmentation results for the input image
        ...


class MyRecognitionModel(nn.Module, RecognitionBaseModel):
    _kraken_min_version = '7.0.0'
    model_type = ['recognition']

    def prepare_for_inference(self, config):
        self.eval()

    def predict(self, im, segmentation):
        # produce recognition results for the lines in `segmentation`
        ...

To be discoverable by kraken, these classes must be registered as entry points in your setup.cfg (or equivalent) under the kraken.models group, keyed by their class name.

Example from kraken's own setup.cfg:

[entry_points]
kraken.models =
  TorchVGSLModel = kraken.lib.vgsl:TorchVGSLModel
  Wav2Vec2Mask = kraken.lib.pretrain:Wav2Vec2Mask
  ROMLP = kraken.lib.ro:ROMLP

An example plugin, dfine_kraken, incorporates the D-FINE object detector for layout analysis.

Model Handling

kraken replaces the type-specific model loaders with a modular serialization/deserialization architecture; models can also be loaded directly via the task APIs. The default serialization format is now safetensors, which supports arbitrary model types. The new API in kraken.models reads (kraken.models.load_models) and writes (kraken.models.write_safetensors) model collections. Model files are designed to contain multiple models (for example, layout plus reading order), so these routines accept and return lists of models. "Native" kraken implementations and plugin implementations can be mixed in the same model file, such as a BLLA line segmentation model and a D-FINE region segmentation model. CoreML support remains, but only for legacy models from kraken 6 and earlier.

For most users: prefer safetensors, treat checkpoints as training artifacts, and distribute converted weights files.

Before (6.0.x) model loading:

# recognition
from kraken.lib.models import load_any
rec_model = load_any('recognition_model.mlmodel')

# segmentation
from kraken.lib.vgsl import TorchVGSLModel
seg_model = TorchVGSLModel.load_model('segmentation_model.mlmodel')

After (7.0) unified loading:

from kraken.models import load_models
from kraken.tasks import RecognitionTaskModel, SegmentationTaskModel

# load by task type
rec_models = load_models('model_bundle.safetensors', tasks=['recognition'])
seg_and_ro_models = load_models('model_bundle.safetensors', tasks=['segmentation', 'reading_order'])

# use via task API
recognizer = RecognitionTaskModel(rec_models)
segmenter = SegmentationTaskModel(seg_and_ro_models)
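
Writing works the other way around; a minimal sketch, assuming kraken.models.write_safetensors accepts a list of model instances plus an output path:

from kraken.models import write_safetensors

# bundle segmentation, reading order, and recognition models in one file
write_safetensors(seg_and_ro_models + rec_models, 'bundle.safetensors')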

The new model stack explicitly distinguishes checkpoints from weights files. After training, checkpoints should be converted to weights. The universal conversion routine kraken.models.convert_models relies on additional entry points: a checkpoint LightningModule (or compatible class exposing load_from_checkpoint) and any configuration classes serialized into model weights. During conversion, checkpoints are loaded in weights_only mode. To support safe deserialization, kraken adds all classes registered under kraken.configs to PyTorch safe globals.

Minimal plugin registration in setup.cfg for checkpoint conversion:

[entry_points]
kraken.lightning_modules =
  MyVGSLLightningModule = mypkg.training:MyVGSLLightningModule
kraken.configs =
  MyTrainingConfig = mypkg.configs:MyTrainingConfig
kraken.models =
  MyModel = mypkg.models:MyModel

Checkpoint/weights conversion examples:

# CLI
$ ketos convert -i checkpoint_09-0.9431.ckpt -o model_best.safetensors

# Python API
from kraken.models import convert_models, load_models
from kraken.models.convert import load_from_checkpoint

# checkpoint to weights
convert_models(['checkpoint_09-0.9431.ckpt'], 'model_best.safetensors')

# load lightning module from checkpoint (weights_only mode)
module = load_from_checkpoint('checkpoint_09-0.9431.ckpt')
net = module.net

# load converted weights
models = load_models('model_best.safetensors')
