kraken 7.0 introduces major changes to training, inference, model handling, and extensibility.
If you are upgrading from 6.0.x as an average user, start with Breaking Changes and Command Line Behavior.
Installing the Beta
Install the latest available 7.0 pre-release from PyPI:
$ pip install --upgrade --pre kraken

Install this specific beta explicitly:
$ pip install --upgrade "kraken==7.0b1"Breaking Changes
- Python 3.9 support was dropped. kraken now supports Python 3.10 through 3.13.
- Device and precision options are now global on both the kraken and ketos commands.
- Training and evaluation manifest option names changed from --training-files/--evaluation-files to --training-data/--evaluation-data (see the example after this list).
- ketos train, ketos segtrain, ketos rotrain, and ketos pretrain now produce checkpoints and convert the best checkpoint to a weights file after training.
- Segmentation training class filtering/merging CLI options were removed. Class mapping is now defined in YAML experiment files.
- ketos segtest metrics are now computed against a configurable class mapping, and baseline detection metrics replace the older, less informative pixel accuracy/IoU-only view.
- ketos compile fixed splits were removed due to a significant performance penalty. Use separate dataset files per split instead.
- The API for both training and inference has been reworked extensively.
- safetensors is now the default output format for trained weights.
- Neural reading order models are only executed when using the new task API.
- Recognition and segmentation inference accelerators now default to auto, selecting the highest-performance available device.
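For the renamed manifest options, existing invocations map one-to-one; the manifest names here are illustrative:

# 6.0.x
$ ketos train --training-files train.lst --evaluation-files val.lst
# 7.0
$ ketos train --training-data train.lst --evaluation-data val.lst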
In practice: most existing workflows keep working after small updates, but training artifacts and API entry points changed enough that scripted pipelines and API users will need adaptation.
Bug Fixes
- Fixed a breaking bug in reading order models that prevented trained model weights from loading.
Features and Improvements
- A plugin system now allows easy extension of kraken functionality with new segmentation, recognition, and reading order implementations.
- Persistent configuration through experiment YAML files has been added to ketos.
- The new recognition API supports batching plus parallelized line extraction/processing, enabling effective GPU inference. Speedups of around 80% were observed on CPU, with even larger gains under GPU acceleration.
- Character cuts on BaselineOCRRecord are now computed at initialization using a more efficient algorithm. This substantially reduces serialization overhead in the default --subline-segmentation mode.
- Baseline detection metrics inspired by the Transkribus Evaluation Scheme are now computed during segmentation training. Unlike older pixel-based metrics, these scores correlate more directly with actual line detection quality.
- The XML parser has been reworked for better robustness against invalid input. When PageXML files contain invalid image dimensions, kraken now attempts to read dimensions from the referenced image file. Reading-order parsing was also fully reimplemented to handle partial explicit orders and multi-level ordering more gracefully (see the parsing sketch after this list).
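Assuming the 6.x XMLPage interface is retained (the release notes describe internal rework only), parsing still looks like this:

from kraken.lib.xml import XMLPage

# Parse a PageXML (or ALTO) file. Invalid image dimensions in the XML now
# trigger a fallback that reads the dimensions from the referenced image.
page = XMLPage('page.xml')

# Convert to a Segmentation container for downstream tasks.
seg = page.to_container()
print(f'{len(seg.lines)} lines parsed')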
Plugins
kraken can now use external implementations of layout analysis, text recognition, and reading order determination through Python entry points.
Plugins are distributed as regular Python packages. After installation, kraken discovers them automatically through entry points. Plugin model files are then used exactly like native kraken model files: pass them to --model on the CLI or load them via task classes in Python.
Example workflow with a D-FINE layout analysis plugin model:
# install plugin package
$ pip install git+https://github.com/mittagessen/dfine_kraken.git
# run layout analysis with a plugin model file
$ kraken -i page.tif page.json segment --baseline --model dfine_layout.safetensors

The same model can be loaded programmatically with SegmentationTaskModel.load_model('dfine_layout.safetensors').
Command Line Behavior
Inference
Device and precision are now global options on kraken.
Set them before subcommands:
# CPU inference in full precision
$ kraken -i page.tif page.txt --device cpu --precision 32-true \
segment -bl ocr -m model.safetensors
# GPU inference with mixed bfloat16 precision
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
    segment -bl ocr -m model.safetensors

Recognition now exposes two throughput controls:
- -B/--batch-size: number of extracted line images sent per recognition forward pass.
- --num-line-workers: number of CPU worker processes used to extract/preprocess line images. Use 0 to keep extraction in-process.
# conservative settings for small GPUs or CPU-only runs
$ kraken -i page.tif page.txt segment -bl ocr -m model.safetensors \
-B 8 --num-line-workers 2
# higher-throughput GPU settings
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
    segment -bl ocr -m model.safetensors -B 64 --num-line-workers 8

Training
Experiment Files
Managing non-trivial training configurations from CLI flags alone was difficult, especially when heavily modifying segmentation class taxonomies. To address this, ketos now supports YAML experiment files.
Pass an experiment file with --config before the command name:
$ ketos --config experiments.yml segtrain

YAML keys correspond to the internal parameter names used by the CLI.
Minimal segmentation training experiment file:
precision: 32-true
device: auto
num_workers: 16
num_threads: 1
segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors
  line_class_mapping:
    - ['*', 3]
    - ['DefaultLine', 3]

Single experiment file containing multiple commands:
precision: 32-true
device: auto
num_workers: 16
num_threads: 1
train:
  training_data:
    - rec_train.lst
  evaluation_data:
    - rec_val.lst
  format_type: xml
  checkpoint_path: rec_checkpoints
  weights_format: safetensors
segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors

Configurations for multiple commands can be saved in the same experiment file.
Recommendation: move non-trivial setups (class mappings, optimizer/scheduler settings, hardware defaults) into YAML so runs are reproducible and easier to review.
Training Outputs, Checkpoints, and Weights
For ketos train, ketos segtrain, and ketos rotrain, training now produces Lightning checkpoints (.ckpt) as the primary artifact instead of writing CoreML weights directly during training.
Checkpoint files include full training state (model weights, optimizer state, scheduler state, epoch/step counters, and serialized training config), enabling exact continuation of interrupted runs.
There are now two distinct continuation modes:
- --resume restores and continues from the checkpoint's exact previous training state. The checkpoint state is authoritative, even if command-line flags or config files specify different values.
- --load keeps the previous fine-tune/start-new-run behavior. It loads weights only and starts a fresh run using current CLI/config hyperparameters.
Use --resume when you want to continue the same run.
Use --load when you want to start a new run from existing weights.
In addition to regular checkpoints, kraken now writes an emergency abort checkpoint by default (checkpoint_abort.ckpt) when a training run exits via exception (for example, a crash or a forceful abort). This gives you a recovery point even when a run terminates unexpectedly.
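A sketch of both modes; checkpoint and manifest names are illustrative, and the exact argument form should be checked against ketos train --help:

# continue an interrupted run exactly where it stopped,
# for example from the automatic abort checkpoint
$ ketos train --resume checkpoint_abort.ckpt

# start a fresh fine-tuning run from converted weights,
# using hyperparameters from the current CLI/config
$ ketos train --load model_best.safetensors --training-data train.lst --evaluation-data val.lst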
Because checkpoints contain much more than deployable model weights and may execute arbitrary Python code on load, distribute converted weights files rather than raw checkpoints. Conversion strips training-only state and produces a distribution-safe weights artifact.
At the end of training, kraken automatically converts the best checkpoint into a weights file. You can also convert manually with ketos convert.
The default weights format is now safetensors. Unlike the legacy CoreML format, which is limited to the model types implemented in kraken itself, safetensors can serialize arbitrary model types.
Use --weights-format coreml only when you explicitly need legacy compatibility.
Testing
Segmentation test output now includes metrics computed on vectorized baselines that correlate with segmentation quality, making model selection for line detection much easier. segtest also reflects the new checkpoint/weights distinction and the configurable class mappings. In previous releases, test data often had to mirror post-merge/post-filter training mappings, which made evaluation cumbersome without rewriting source labels.
In short: you can now evaluate more datasets directly, with less taxonomy rewriting.
Example segtest invocation:
$ ketos --device cpu segtest -m best_0.9471.safetensors -e test_manifest.lst -f xml

Example output excerpt:
Category   Class Name    Pixel Accuracy   IOU     Object Count
aux        _start_sep    1.000            1.000   N/A
aux        _end_sep      1.000            1.000   N/A
regions    Text_Region   0.992            0.964   184
regions    Foot_Notes    0.973            0.887   36

Class         Precision   Recall   F1
Overall       0.947       0.933    0.940
DefaultLine   0.959       0.946    0.952
Marginalia    0.891       0.874    0.882
Class mappings are now stored in two forms in checkpoints and new weights files:
- A full mapping with all transformations (merges/filtering) from training taxonomy to model outputs.
- A canonical one-to-one mapping between label indices and class strings.
By default, evaluation uses the full mapping. Canonical mapping is used when explicitly requested and as a fallback for pre-7.0 model files. Fully custom mappings can also be defined in an experiment file.
Class mapping modes in ketos segtest:
# Use full (many-to-one) training mapping from checkpoint metadata
$ ketos segtest -m model.ckpt -e test.lst --test-class-mapping-mode full
# Use canonical one-to-one model output mapping
$ ketos segtest -m model.safetensors -e test.lst --test-class-mapping-mode canonical
# Provide explicit mapping for the test set taxonomy
$ ketos --config segtest_custom.yml segtest -m model.safetensors -e test.lst \
    --test-class-mapping-mode custom

# segtest_custom.yml
segtest:
  line_class_mapping:
    - ['DefaultLine', 3]
    - ['Running_Title', 3]
    - ['Marginal_Note', 4]
  region_class_mapping:
    - ['Text_Region', 5]
    - ['Foot_Notes', 6]

For easier debugging, ketos segtest now prints an explicit mapping between test-set classes and model classes, including clear indicators for merges, missing labels, and conflicts.
Example class taxonomy diagnostics table:
Class Mapping Diagnostics (model=full, dataset=effective)

Category    Class Name      Model Idx   Dataset Idx   Observed   Effective   Status
baselines   DefaultLine     3           3             812        812         ok
baselines   Running_Title   3           3             57         57          ok
baselines   Rubrication     4           -             14         0           ignored by dataset mapping
regions     Text_Region     5           5             184        184         ok
regions     Illustration    -           7             22         22          missing in model mapping
API
Configuration Classes
In previous versions of kraken, training and inference hyperparameters were defined in dictionaries in the default_specs module. This was error-prone and resulted in verbose code in the command line drivers.
If you maintain Python training/inference scripts, migrate to typed config classes for better defaults, clearer parameter names, and safer checkpoint serialization.
Before (6.0.x) using default_specs dictionaries:
from kraken.lib.default_specs import RECOGNITION_HYPER_PARAMS
from kraken.lib.train import RecognitionModel
hyper_params = RECOGNITION_HYPER_PARAMS.copy()
hyper_params.update({'batch_size': 8, 'lrate': 1e-3})
model = RecognitionModel(hyper_params=hyper_params, training_data=['train.lst'])

After (7.0) using typed configuration classes:
from kraken.configs import (RecognitionInferenceConfig,
                            VGSLRecognitionTrainingConfig,
                            VGSLRecognitionTrainingDataConfig)

infer_cfg = RecognitionInferenceConfig(batch_size=8,
                                       num_line_workers=4,
                                       precision='bf16-mixed')
train_cfg = VGSLRecognitionTrainingConfig(lrate=1e-3,
                                          quit='early',
                                          epochs=24)
data_cfg = VGSLRecognitionTrainingDataConfig(training_data=['train.lst'],
                                             evaluation_data=['val.lst'],
                                             format_type='xml')

Task-based API for Inference
blla.segment(), align.forced_align(), and rpred.rpred()/rpred.mm_rpred() have been replaced by implementation-agnostic task classes that provide better performance and flexibility. The largest gains are in text recognition, where CPU inference improves by roughly 80% through parallelization. Batching additionally enables efficient GPU utilization.
If you call legacy APIs directly, plan a migration to kraken.tasks soon. Legacy interfaces remain available for now but are deprecated.
To migrate an existing segmentation workflow, replace:
from PIL import Image
from kraken.blla import segment
from kraken.lib.vgsl import TorchVGSLModel
model = TorchVGSLModel.load_model('/path/to/segmentation/model.coreml')
im = Image.open('sample.jpg')
seg = segment(im, model=model)

with:
from PIL import Image
from kraken.tasks import SegmentationTaskModel
from kraken.configs import SegmentationInferenceConfig
segmenter = SegmentationTaskModel.load_model('/path/to/segmentation/model.safetensors')
im = Image.open('sample.jpg')
seg = segmenter.predict(im=im, config=SegmentationInferenceConfig())

For recognition, before:
from PIL import Image
from kraken.rpred import rpred
from kraken.lib.models import load_any
net = load_any('/path/to/recognition/model.mlmodel')
for record in rpred(net, im, segmentation=seg):
    print(record)

After:
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig
recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
for record in recognizer.predict(im=im, segmentation=seg,
                                 config=RecognitionInferenceConfig(batch_size=8,
                                                                   num_line_workers=4)):
    print(record)

Recognition now supports batching (batch_size in RecognitionInferenceConfig) and parallel line extraction (num_line_workers), making GPU acceleration practical.
CUDA example with explicit accelerator/device settings:
from PIL import Image
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig
recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
config = RecognitionInferenceConfig(accelerator='gpu',
                                    device=[0],
                                    precision='bf16-mixed',
                                    batch_size=64,
                                    num_line_workers=8)
for record in recognizer.predict(im=Image.open('page.tif'), segmentation=seg, config=config):
    print(record.prediction)

The new recognition API does not support tag-based multi-model recognition (rpred.mm_rpred()), which was dropped to simplify batched inference.
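If you relied on mm_rpred(), a rough workaround is to run one recognizer per line tag over a filtered copy of the segmentation. This is only a sketch: it assumes kraken.containers.Segmentation remains a dataclass and that line tags survive segmentation as in 6.x; model paths and tag values are illustrative, and im/seg are as in the earlier examples.

import dataclasses
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig

# one recognizer per line tag value (hypothetical models and tags)
recognizers = {'latin': RecognitionTaskModel.load_model('latin.safetensors'),
               'hebrew': RecognitionTaskModel.load_model('hebrew.safetensors')}

config = RecognitionInferenceConfig(batch_size=8)
records = []
for tag, recognizer in recognizers.items():
    # keep only the lines whose tags reference this recognizer
    lines = [line for line in seg.lines if tag in (line.tags or {}).values()]
    sub_seg = dataclasses.replace(seg, lines=lines)
    records.extend(recognizer.predict(im=im, segmentation=sub_seg, config=config))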
For forced alignment, before:
from PIL import Image
from kraken.containers import Segmentation, BaselineLine
from kraken.align import forced_align
from kraken.lib.models import load_any
model = load_any('model.mlmodel')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(imagename='image.png', lines=[line])
aligned_segmentation = forced_align(segmentation, model)
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

After:
from PIL import Image
from kraken.tasks import ForcedAlignmentTaskModel
from kraken.containers import Segmentation, BaselineLine
from kraken.configs import RecognitionInferenceConfig
# Assume `model.mlmodel` is a recognition model
model = ForcedAlignmentTaskModel.load_model('model.mlmodel')
im = Image.open('image.png')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(lines=[line])
config = RecognitionInferenceConfig()
aligned_segmentation = model.predict(im, segmentation, config)
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

The old interfaces remain available but are deprecated and will be removed in kraken 8.
Training Refactor
The training module has been moved from kraken.lib.train to kraken.train (with reading order and pretraining modules in kraken.lib.ro/kraken.lib.pretrain). Training now uses explicit configuration objects and consistently uses LightningDataModule-derived classes.
If you run training programmatically, update imports and constructors and switch hyperparameter dicts to config objects.
Before (6.0.x) style instantiation:
from kraken.lib.train import RecognitionModel, SegmentationModel
from kraken.lib.pretrain.model import RecognitionPretrainModel
from kraken.lib.ro.model import RODataModule, ROModel
rec = RecognitionModel(hyper_params={'batch_size': 8},
                       training_data=['train.lst'],
                       evaluation_data=['val.lst'])
seg = SegmentationModel(hyper_params={'epochs': 50},
                        training_data=['seg_train.lst'],
                        evaluation_data=['seg_val.lst'])
pre = RecognitionPretrainModel(hyper_params={'mask_prob': 0.5})
ro_dm = RODataModule(training_data=['ro_train.lst'], evaluation_data=['ro_val.lst'])
ro = ROModel(feature_dim=128, class_mapping={'default': 1}, hyper_params={'epochs': 3000})

After (7.0) style instantiation:
from kraken.train import (KrakenTrainer,
                          VGSLRecognitionDataModule, VGSLRecognitionModel,
                          BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.lib.pretrain import PretrainDataModule, RecognitionPretrainModel
from kraken.lib.ro import RODataModule, ROModel
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
                            BLLASegmentationTrainingConfig, BLLASegmentationTrainingDataConfig,
                            VGSLPreTrainingConfig, VGSLPreTrainingDataConfig,
                            ROTrainingConfig, ROTrainingDataConfig)

rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(training_data=['train.lst'],
                                                                     evaluation_data=['val.lst'],
                                                                     format_type='xml'))
rec_model = VGSLRecognitionModel(VGSLRecognitionTrainingConfig(epochs=24, quit='early'))
seg_dm = BLLASegmentationDataModule(BLLASegmentationTrainingDataConfig(training_data=['seg_train.lst'],
                                                                       evaluation_data=['seg_val.lst'],
                                                                       format_type='xml'))
seg_model = BLLASegmentationModel(BLLASegmentationTrainingConfig(epochs=50, quit='fixed'))
pre_dm = PretrainDataModule(VGSLPreTrainingDataConfig(training_data=['pretrain_train.lst'],
                                                      evaluation_data=['pretrain_val.lst'],
                                                      format_type='path'))
pre_model = RecognitionPretrainModel(VGSLPreTrainingConfig(mask_prob=0.5))
ro_dm = RODataModule(ROTrainingDataConfig(training_data=['ro_train.lst'],
                                          evaluation_data=['ro_val.lst'],
                                          format_type='xml', level='baselines'))
ro_model = ROModel(ROTrainingConfig(epochs=3000, quit='early'))

The KrakenTrainer class works as before.
In addition, separate test routines are now integrated into Lightning modules, allowing straightforward programmatic execution of the test loop for segmentation and recognition.
KrakenTrainer.test() returns typed metric containers:

- Recognition (RecognitionTestMetrics): character_counts, num_errors, cer, wer, case_insensitive_cer, confusions, scripts, insertions, deletes, substitutions
- Segmentation (SegmentationTestMetrics): class_pixel_accuracy, mean_accuracy, class_iu, mean_iu, freq_iu, region_iu, bl_precision, bl_recall, bl_f1, bl_detection_per_class

Example: programmatic test loop execution with KrakenTrainer.test():
from kraken.train import (KrakenTrainer,
                          VGSLRecognitionDataModule, VGSLRecognitionModel,
                          BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
                            BLLASegmentationTrainingConfig, BLLASegmentationTestDataConfig)

trainer = KrakenTrainer(accelerator='cpu', devices=1, precision='32-true')

rec_model = VGSLRecognitionModel.load_from_weights('rec_best.safetensors',
                                                   VGSLRecognitionTrainingConfig())
rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(test_data=['rec_test.lst'],
                                                                     format_type='xml'))
rec_metrics = trainer.test(rec_model, rec_dm)

seg_model = BLLASegmentationModel.load_from_weights('seg_best.safetensors',
                                                    BLLASegmentationTrainingConfig())
seg_dm = BLLASegmentationDataModule(BLLASegmentationTestDataConfig(test_data=['seg_test.lst'],
                                                                   format_type='xml',
                                                                   test_class_mapping_mode='canonical'))
seg_metrics = trainer.test(seg_model, seg_dm)

Plugin System and Model Base Classes
kraken now supports alternative segmentation and recognition implementations through a plugin system based on Python entry points. To be compatible, plugins must implement the interfaces defined by the abstract kraken.models.BaseModel class. kraken.models.SegmentationBaseModel and kraken.models.RecognitionBaseModel provide task-specific base interfaces.
This is primarily relevant if you are extending kraken with custom model types or distributing third-party integrations.
Rough implementation skeletons:
from torch import nn
from kraken.models import BaseModel, SegmentationBaseModel, RecognitionBaseModel
class MySegmentationModel(nn.Module, SegmentationBaseModel):
    _kraken_min_version = '7.0.0'
    model_type = ['segmentation']

    def prepare_for_inference(self, config):
        self.eval()

    def predict(self, im):
        ...

class MyRecognitionModel(nn.Module, RecognitionBaseModel):
    _kraken_min_version = '7.0.0'
    model_type = ['recognition']

    def prepare_for_inference(self, config):
        self.eval()

    def predict(self, im, segmentation):
        ...

To be discoverable by kraken, these classes must be registered as entry points in your setup.cfg or similar under the kraken.models group with their class name:
Example from kraken's own setup.cfg:
[entry_points]
kraken.models =
TorchVGSLModel = kraken.lib.vgsl:TorchVGSLModel
Wav2Vec2Mask = kraken.lib.pretrain:Wav2Vec2Mask
    ROMLP = kraken.lib.ro:ROMLP

An example plugin, dfine_kraken, incorporates the D-FINE object detector for layout analysis.
Model Handling
kraken replaced type-specific model loaders with a modular serialization/deserialization architecture. Models can also be loaded directly via task APIs. The default serialization format is now safetensors, which supports arbitrary model types. The new API in kraken.models can read (kraken.models.load_models) and write model collections (kraken.models.write_safetensors). Model files are designed to contain multiple models (for example, layout + reading order), so these routines accept and return lists of models. You can mix "native" kraken implementations and plugin implementations in the same model file, such as a BLLA line segmentation and D-FINE region segmentation model. CoreML support remains, but only for legacy models from kraken 6 and earlier.
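For instance, bundling models from separate files might look like the following sketch; the argument order of kraken.models.write_safetensors is an assumption, while the list-based calling convention follows the description above:

from kraken.models import load_models, write_safetensors

# load a native BLLA segmenter and a reading order model from separate files
seg_models = load_models('blla_seg.safetensors', tasks=['segmentation'])
ro_models = load_models('ro_model.safetensors', tasks=['reading_order'])

# write both into a single multi-model weights file; the routines operate on
# lists of models, so concatenation is enough
write_safetensors(seg_models + ro_models, 'bundle.safetensors')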
For most users: prefer safetensors, treat checkpoints as training artifacts, and distribute converted weights files.
Before (6.0.x) model loading:
# recognition
from kraken.lib.models import load_any
rec_model = load_any('recognition_model.mlmodel')
# segmentation
from kraken.lib.vgsl import TorchVGSLModel
seg_model = TorchVGSLModel.load_model('segmentation_model.mlmodel')

After (7.0) unified loading:
from kraken.models import load_models
from kraken.tasks import RecognitionTaskModel, SegmentationTaskModel
# load by task type
rec_models = load_models('model_bundle.safetensors', tasks=['recognition'])
seg_and_ro_models = load_models('model_bundle.safetensors', tasks=['segmentation', 'reading_order'])
# use via task API
recognizer = RecognitionTaskModel(rec_models)
segmenter = SegmentationTaskModel(seg_and_ro_models)

The new model stack explicitly distinguishes checkpoints from weights files. After training, checkpoints should be converted to weights. The universal conversion routine kraken.models.convert_models relies on additional entry points: a checkpoint LightningModule (or compatible class exposing load_from_checkpoint) and any configuration classes serialized into model weights. During conversion, checkpoints are loaded in weights_only mode. To support safe deserialization, kraken adds all classes registered under kraken.configs to PyTorch safe globals.
Minimal plugin registration in setup.cfg for checkpoint conversion:
[entry_points]
kraken.lightning_modules =
MyVGSLLightningModule = mypkg.training:MyVGSLLightningModule
kraken.configs =
MyTrainingConfig = mypkg.configs:MyTrainingConfig
kraken.models =
    MyModel = mypkg.models:MyModel

Checkpoint/weights conversion examples:
# CLI
$ ketos convert -i checkpoint_09-0.9431.ckpt -o model_best.safetensors

# Python
from kraken.models import convert_models, load_models
from kraken.models.convert import load_from_checkpoint
# checkpoint to weights
convert_models(['checkpoint_09-0.9431.ckpt'], 'model_best.safetensors')
# load lightning module from checkpoint (weights_only mode)
module = load_from_checkpoint('checkpoint_09-0.9431.ckpt')
net = module.net
# load converted weights
models = load_models('model_best.safetensors')