PyOD v3.5.0
Sustainable cross-sklearn-version model persistence. Closes #519.
What's new
pyod.utils.persistence is a new module with three additive helpers:
- save(clf, path, metadata=None) writes a versioned envelope alongside the model: pyod / sklearn / numpy / scipy / joblib / python versions, a save timestamp, the model class, and an optional user metadata dict.
- load(path, strict=False, return_metadata=False) reads the envelope, compares the recorded dependency versions against the running environment, and emits a UserWarning on drift in sklearn, joblib, numpy, or scipy. strict=True escalates warn-severity drift to ValueError. Python-version drift is severity info and never raises on the normal envelope path. return_metadata=True returns (model, envelope_without_model_field).
- compat_load(path, mmap_mode=None) loads legacy artifacts whose sklearn Tree node dtype no longer matches the running sklearn (the recurring user pain in #519). It patches joblib's BUILD-opcode dispatch on a NumpyUnpickler subclass so saved Tree state is realigned to the running dtype before sklearn's own __setstate__ would raise.
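The envelope and drift check described above can be illustrated with a minimal, standard-library-only sketch. It is not PyOD's implementation: the names build_envelope, check_drift, SEVERITY, and the schema_version field are hypothetical, and versions are compared as plain strings for simplicity.

```python
import warnings
from datetime import datetime, timezone

# Hypothetical severity table: "warn" entries trigger a UserWarning (or a
# ValueError in strict mode); "info" entries never raise.
SEVERITY = {"sklearn": "warn", "joblib": "warn", "numpy": "warn",
            "scipy": "warn", "python": "info"}


def build_envelope(model, versions, metadata=None):
    """Wrap a model together with the environment it was saved from."""
    return {
        "schema_version": 1,  # hypothetical field name and value
        "versions": dict(versions),
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "model_class": type(model).__name__,
        "metadata": metadata or {},
        "model": model,
    }


def check_drift(envelope, running, strict=False):
    """Compare recorded versions with the running ones.

    Returns the names of warn-severity packages whose version changed.
    """
    recorded = envelope["versions"]
    drifted = [name for name, version in recorded.items()
               if SEVERITY.get(name) == "warn" and running.get(name) != version]
    if drifted:
        message = "version drift since save: " + ", ".join(
            f"{name} {recorded[name]} -> {running.get(name)}" for name in drifted)
        if strict:
            raise ValueError(message)
        warnings.warn(message, UserWarning)
    return drifted
```

In this sketch a python-only difference returns an empty list and stays silent, mirroring the info-only behaviour described for load().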
load() falls through to compat_load() automatically when joblib.load raises the documented dtype prefix; the original exception is preserved via raise ... from. A non-prefix ValueError from joblib.load propagates without invoking compat_load.
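A rough sketch of such a prefix-gated fall-through, using only the standard library. The function name load_with_fallback and the DTYPE_ERROR_PREFIX value are placeholders for illustration; the actual prefix is whatever message joblib.load surfaces, and PyOD's own control flow may differ.

```python
# Placeholder text only; not the real error message.
DTYPE_ERROR_PREFIX = "incompatible node dtype"


def load_with_fallback(path, primary, fallback, prefix=DTYPE_ERROR_PREFIX):
    """Try primary(path); use fallback(path) only for the known dtype error."""
    try:
        return primary(path)
    except ValueError as exc:
        if not str(exc).startswith(prefix):
            raise  # unrelated ValueError: propagate without trying the fallback
        try:
            return fallback(path)
        except Exception as compat_exc:
            # Keep the original failure reachable as __cause__.
            raise compat_exc from exc
```

Gating on an exact prefix keeps the compatibility path narrow: any other ValueError reaches the caller unchanged instead of being masked by a second load attempt.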
Realignment policy
Dtype realignment is allowlist-driven:
- _TREE_NODE_FIELD_DEFAULTS (currently {"missing_go_to_left": 0}, the pre-1.3 sklearn default) zero-fills documented missing fields.
- _TREE_NODE_FIELD_RENAMES (empty in v3.5.0) maps known renames; rename targets are resolved before the missing-field default check, so a future rename does not also need a default entry.
- Same-name byte-order-only differences realign safely (assignment performs the swap).
- Any other dtype difference (unknown new field, kind change, signedness change, itemsize change, shape change) raises ValueError with a re-fit recommendation.
Current dtype is discovered dynamically from sklearn.tree._tree.NODE_DTYPE; no hardcoded layout. A single UserWarning recommending re-fit fires when at least one Tree was realigned; non-tree artifacts (ECOD, COPOD, HBOS, LOF, ...) pass through silently.
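To make the allowlist policy concrete, here is a simplified sketch on plain NumPy structured arrays. It is an illustration under stated assumptions, not the code in pyod.utils.persistence: the realign function is hypothetical, the three-field layout in the usage note is a reduced stand-in for sklearn's real node dtype, and the target dtype is passed in explicitly rather than discovered from sklearn.

```python
import numpy as np

FIELD_DEFAULTS = {"missing_go_to_left": 0}  # fill values for documented missing fields
FIELD_RENAMES = {}                          # saved name -> current name


def realign(saved, target_dtype, defaults=FIELD_DEFAULTS, renames=FIELD_RENAMES):
    """Copy a structured array into target_dtype, or raise ValueError."""
    target_dtype = np.dtype(target_dtype)
    if saved.dtype == target_dtype:
        return saved
    # Resolve renames first, so a renamed field never needs a default entry.
    source_for = {renames.get(old, old): old for old in saved.dtype.names}
    unknown = sorted(set(source_for) - set(target_dtype.names))
    if unknown:
        raise ValueError(f"unexpected saved fields {unknown}; re-fit the model")
    out = np.zeros(saved.shape, dtype=target_dtype)
    for name in target_dtype.names:
        if name in source_for:
            have = saved.dtype[source_for[name]]
            want = target_dtype[name]
            # Only byte order may differ; kind, itemsize and shape must match.
            if (have.kind, have.itemsize, have.shape) != (want.kind, want.itemsize, want.shape):
                raise ValueError(f"field {name!r} changed layout; re-fit the model")
            out[name] = saved[source_for[name]]  # assignment converts byte order
        elif name in defaults:
            out[name] = defaults[name]
        else:
            raise ValueError(f"field {name!r} has no documented default; re-fit the model")
    return out
```

For example, a big-endian array with fields left_child and threshold realigns into a little-endian layout that adds missing_go_to_left, while a saved left_child stored as a 4-byte integer against an 8-byte target is rejected.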
Dependency bump
joblib>=1.5 is now required because compat_load reuses joblib.numpy_pickle._validate_fileobject_and_memmap and the joblib 1.5 NumpyUnpickler(filename, file_handle, ensure_native_byte_order, mmap_mode=...) constructor; older joblib lacks both. The joblib internal imports are guarded with a clear ImportError recommending an upgrade.
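One common way to guard an import of a library internal is sketched below. The require_attr helper is hypothetical and only demonstrates the pattern of turning a missing module or attribute into an ImportError with an upgrade hint; the guard in PyOD may be written differently.

```python
import importlib


def require_attr(module_name, attr_name, hint):
    """Return module_name.attr_name, or raise ImportError carrying the hint."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except (ImportError, AttributeError) as exc:
        raise ImportError(
            f"{module_name}.{attr_name} is unavailable; {hint}") from exc
```

Converting AttributeError into ImportError gives callers a single exception type to handle, whether the module is absent or merely too old to expose the name.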
Tests
31 new test cases plus 9 subtests in pyod/test/test_persistence.py covering Tree-dtype realignment (synthetic aged pickles via an _OldDtypeTree pickle-time shim), the committed sklearn 1.2.2 binary fixture under pyod/test/fixtures/iforest_sklearn_1_2_x.joblib (regenerable via regen_iforest_sklearn_1_2.py), envelope round-trip, drift warnings including the info-only python_version silent case, strict-mode rejection paths, schema-version validation including a future-version reject, the strict-after-compat no-drift case, exception chaining, the rename pattern without a paired default, and a monkey-patched joblib.load test that pins the exact-prefix fall-through gate.
CI
New persistence-nightly job in testing-cron.yml installs pre-release sklearn / numpy / scipy / joblib (scientific-python nightly index) and runs only test_persistence.py; failure surfaces upstream dtype evolution before downstream users hit it. Not a release blocker.
Docs
docs/model_persistence.rst rewritten (16 → 218 lines) with quick-start, trust-boundary, why-versioning, legacy-load decision tree, cross-sklearn-version compatibility section, troubleshooting table keyed on error text, strict-mode notes, and envelope-metadata-reading guidance. docs/pyod.utils.rst cross-references the new module. examples/save_load_model_example.py leads with persistence.save / persistence.load and notes raw joblib as a secondary alternative.
Migration
No breaking API changes. Existing joblib.dump / joblib.load workflows continue to work. For new code, prefer from pyod.utils.persistence import save, load.
Out of scope (deferred)
- True header-only inspect_artifact(path) and pyod inspect <path> CLI (Phase 3, needs a .pyod zip container layout).
- Deep-learning state-dict persistence (separate design).
- Sklearn version pin tightening (separate proposal).
Review
Plan reviewed by Codex across four plan-review rounds; implementation reviewed by Codex across three execution-review rounds. All 6 findings raised over the loop were resolved before merge.