github yzhao062/pyod v3.5.0


PyOD v3.5.0

Sustainable cross-sklearn-version model persistence. Closes #519.

What's new

pyod.utils.persistence is a new module with three additive helpers:

  • save(clf, path, metadata=None) writes a versioned envelope alongside the model: pyod / sklearn / numpy / scipy / joblib / python versions, a save timestamp, the model class, and an optional user metadata dict.
  • load(path, strict=False, return_metadata=False) reads the envelope, compares the recorded dependency versions against the running environment, and emits a UserWarning on drift in sklearn, joblib, numpy, or scipy. strict=True escalates warn-severity drift to ValueError. Python-version drift is classed as info severity and never raises on the normal envelope path, even with strict=True. return_metadata=True returns (model, envelope_without_model_field).
  • compat_load(path, mmap_mode=None) loads legacy artifacts whose sklearn Tree node dtype no longer matches the running sklearn (the recurring user pain in #519). It patches joblib's BUILD-opcode dispatch on a NumpyUnpickler subclass so saved Tree state is realigned to the running dtype before sklearn's own __setstate__ would raise.
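
The envelope idea behind save and load can be sketched with the standard library alone. This is not PyOD's implementation: save_envelope, load_envelope, SCHEMA_VERSION, WARN_ON, and the envelope key names are hypothetical, and pickle stands in for joblib. It only illustrates recording versions at save time, then warning on drift (or raising under strict) at load time while treating Python-version drift as informational.

```python
import pickle
import platform
import warnings
from datetime import datetime, timezone
from importlib import metadata

SCHEMA_VERSION = 1  # hypothetical schema number for this sketch
# Drift in these packages warns; distribution names are used for lookup.
WARN_ON = {"sklearn": "scikit-learn", "joblib": "joblib",
           "numpy": "numpy", "scipy": "scipy"}


def current_versions():
    """Collect the running versions; a missing package is recorded as None."""
    versions = {"python": platform.python_version()}
    for key, dist in WARN_ON.items():
        try:
            versions[key] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[key] = None
    return versions


def save_envelope(model, path, metadata=None):
    """Pickle the model together with version and provenance fields."""
    envelope = {
        "schema_version": SCHEMA_VERSION,
        "versions": current_versions(),
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "model_class": type(model).__qualname__,
        "metadata": dict(metadata or {}),
        "model": model,
    }
    with open(path, "wb") as fh:
        pickle.dump(envelope, fh)


def load_envelope(path, strict=False, return_metadata=False, running=None):
    """Load an envelope; warn on drift, or raise ValueError when strict."""
    with open(path, "rb") as fh:
        envelope = pickle.load(fh)
    if envelope.get("schema_version") != SCHEMA_VERSION:
        raise ValueError("unsupported envelope schema version")
    running = current_versions() if running is None else running
    saved = envelope["versions"]
    # Python drift is informational only, so it is left out of this check.
    drift = {
        name: (saved.get(name), running.get(name))
        for name in sorted(WARN_ON)
        if saved.get(name) != running.get(name)
    }
    if drift:
        message = f"dependency versions differ from save time: {drift}"
        if strict:
            raise ValueError(message)
        warnings.warn(message, UserWarning)
    model = envelope.pop("model")  # metadata is returned without the model
    return (model, envelope) if return_metadata else model
```

The running argument exists only so the drift path can be exercised without installing a second environment. Like any pickle-based format, such an artifact should only be loaded from a trusted source.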

load() falls through to compat_load() automatically when joblib.load raises the documented dtype prefix; the original exception is preserved via raise ... from. A non-prefix ValueError from joblib.load propagates without invoking compat_load.
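
The fall-through gate can be illustrated as follows. This is a sketch under assumptions, not the library's code: load_with_fallback is a hypothetical name, the loaders are passed in as plain callables, and the prefix is a parameter because these notes do not spell out the exact error text that is matched. It shows one reading of the behaviour: retry only on the known prefix, let any other ValueError propagate, and chain a failure of the fallback to the original error.

```python
def load_with_fallback(path, primary, fallback, prefix):
    """Try `primary`; retry with `fallback` only for the known error prefix."""
    try:
        return primary(path)
    except ValueError as original:
        if not str(original).startswith(prefix):
            raise  # unrelated ValueError: propagate without calling fallback
        try:
            return fallback(path)
        except Exception as compat_error:
            # Chain so the traceback still shows the first failure.
            raise compat_error from original
```

Matching on an exact prefix rather than on any ValueError keeps the compatibility path from masking unrelated failures such as a truncated or corrupted file.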

Realignment policy

Dtype realignment is allowlist-driven:

  • _TREE_NODE_FIELD_DEFAULTS (currently {"missing_go_to_left": 0}, the pre-1.3 sklearn default) zero-fills documented missing fields.
  • _TREE_NODE_FIELD_RENAMES (empty in v3.5.0) maps known renames; rename targets are resolved before the missing-field default check, so a future rename does not also need a default entry.
  • Same-name byte-order-only differences realign safely (assignment performs the swap).
  • Any other dtype difference (unknown new field, kind change, signedness change, itemsize change, shape change) raises ValueError with a re-fit recommendation.

The current dtype is discovered dynamically from sklearn.tree._tree.NODE_DTYPE, so no layout is hardcoded. A single UserWarning recommending re-fit fires when at least one Tree was realigned; non-tree artifacts (ECOD, COPOD, HBOS, LOF, ...) pass through silently.
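
The allowlist policy can be sketched on plain NumPy structured arrays. This is an illustration only, not the code in pyod.utils.persistence: realign, FIELD_DEFAULTS, and FIELD_RENAMES are hypothetical stand-ins for the private tables named above, the field names in the usage note are just examples, and rejecting a saved field that is absent from the target dtype is an assumption of this sketch.

```python
import numpy as np

FIELD_DEFAULTS = {"missing_go_to_left": 0}  # mirrors the documented default
FIELD_RENAMES = {}  # old name -> new name; empty, as in v3.5.0


def realign(saved, target_dtype):
    """Copy a structured array into `target_dtype` under an allowlist policy."""
    target_dtype = np.dtype(target_dtype)
    # Resolve renames first so a renamed field is not treated as missing.
    source_for = {FIELD_RENAMES.get(old, old): old for old in saved.dtype.names}
    # Assumption of this sketch: a saved field with no target is rejected.
    unknown = set(source_for) - set(target_dtype.names)
    if unknown:
        raise ValueError(f"unknown saved fields {sorted(unknown)}; re-fit")
    out = np.zeros(saved.shape, dtype=target_dtype)
    for name in target_dtype.names:
        want = target_dtype[name]
        if name not in source_for:
            if name not in FIELD_DEFAULTS:
                raise ValueError(f"no default for new field {name!r}; re-fit")
            out[name] = FIELD_DEFAULTS[name]
            continue
        have = saved.dtype[source_for[name]]
        # Only byte order may differ: kind, itemsize, and shape must match.
        same = (have.kind, have.itemsize, have.shape) == (
            want.kind, want.itemsize, want.shape)
        if not same:
            raise ValueError(f"field {name!r}: {have} -> {want}; re-fit")
        out[name] = saved[source_for[name]]  # assignment handles byte swaps
    return out
```

For example, a big-endian array with fields left_child and threshold realigns into a little-endian dtype that also has missing_go_to_left, with the new column filled with zeros, while narrowing left_child from 8 to 4 bytes raises ValueError.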

Dependency bump

joblib>=1.5 is now required because compat_load reuses joblib.numpy_pickle._validate_fileobject_and_memmap and the joblib 1.5 NumpyUnpickler(filename, file_handle, ensure_native_byte_order, mmap_mode=...) constructor; older joblib lacks both. The joblib internal imports are guarded with a clear ImportError recommending an upgrade.
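
A guarded import of this kind might look like the sketch below. The helper name require and its message format are hypothetical; the point is only that a missing private name is turned into an ImportError that tells the user what to upgrade, with the underlying error kept as the cause.

```python
import importlib


def require(module_name, attribute, hint):
    """Return `module_name.attribute`, or raise ImportError carrying `hint`."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attribute)
    except (ImportError, AttributeError) as exc:
        raise ImportError(
            f"{module_name}.{attribute} is unavailable; {hint}"
        ) from exc
```

Applied to this release, a call such as require("joblib.numpy_pickle", "_validate_fileobject_and_memmap", "upgrade to joblib>=1.5") would fail early with an actionable message on an older joblib.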

Tests

pyod/test/test_persistence.py adds 31 new test cases plus 9 subtests, covering:

  • Tree-dtype realignment (synthetic aged pickles via an _OldDtypeTree pickle-time shim).
  • The committed sklearn 1.2.2 binary fixture under pyod/test/fixtures/iforest_sklearn_1_2_x.joblib (regenerable via regen_iforest_sklearn_1_2.py).
  • Envelope round-trip.
  • Drift warnings, including the info-only python_version silent case.
  • Strict-mode rejection paths.
  • Schema-version validation, including a future-version reject.
  • The strict-after-compat no-drift case.
  • Exception chaining.
  • The rename pattern without a paired default.
  • A monkey-patched joblib.load test that pins the exact-prefix fall-through gate.

CI

A new persistence-nightly job in testing-cron.yml installs pre-release sklearn / numpy / scipy / joblib (scientific-python nightly index) and runs only test_persistence.py. A failure there surfaces upstream dtype changes before downstream users hit them. It is not a release blocker.

Docs

docs/model_persistence.rst rewritten (16 → 218 lines) with quick-start, trust-boundary, why-versioning, legacy-load decision tree, cross-sklearn-version compatibility section, troubleshooting table keyed on error text, strict-mode notes, and envelope-metadata-reading guidance. docs/pyod.utils.rst cross-references the new module. examples/save_load_model_example.py leads with persistence.save / persistence.load and notes raw joblib as a secondary alternative.

Migration

No breaking API changes. Existing joblib.dump / joblib.load workflows continue to work. For new code, prefer from pyod.utils.persistence import save, load.

Out of scope (deferred)

  • True header-only inspect_artifact(path) and pyod inspect <path> CLI (Phase 3, needs a .pyod zip container layout).
  • Deep-learning state-dict persistence (separate design).
  • Sklearn version pin tightening (separate proposal).

Review

Plan reviewed by Codex across four plan-review rounds; implementation reviewed by Codex across three execution-review rounds. All 6 findings raised over the loop were resolved before merge.
