Changed
- Improved standardization, filtering, and rule extraction pipeline robustness
with worker-serialized results, stable CGR deduplication keys, visible progress
reporting, and explicit stale-worker cleanup.
- Improved policy dataset preparation with safetensors-backed cache reuse,
parallel preprocessing progress, nested result directory creation, and a
stratified split that avoids duplicate-product validation leakage.
- Made optional remote logger integrations installable through extras instead
of core dependencies: `SynPlanner[litlogger]`, `SynPlanner[wandb]`,
`SynPlanner[mlflow]`, or `SynPlanner[loggers]`.
- Configured `ty` rules for the dynamic chython, RDKit, PyTorch, and
NumPy typing surface while keeping unresolved-reference checks enabled.
- Documented updated CLI flags, policy logger settings, GPS embedder
configuration, PR review acceptance guidelines, and new shared pipeline/cache
helper modules.
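
For readers unfamiliar with duplicate-product leakage, the idea behind the split change can be sketched as follows. This is a minimal illustration, not the project's implementation: `group_aware_split`, its parameters, and the toy records are hypothetical, and the sketch only shows the grouping step (keeping every record with the same product on one side of the split), not the stratification itself.

```python
import random
from collections import defaultdict


def group_aware_split(records, key, val_fraction=0.1, seed=0):
    """Split records so that all records sharing key(record) land on one side.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    # Shuffle group keys with a fixed seed so the split is reproducible.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    target = int(round(val_fraction * len(records)))
    train, val = [], []
    for k in keys:
        # Fill validation first, one whole group at a time, up to the target size.
        bucket = val if len(val) < target else train
        bucket.extend(groups[k])
    return train, val


# Hypothetical toy data: (reactants, product) pairs with repeated products.
records = [
    ("A.B", "P1"), ("C.D", "P1"), ("E.F", "P2"),
    ("G.H", "P3"), ("I.J", "P3"), ("K.L", "P4"),
]
train, val = group_aware_split(records, key=lambda r: r[1], val_fraction=0.34)
```

Because whole product groups are assigned at once, no product seen in training can reappear in validation, at the cost of the validation set size only approximating the requested fraction.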
Fixed
- Rule extraction summary now always reports failed reaction counts.
- Ranking dataset cache loading now iterates safetensors keys correctly.
- Deduplication now fails fast if worker-computed dedup keys are unavailable.
- Standardization ion-splitting warnings now use the module logger.
- Reaction standardization now preserves mapped SMI source columns in
successful output rows and error reports, applies a fixed canonical chemistry
order for enabled standardizers, and excludes failed reactions from the
standardized output when `ignore_errors` is enabled.
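
The standardization behavior described in the last entry can be sketched roughly as below. This is an assumption-laden toy, not the project's code: the step names, `CANONICAL_ORDER`, `STEPS`, and `standardize` are hypothetical, and simple string operations stand in for real chemistry standardizers. It shows three things: enabled steps run in a fixed order regardless of how they were requested, the source value is kept alongside each result, and failed rows go to the error report instead of the output when `ignore_errors` is set.

```python
from typing import Callable


def trim(s: str) -> str:
    return s.strip()


def reject_empty(s: str) -> str:
    if not s:
        raise ValueError("empty reaction")
    return s


def uppercase(s: str) -> str:
    return s.upper()


# Hypothetical fixed order; string operations stand in for chemistry steps.
CANONICAL_ORDER = ("trim", "reject_empty", "uppercase")
STEPS: dict[str, Callable[[str], str]] = {
    "trim": trim,
    "reject_empty": reject_empty,
    "uppercase": uppercase,
}


def standardize(rows, enabled, ignore_errors=False):
    # Apply enabled steps in the canonical order, not the order requested.
    ordered = [name for name in CANONICAL_ORDER if name in enabled]
    output, errors = [], []
    for source in rows:
        try:
            result = source
            for name in ordered:
                result = STEPS[name](result)
        except ValueError as exc:
            if not ignore_errors:
                raise
            # Failed rows are reported with their source value and skipped.
            errors.append((source, str(exc)))
            continue
        # Successful rows keep the source value next to the result.
        output.append((source, result))
    return output, errors


rows = [" a>>b ", "   ", "c>>d"]
output, errors = standardize(
    rows, enabled={"uppercase", "trim", "reject_empty"}, ignore_errors=True
)
```

With `ignore_errors=False`, the same call would raise on the blank row instead of collecting it.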