Changed
Parallelization
- Removed Ray dependency entirely: all parallel pipelines now use
  `ProcessPoolExecutor` via the new `process_pool_map_stream` utility
- `process_pool_map_stream` enhanced with `ordered` mode (submission-order
  yield), per-future `timeout`, `initializer`/`initargs` for non-picklable
  worker state, `max_tasks_per_child` (Python 3.11+), and `on_timeout` callback
- New `graceful_shutdown()` context manager for SIGTERM/SIGINT handling with
  automatic signal handler restoration
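
The signal handling described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual implementation: the name `graceful_shutdown_sketch`, its `signals` parameter, and the yielded `threading.Event` are assumptions, and the real `graceful_shutdown()` may expose a different interface.

```python
import signal
import threading
from contextlib import contextmanager


@contextmanager
def graceful_shutdown_sketch(signals=(signal.SIGTERM, signal.SIGINT)):
    """Hypothetical stand-in for graceful_shutdown(); the real API may differ."""
    stop = threading.Event()

    def _handler(signum, frame):
        # Only record the request; the caller decides when to stop.
        stop.set()

    # signal.signal() returns the previous handler, which is kept for restoration.
    previous = {sig: signal.signal(sig, _handler) for sig in signals}
    try:
        yield stop
    finally:
        for sig, old in previous.items():
            # A handler not installed from Python is reported as None,
            # which cannot be passed back; fall back to the default action.
            signal.signal(sig, old if old is not None else signal.SIG_DFL)


def consume(items):
    """Process items until a shutdown signal arrives (illustrative loop)."""
    done = []
    with graceful_shutdown_sketch() as stop:
        for item in items:
            if stop.is_set():
                break
            done.append(item)
    return done
```

Note that `signal.signal()` can only be called from the main thread, so a context manager like this has to be entered there.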
Data Pipeline
- Standardization, filtering, rule extraction, and ML preprocessing pipelines
  migrated from Ray to `process_pool_map_stream` with initializer pattern
- Writer-side CGR dedup: `hash(~rxn)` (condensed graph of reaction hash) for
  mechanism-level reaction deduplication, 8 bytes per entry in memory
- New shared result types: `ProcessResult`, `ErrorEntry`, `FilteredEntry`,
  `PipelineSummary` in `synplan.chem.data.reaction_result`
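
As an illustration of the writer-side dedup idea, the sketch below keeps only integer hashes of a canonical key instead of the full objects. The helper names `dedup_by_hash` and `canonical_key` are hypothetical, and the string-based key is a stand-in for `~rxn`; nothing here reflects the actual writer code.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def dedup_by_hash(items: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Yield each item whose key hash has not been seen before."""
    seen = set()  # stores integer hashes only, never the items themselves
    for item in items:
        h = hash(key(item))
        if h in seen:
            continue
        seen.add(h)
        yield item


def canonical_key(rxn: str) -> tuple:
    """Toy stand-in for a CGR: order-independent reactants and products."""
    reactants, products = rxn.split(">>")
    return (tuple(sorted(reactants.split("."))), tuple(sorted(products.split("."))))


unique = list(dedup_by_hash(["a.b>>c", "b.a>>c", "a>>d"], key=canonical_key))
```

Because only hashes are retained, two distinct keys that collide would be treated as duplicates; the sketch accepts that trade-off in exchange for a small memory footprint.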
Compatibility
- Removed `from __future__ import annotations` from all modules (Dagster
  compatibility)
- Forward references quoted for self-referencing return types
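
A small sketch of what the quoted forward references look like in practice. The `Node` class is purely illustrative and does not come from the codebase. Without the `__future__` import, ordinary annotations are stored as real objects, which is presumably what tools that inspect annotations at runtime rely on, while a method returning its own class needs the name quoted because the class is not yet bound when the method is defined.

```python
class Node:
    """Illustrative class; not part of the project."""

    def __init__(self, value: int) -> None:
        self.value = value

    # The return type refers to the enclosing class, so it is quoted.
    def with_value(self, value: int) -> "Node":
        return Node(value)


annotations = Node.with_value.__annotations__
param_type = annotations["value"]     # the int class itself, not a string
return_type = annotations["return"]   # stays the string "Node" until resolved
```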
Removed
- `ray` dependency removed from `pyproject.toml`
- `init_ray_logging()` removed from `synplan.utils.logging`
- `DedupActor` Ray actor removed
Added
- 10 unit tests for `process_pool_map_stream` and `graceful_shutdown`
  (`tests/unit/utils/test_parallel.py`)
- 8 unit tests for `ProcessResult`, `PipelineSummary`, and CGR dedup
  (`tests/unit/chem/data/test_pipeline.py`)