Added
Protection Strategy Scoring (NEW MODULE)
- New `synplan/route_quality/` module implementing the competing-sites scoring framework from Westerlund et al. (ChemRxiv, 2025)
- `FunctionalGroupDetector` with 102 SMARTS patterns across 18 reactivity categories
- `HalogenDetector` with 140 SMARTS patterns across 5 halogen families
- CGR-based `ReactionClassifier` with broad (4-category) and detailed (12-category) reaction type classification
- `IncompatibilityMatrix` with 3-level severity (compatible / competing / incompatible)
- `RouteScanner` for per-step competing functional group interaction detection
- `CompetingSitesScore` with worst-per-step S(T) formula for route quality scoring
- `ProtectionRouteScorer` integrated directly with `Tree` for automatic post-search route re-ranking based on functional group selectivity
- `ProtectionConfig` dataclass with YAML serialization
- Full test suite: 69 unit tests across 4 test modules
Search Algorithms
- `CombinedPolicyNetworkFunction` for weighted filtering + ranking logit combination with configurable `ranking_weight` and `temperature` parameters
- New evaluation strategies: `RDKitEvaluationStrategy`, `PolicyEvaluationStrategy`
- Stochastic mode for `RolloutSimulator` (probability-weighted rule sampling)
- Tree pruning via redundant expansion state caching (`enable_pruning` config)
- `predict_reaction_rules_light()` for lightweight rollout rule prediction
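As a rough illustration of what a weighted filtering + ranking logit combination with a temperature and probability-weighted sampling can look like, here is a minimal standalone sketch. It is not the `CombinedPolicyNetworkFunction` or `RolloutSimulator` code: the linear mixing formula, the function names (`softmax`, `combine_logits`, `sample_rule`), and the example logits are all assumptions made for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(filtering_logits, ranking_logits, ranking_weight=0.5):
    """Assumed mixing rule: (1 - w) * filtering + w * ranking, per reaction rule."""
    if len(filtering_logits) != len(ranking_logits):
        raise ValueError("both policies must score the same set of rules")
    w = ranking_weight
    return [(1.0 - w) * f + w * r for f, r in zip(filtering_logits, ranking_logits)]


def sample_rule(probabilities, rng):
    """Probability-weighted sampling of one rule index (stochastic rollout style)."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


# Hypothetical logits for three reaction rules.
filtering = [2.0, 0.5, -1.0]
ranking = [0.1, 1.5, 0.3]

probs = softmax(combine_logits(filtering, ranking, ranking_weight=0.25), temperature=2.0)
picked = sample_rule(probs, random.Random(0))  # seeded for reproducibility
```

Under this assumed rule, `ranking_weight=0.0` reduces to the filtering policy alone and `ranking_weight=1.0` to the ranking policy alone.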
Data Pipeline
- `RawReactionReader` for lazy batch processing of raw SMILES/RDF strings
- Distributed SMILES parsing across Ray workers (was main-thread bottleneck)
- `BaseStandardizer` abstract class with template method pattern
- `StandardizationError` with safe pickling for Ray workers
- `STANDARDIZER_REGISTRY` for declarative standardizer configuration
- `DuplicateReactionStandardizer` with Ray `DedupActor` for cluster-wide dedup
- `DedupActor` Ray actor for cluster-wide unique reaction tracking
- 4 new reaction filters: `MultiCenterFilter`, `WrongCHBreakingFilter`, `CCsp3BreakingFilter`, `CCRingBreakingFilter`
- `ignore_errors` mode with structured TSV error files for all data pipelines
- Categorized error taxonomy (`_DATA_ERROR_STAGES`, `_DATA_ERROR_TYPES`) distinguishing data noise from pipeline bugs
- `parse_reaction()` with format auto-detection (SMILES / RDF)
- `load_rule_index_mapping_tsv()` for new TSV rule format
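Format auto-detection between reaction SMILES and RDF records can be done by inspecting the start of the raw string. The sketch below shows one plausible heuristic and is not the logic inside `parse_reaction()`; the function name `detect_reaction_format` and the exact rules are assumptions. It relies on two format facts: RDF/RXN records start with a `$`-prefixed keyword such as `$RDFILE`, `$RFMT`, or `$RXN`, and a reaction SMILES has the shape `reactants>agents>products`, so it contains exactly two `>` characters.

```python
def detect_reaction_format(record: str) -> str:
    """Guess whether a raw record is an RDF/RXN block or a reaction SMILES.

    Hypothetical helper for illustration only.
    """
    head = record.lstrip()
    if not head:
        raise ValueError("empty record")
    # RDF files and embedded RXN blocks begin with a '$' keyword line.
    if head.startswith(("$RDFILE", "$RFMT", "$RXN")):
        return "rdf"
    # Only look at the first whitespace-delimited token so that trailing
    # names or extensions after the SMILES do not affect the count.
    first_token = head.split(maxsplit=1)[0]
    if first_token.count(">") == 2:
        return "smiles"
    raise ValueError(f"unrecognized reaction format: {first_token[:30]!r}")


smiles_kind = detect_reaction_format("CCO>>CC=O")
rdf_kind = detect_reaction_format("$RXN\n\n  example\n\n  1  1\n")
```

Raising on anything unrecognized, rather than guessing, keeps malformed records visible so that they can be routed to an error file instead of being silently misparsed.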
Infrastructure
- `download_preset()` for structured preset downloads from HuggingFace (replaces deprecated `download_all_data()`)
- HuggingFace data moved to `Laboratoire-De-Chemoinformatique/SynPlanner-data`
- Preset YAML manifests (e.g., `presets/synplanner-article.yaml`)
- TSV building blocks format support (`.tsv`, `.tsv.gz`)
- CUDA 12.6 and 12.8 extras (`--extra cu126`, `--extra cu128`)
- Python 3.13 and 3.14 support (`>=3.10,<3.15`)
- Multi-stage Docker builds with `uv sync --locked`
- `HEALTHCHECK` directive for GUI Docker image
- Cross-platform CI matrix (3 OS x 4 Python versions)
- `uv build --wheel` + `uv publish` for PyPI/TestPyPI releases
- `--ignore-errors`, `--error-file`, `--batch_size` CLI options on all processing commands
- `synplan download_preset` CLI command
Tutorials & Documentation
- Tutorial 00: Welcome to Chython (chython onboarding for new users)
- Tutorial 01: Coming from RDKit (migration guide with 35+ operation cheat sheet)
- Tutorial 07: Protection Scoring (end-to-end with capivasertib, 128 routes)
- Tutorial 08: Combined Ranking Filtering Policy (dual policy tuning)
- Tutorial 09: NMCS Algorithms (Nested Monte Carlo Search guide)
- API docs for `synplan.route_quality` module
- 5 new user guide pages linked from docs index
Configs
- `combined_ranking_filtering_policy.yaml`: combined policy network config
- `planning_combined_policies.yaml`: planning with combined filtering + ranking
- `planning_value.yaml`: GCN value network evaluation config
- `rules_extraction.yaml`: fine-grained atom info retention for rule extraction
- `extraction_functional_groups.yaml`: FG-aware extraction with 26 SMARTS patterns
Testing
- 80+ new unit and integration tests
- `test_clustering_visualization_e2e.py`: 27+ tests covering full clustering pipeline
- `test_loading.py`: building blocks loading with CSV, gzip, and TSV
- SAScore benchmark suite (`scripts/sascore_bench/`) with configurable YAML and plotting
Changed
Chemistry Backend Migration (BREAKING)
- ALL CGRtools imports replaced by chython equivalents across the entire codebase
- `chython-synplan[racer-default]>=1.93` replaces both `cgrtools-stable` and the git-pinned chython fork
- RDKit isolated to optional `synplan/chem/rdkit_utils.py` for SA score calculations
- Module-level `smiles_parser` singleton removed; each module imports `chython.smiles`
- Bridge functions `cgrtools_to_chython_molecule()` and `chython_query_to_cgrtools()` deleted
Reaction Rule Format (BREAKING)
- Rules output changed from pickle to SMARTS TSV (human-readable, version-controllable, portable)
- TSV columns: `rule_smarts`, `popularity`, `reaction_indices`
- Legacy pickle still loadable with automatic conversion via `_convert_cgrtools_query_container()`
- `load_reaction_rules()` returns `tuple` (immutable, cached) instead of `list`
Reactor API (BREAKING)
- Reactor constructed with explicit `patterns=`, `products=`, `delete_atoms=False`
- Reactants unpacked with `*reactants` instead of passed as a list
- `molecule_substructure_as_query()` replaces CGRtools' `as_query=True` API using `QueryElement.from_atom()` with explicit `neighbors`, `hydrogens`, `ring_sizes` flags
MCTS Architecture (BREAKING)
- `evaluation_function` parameter type changed from `ValueNetworkFunction` to `EvaluationStrategy`
- `tree.policy_network` renamed to `tree.expansion_function`
- `tree.value_network` removed; replaced by `tree.evaluator`
- `tree.building_blocks` is now `frozenset` (immutable)
- `tree.reaction_rules` is now `tuple` (immutable)
- `evaluation_type` string dispatch replaced by typed evaluation config objects
- `value_network_path` parameter removed from `run_search()`; use `evaluation_config`
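Since the building blocks are now held in a `frozenset`, any filtering has to happen before the set is built. The sketch below uses only built-in Python types to show the pattern; it does not construct a `Tree`, and the SMILES strings and the `excluded` set are made-up examples.

```python
# Raw building blocks as they might be read from a file (duplicates allowed).
raw_blocks = ["CCO", "CC(=O)O", "c1ccccc1", "CCO"]
excluded = {"c1ccccc1"}  # hypothetical blocks to drop before the search

# Filter first, then freeze: this is the set that would be handed over.
building_blocks = frozenset(s for s in raw_blocks if s not in excluded)

# In-place mutation is no longer possible; frozenset has no add/remove.
try:
    building_blocks.add("CCN")
except AttributeError:
    mutation_blocked = True
else:
    mutation_blocked = False

# To change the contents, build a new frozenset instead.
extended = building_blocks | {"CCN"}
```

The same reasoning applies to the rules tuple: derive a new tuple rather than appending to the existing one.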
Data Pipeline
- Ray workers receive raw SMILES strings instead of parsed `ReactionContainer` objects
- `extract_rules()` returns `tuple[list, bool]` instead of `list`
- `sort_rules()` returns `tuple[list, dict]`; `single_product_only` parameter removed
- `filter_reaction()` returns 3-tuple `(bool, ReactionContainer | None, str | None)`
- `clean_atom()` no longer manages `hybridization` attribute
- `depict_settings` is now a module-level function, not a class method
Dependencies
- `cgrtools-stable==4.2.13` removed
- `chython` git pin replaced by `chython-synplan[racer-default]>=1.93`
- `chytorch-synplan>=1.70` (was `>=1.69`)
- `chytorch-rxnmap-synplan>=1.7` (was `>=1.6`)
- `rdkit>=2023.9.1` (relaxed from `>2025.3.5`)
- CUDA extras: `--extra cuda` replaced by `--extra cu126` / `--extra cu128`
Other
- `download_all_data()` deprecated in favor of `download_preset()`
- Type annotations modernized: `Dict`, `List`, `Union` -> `dict`, `list`, `|`
- `tqdm` -> `tqdm.auto` for notebook compatibility
- All existing tutorials (Steps 2-6) rewritten for chython-synplan
Fixed
- Product validation now copies molecule before `kekule()` to prevent mutation
- `RankingPolicyDataset`: `if rule_id:` -> `if rule_id is None:` (was silently skipping rule index 0)
- Variable-shadowing bug in `_expand_node` (`for new_precursor in new_precursor`)
- `InvalidAromaticRing` exception now caught alongside `KeyError` and `IndexError`
- Reactor no longer deletes atoms by default (`delete_atoms=False`)
- Windows path handling
- CUDA/PyTorch resolution in CI
- GUI and CI fixes
- Visualisation bugs
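The rule-index fix listed above comes down to Python truthiness: the integer `0` is falsy, so a bare truthiness check treats a valid index 0 the same way as a missing value. The sketch below demonstrates the general pitfall with hypothetical helper names; it does not reproduce the `RankingPolicyDataset` code, whose exact branch structure is not shown here.

```python
def keep_if_truthy(rule_ids):
    """Buggy pattern: a bare truthiness test also drops the valid index 0."""
    return [r for r in rule_ids if r]


def keep_if_not_none(rule_ids):
    """Explicit None comparison: only genuinely missing labels are dropped."""
    return [r for r in rule_ids if r is not None]


labels = [0, 3, None, 7]  # made-up rule indices; None marks a missing label

buggy = keep_if_truthy(labels)
fixed = keep_if_not_none(labels)
```

With these inputs, `buggy` loses the entry for rule 0 while `fixed` keeps it, so every training example labeled with the first rule would have been discarded under the old check.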
Breaking Changes Summary
Data & Reproducibility: All pretrained models, reaction rules (pickle format),
and building block files from previous versions produce different results with
v1.4.0. Users must:
- Re-extract reaction rules (now saved as SMARTS TSV)
- Retrain all policy and value networks
- Re-standardize building blocks
The root cause is that chython produces different canonical SMILES, different atom
feature vectors, different Kekulization, and different reaction products compared to
CGRtools. While the 11-dimensional atom feature schema is unchanged, the underlying
values differ for aromaticity perception, ring detection, and hydrogen counting.
| Breaking Change | Migration Path |
|---|---|
| CGRtools imports | Replace with chython equivalents |
| Pickle reaction rules | Re-extract rules (outputs SMARTS TSV) or load legacy pickle (auto-converted) |
| `ValueNetworkFunction` as `Tree` arg | Use `EvaluationStrategy` subclass |
| `evaluation_type` string config | Use typed config objects (`ValueNetworkEvaluationConfig`, etc.) |
| `tree.policy_network` | Use `tree.expansion_function` |
| `tree.value_network` | Use `tree.evaluator` |
| `tree.building_blocks` mutation | Filter before `Tree` init (`frozenset`) |
| `value_network_path` in `run_search()` | Use `evaluation_config` parameter |
| `--extra cuda` | Use `--extra cu126` or `--extra cu128` |
| `download_all_data()` | Use `download_preset()` |
| Pretrained models | Retrain (feature vectors differ) |
| HuggingFace repo | Data moved to `SynPlanner-data` repo |