SynPlanner v1.4.0 (GitHub: Laboratoire-de-Chemoinformatique/SynPlanner)

Added

Protection Strategy Scoring (NEW MODULE)

  • New synplan/route_quality/ module implementing the competing-sites scoring framework
    from Westerlund et al. (ChemRxiv, 2025)
  • FunctionalGroupDetector with 102 SMARTS patterns across 18 reactivity categories
  • HalogenDetector with 140 SMARTS patterns across 5 halogen families
  • CGR-based ReactionClassifier with broad (4-category) and detailed (12-category)
    reaction type classification
  • IncompatibilityMatrix with 3-level severity (compatible / competing / incompatible)
  • RouteScanner for per-step competing functional group interaction detection
  • CompetingSitesScore with worst-per-step S(T) formula for route quality scoring
  • ProtectionRouteScorer integrated directly with Tree for automatic post-search
    route re-ranking based on functional group selectivity
  • ProtectionConfig dataclass with YAML serialization
  • Full test suite: 69 unit tests across 4 test modules
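
To make the three severity levels and the worst-per-step idea concrete, here is a minimal sketch. It does not use the synplan.route_quality API: the Severity enum, TOY_MATRIX, and both helper functions are hypothetical names, the matrix entries are illustrative only, and the actual S(T) formula is not reproduced here because these notes do not state it.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Three-level severity, ordered so that max() picks the worst case."""
    COMPATIBLE = 0
    COMPETING = 1
    INCOMPATIBLE = 2


# Hypothetical toy matrix: (reaction category, bystander group) -> severity.
# The entries are illustrative and are not taken from SynPlanner.
TOY_MATRIX = {
    ("acylation", "amine"): Severity.COMPETING,
    ("acylation", "alcohol"): Severity.COMPETING,
    ("reduction", "aldehyde"): Severity.INCOMPATIBLE,
}


def worst_severity(reaction_type, bystander_groups, matrix=TOY_MATRIX):
    """Return the worst severity among the bystander groups of one step."""
    return max(
        (matrix.get((reaction_type, group), Severity.COMPATIBLE)
         for group in bystander_groups),
        default=Severity.COMPATIBLE,  # a step with no bystander groups
    )


def worst_per_step(route):
    """route is a list of (reaction_type, bystander_groups) pairs."""
    return [worst_severity(rtype, groups) for rtype, groups in route]


route = [
    ("acylation", ["ether", "alcohol"]),
    ("reduction", ["aldehyde", "amine"]),
    ("acylation", []),
]
per_step = worst_per_step(route)
print([s.name for s in per_step])
# ['COMPETING', 'INCOMPATIBLE', 'COMPATIBLE']
```

A route-level score would then aggregate the per-step values; how CompetingSitesScore does that is defined by the module itself, not by this sketch.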

Search Algorithms

  • CombinedPolicyNetworkFunction for weighted filtering + ranking logit combination
    with configurable ranking_weight and temperature parameters
  • New evaluation strategies: RDKitEvaluationStrategy, PolicyEvaluationStrategy
  • Stochastic mode for RolloutSimulator (probability-weighted rule sampling)
  • Tree pruning via redundant expansion state caching (enable_pruning config)
  • predict_reaction_rules_light() for lightweight rollout rule prediction

Data Pipeline

  • RawReactionReader for lazy batch processing of raw SMILES/RDF strings
  • Distributed SMILES parsing across Ray workers (was main-thread bottleneck)
  • BaseStandardizer abstract class with template method pattern
  • StandardizationError with safe pickling for Ray workers
  • STANDARDIZER_REGISTRY for declarative standardizer configuration
  • DuplicateReactionStandardizer with Ray DedupActor for cluster-wide dedup
  • DedupActor Ray actor for cluster-wide unique reaction tracking
  • 4 new reaction filters: MultiCenterFilter, WrongCHBreakingFilter,
    CCsp3BreakingFilter, CCRingBreakingFilter
  • ignore_errors mode with structured TSV error files for all data pipelines
  • Categorized error taxonomy (_DATA_ERROR_STAGES, _DATA_ERROR_TYPES)
    distinguishing data noise from pipeline bugs
  • parse_reaction() with format auto-detection (SMILES / RDF)
  • load_rule_index_mapping_tsv() for new TSV rule format
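
As a rough illustration of format auto-detection combined with an ignore-errors mode, the sketch below classifies raw records and logs failures to a TSV stream instead of aborting. It is not the parse_reaction() implementation: the function names, the detection heuristics, and the error-file columns (index, stage, error_type, message) are all assumptions made for this example.

```python
import csv
import io


def detect_reaction_format(record: str) -> str:
    """Guess whether a raw record is an RDF/RXN block or a reaction SMILES."""
    text = record.lstrip()
    # RDF files and RXN blocks start with a '$'-prefixed keyword.
    if text.startswith(("$RDFILE", "$RFMT", "$RXN")):
        return "rdf"
    # A reaction SMILES has the form reactants>agents>products.
    first_token = text.split()[0] if text else ""
    if first_token.count(">") == 2:
        return "smiles"
    raise ValueError("unrecognized reaction format")


def process_records(records, error_file, ignore_errors=True):
    """Classify records; with ignore_errors, log failures as TSV rows."""
    writer = csv.writer(error_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["index", "stage", "error_type", "message"])  # hypothetical columns
    formats = []
    for index, record in enumerate(records):
        try:
            formats.append(detect_reaction_format(record))
        except ValueError as exc:
            if not ignore_errors:
                raise
            writer.writerow([index, "parsing", type(exc).__name__, str(exc)])
    return formats


records = ["CCO>>CC=O", "$RXN\n\n  demo\n", "not a reaction"]
errors = io.StringIO()
formats = process_records(records, errors)
print(formats)  # ['smiles', 'rdf']
print(errors.getvalue())
```

Passing ignore_errors=False re-raises the first failure, which is the stricter behaviour one would want when debugging the pipeline rather than cleaning noisy data.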

Infrastructure

  • download_preset() for structured preset downloads from HuggingFace
    (replaces deprecated download_all_data())
  • HuggingFace data moved to Laboratoire-De-Chemoinformatique/SynPlanner-data
  • Preset YAML manifests (e.g., presets/synplanner-article.yaml)
  • TSV building blocks format support (.tsv, .tsv.gz)
  • CUDA 12.6 and 12.8 extras (--extra cu126, --extra cu128)
  • Python 3.13 and 3.14 support (>=3.10,<3.15)
  • Multi-stage Docker builds with uv sync --locked
  • HEALTHCHECK directive for GUI Docker image
  • Cross-platform CI matrix (3 OS x 4 Python versions)
  • uv build --wheel + uv publish for PyPI/TestPyPI releases
  • --ignore-errors, --error-file, --batch_size CLI options on all processing commands
  • synplan download_preset CLI command

Tutorials & Documentation

  • Tutorial 00: Welcome to Chython (chython onboarding for new users)
  • Tutorial 01: Coming from RDKit (migration guide with 35+ operation cheat sheet)
  • Tutorial 07: Protection Scoring (end-to-end with capivasertib, 128 routes)
  • Tutorial 08: Combined Ranking Filtering Policy (dual policy tuning)
  • Tutorial 09: NMCS Algorithms (Nested Monte Carlo Search guide)
  • API docs for synplan.route_quality module
  • 5 new user guide pages linked from docs index

Configs

  • combined_ranking_filtering_policy.yaml — combined policy network config
  • planning_combined_policies.yaml — planning with combined filtering + ranking
  • planning_value.yaml — GCN value network evaluation config
  • rules_extraction.yaml — fine-grained atom info retention for rule extraction
  • extraction_functional_groups.yaml — FG-aware extraction with 26 SMARTS patterns

Testing

  • 80+ new unit and integration tests
  • test_clustering_visualization_e2e.py — 27+ tests covering full clustering pipeline
  • test_loading.py — building blocks loading with CSV, gzip, and TSV
  • SAScore benchmark suite (scripts/sascore_bench/) with configurable YAML and plotting

Changed

Chemistry Backend Migration (BREAKING)

  • ALL CGRtools imports replaced by chython equivalents across the entire codebase
  • chython-synplan[racer-default]>=1.93 replaces both cgrtools-stable and the
    git-pinned chython fork
  • RDKit isolated to optional synplan/chem/rdkit_utils.py for SA score calculations
  • Module-level smiles_parser singleton removed; each module imports chython.smiles
  • Bridge functions cgrtools_to_chython_molecule() and chython_query_to_cgrtools()
    deleted

Reaction Rule Format (BREAKING)

  • Rules output changed from pickle to SMARTS TSV (human-readable,
    version-controllable, portable)
  • TSV columns: rule_smarts, popularity, reaction_indices
  • Legacy pickle still loadable with automatic conversion via
    _convert_cgrtools_query_container()
  • load_reaction_rules() returns tuple (immutable, cached) instead of list
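
Because the rules are now plain TSV, they can be inspected with the standard library alone. The sketch below is a hypothetical stand-in reader, not load_reaction_rules(): it assumes a header row with the three column names listed above and assumes reaction_indices is a comma-separated list of integers, which these notes do not confirm. The SMARTS strings are placeholders.

```python
import csv
import io


def read_rules_tsv(handle):
    """Read rule rows into an immutable tuple of (smarts, popularity, indices)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rules = []
    for row in reader:
        # Assumption: indices are stored as "0,4,9"; empty pieces are skipped.
        indices = tuple(
            int(piece) for piece in row["reaction_indices"].split(",") if piece
        )
        rules.append((row["rule_smarts"], int(row["popularity"]), indices))
    return tuple(rules)  # a tuple, mirroring the immutable return type


sample = (
    "rule_smarts\tpopularity\treaction_indices\n"
    "[C:1]-[O:2]>>[C:1].[O:2]\t12\t0,4,9\n"
    "[C:1]-[N:2]>>[C:1].[N:2]\t3\t7\n"
)
rules = read_rules_tsv(io.StringIO(sample))
for smarts, popularity, indices in rules:
    print(popularity, smarts, indices)
```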

Reactor API (BREAKING)

  • Reactor constructed with explicit patterns=, products=, delete_atoms=False
  • Reactants unpacked with *reactants instead of passed as a list
  • molecule_substructure_as_query() replaces CGRtools' as_query=True API
    using QueryElement.from_atom() with explicit neighbors, hydrogens,
    ring_sizes flags

MCTS Architecture (BREAKING)

  • evaluation_function parameter type changed from ValueNetworkFunction to
    EvaluationStrategy
  • tree.policy_network renamed to tree.expansion_function
  • tree.value_network removed; replaced by tree.evaluator
  • tree.building_blocks is now frozenset (immutable)
  • tree.reaction_rules is now tuple (immutable)
  • evaluation_type string dispatch replaced by typed evaluation config objects
  • value_network_path parameter removed from run_search(); use
    evaluation_config
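
The practical consequence of the immutable containers is that any filtering has to happen before the collection is handed over. The sketch below uses plain Python objects rather than a real Tree, so the variable names and SMILES strings are illustrative only.

```python
# Stand-ins for what a Tree would hold after this release.
building_blocks = frozenset({"CCO", "CC(=O)O", "c1ccccc1", "CBr"})
reaction_rules = ("rule_a", "rule_b")

# In-place mutation is no longer possible: frozenset has no add() method.
mutation_failed = False
try:
    building_blocks.add("CN")
except AttributeError:
    mutation_failed = True

# Item assignment on a tuple raises TypeError.
assignment_failed = False
try:
    reaction_rules[0] = "rule_c"
except TypeError:
    assignment_failed = True

# Instead, build a new filtered frozenset before constructing the tree.
allowed = frozenset(s for s in building_blocks if "Br" not in s)

print(mutation_failed, assignment_failed, sorted(allowed))
```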

Data Pipeline

  • Ray workers receive raw SMILES strings instead of parsed ReactionContainer objects
  • extract_rules() returns tuple[list, bool] instead of list
  • sort_rules() returns tuple[list, dict]; single_product_only parameter removed
  • filter_reaction() returns 3-tuple (bool, ReactionContainer | None, str | None)
  • clean_atom() no longer manages hybridization attribute
  • depict_settings is now a module-level function, not a class method

Dependencies

  • cgrtools-stable==4.2.13 removed
  • chython git pin replaced by chython-synplan[racer-default]>=1.93
  • chytorch-synplan>=1.70 (was >=1.69)
  • chytorch-rxnmap-synplan>=1.7 (was >=1.6)
  • rdkit>=2023.9.1 (relaxed from >2025.3.5)
  • CUDA extras: --extra cuda replaced by --extra cu126 / --extra cu128

Other

  • download_all_data() deprecated in favor of download_preset()
  • Type annotations modernized: Dict, List, Union -> dict, list, |
  • tqdm -> tqdm.auto for notebook compatibility
  • All existing tutorials (Steps 2-6) rewritten for chython-synplan

Fixed

  • Product validation now copies molecule before kekule() to prevent mutation
  • RankingPolicyDataset: if rule_id: -> if rule_id is None: (was silently
    skipping rule index 0)
  • Variable-shadowing bug in _expand_node (for new_precursor in new_precursor)
  • InvalidAromaticRing exception now caught alongside KeyError and IndexError
  • Reactor no longer deletes atoms by default (delete_atoms=False)
  • Windows path handling
  • CUDA/PyTorch resolution in CI
  • GUI and CI fixes
  • Visualisation bugs
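
The rule-index fix above belongs to a common class of bug: testing an integer index for truthiness treats 0 the same as None. The sketch below is a generic illustration, not the RankingPolicyDataset code; the function names and the filtering polarity are assumptions, since the surrounding code is not shown in these notes.

```python
def keep_truthy(rule_ids):
    """Buggy pattern: 0 is falsy, so rule index 0 is dropped along with None."""
    return [rule_id for rule_id in rule_ids if rule_id]


def keep_not_none(rule_ids):
    """Explicit None check: only missing values are dropped."""
    return [rule_id for rule_id in rule_ids if rule_id is not None]


rule_ids = [0, 1, None, 2]
print(keep_truthy(rule_ids))    # [1, 2]
print(keep_not_none(rule_ids))  # [0, 1, 2]
```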

Breaking Changes Summary

Data & Reproducibility: All pretrained models, reaction rules (pickle format),
and building block files from previous versions produce different results with
v1.4.0. Users must:

  1. Re-extract reaction rules (now saved as SMARTS TSV)
  2. Retrain all policy and value networks
  3. Re-standardize building blocks

The root cause is that chython produces different canonical SMILES, different atom
feature vectors, different Kekulization, and different reaction products compared to
CGRtools. While the 11-dimensional atom feature schema is unchanged, the underlying
values differ for aromaticity perception, ring detection, and hydrogen counting.

Breaking Change                     Migration Path
----------------------------------  ----------------------------------------------
CGRtools imports                    Replace with chython equivalents
Pickle reaction rules               Re-extract rules (outputs SMARTS TSV) or load legacy pickle (auto-converted)
ValueNetworkFunction as Tree arg    Use EvaluationStrategy subclass
evaluation_type string config       Use typed config objects (ValueNetworkEvaluationConfig, etc.)
tree.policy_network                 Use tree.expansion_function
tree.value_network                  Use tree.evaluator
tree.building_blocks mutation       Filter before Tree init (frozenset)
value_network_path in run_search()  Use evaluation_config parameter
--extra cuda                        Use --extra cu126 or --extra cu128
download_all_data()                 Use download_preset()
Pretrained models                   Retrain; feature vectors differ
HuggingFace repo                    Data moved to SynPlanner-data repo
