[3.5.1] - 2026-04-12
Added
- Centralized `defaults.json` config: single source of truth for all default values (`rate_limit`, `max_pages`, `workers`, `async_mode`, enhancement, analysis, RAG settings). New `defaults.py` loader module. All 15+ files that previously hardcoded defaults now read from this file (#356)
- Low-signal code snippet filtering: `_is_low_signal_code_snippet()` filters junk patterns such as bare `True`, `options`, and single identifiers out of quick references (#360)
- Pattern description normalization: `_normalize_pattern_description()` strips boilerplate prefixes and truncates to the first meaningful sentence (#360)
- Example language priority ranking: `_example_language_priority()` ranks Python > Bash > JSON > other languages for SKILL.md examples (#360)
- `checkpoint_exists()` method on `DocToSkillConverter`: it was called but never defined (#360)
- Unified config source normalization: `DocToSkillConverter.__init__` merges fields from `sources[0]` into the flat config for compatibility (#360)
- `display_name` support in SKILL.md generation, producing cleaner titles and slugs (#360)
- New tests: `test_doc_scraper_entrypoint.py` (regression for `_run_scraping`), quick-reference quality tests, docs-only compatibility tests, nested reference coverage tests (#360)
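The centralized defaults entry describes one JSON file feeding every module. The actual `defaults.py` API is not shown here, so the following is only a minimal sketch of such a loader; `parse_defaults` and `load_defaults` are hypothetical names, and returning a read-only view is a design assumption.

```python
import json
from functools import lru_cache
from pathlib import Path
from types import MappingProxyType
from typing import Any, Mapping


def parse_defaults(raw: str) -> Mapping[str, Any]:
    """Parse the JSON text of a defaults file into a read-only mapping (hypothetical helper)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("defaults file must contain a JSON object at the top level")
    # Read-only view: no caller can silently change a shared top-level default.
    return MappingProxyType(data)


@lru_cache(maxsize=None)
def load_defaults(path: str) -> Mapping[str, Any]:
    """Load a defaults file (e.g. "defaults.json") once per path and cache the result."""
    return parse_defaults(Path(path).read_text(encoding="utf-8"))
```

Only the top level is protected; nested sections such as enhancement or RAG settings would still be plain, mutable dicts in this sketch.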
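The low-signal filter entry names bare `True`, `options`, and single identifiers as junk. A minimal sketch of that kind of check follows; `is_low_signal_snippet` is a hypothetical stand-in for `_is_low_signal_code_snippet()`, and the exact literal set is an assumption.

```python
import re

# Hypothetical rules based on the entry's examples; the real helper's patterns are not shown.
_BARE_LITERALS = {"true", "false", "none", "null", "{}", "[]", '""', "''"}
_SINGLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def is_low_signal_snippet(code: str) -> bool:
    """Return True for snippets too trivial to be useful in a quick reference."""
    text = code.strip()
    if not text:
        return True  # empty code block
    if text.lower() in _BARE_LITERALS:
        return True  # bare literal such as `True` or `null`
    if _SINGLE_NAME.fullmatch(text):
        return True  # lone identifier or dotted name such as `options` or `os.path`
    return False
```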
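Pattern description normalization combines two steps: strip a boilerplate prefix, then keep the first sentence. The sketch below assumes a hypothetical prefix list (`Description:`, `Pattern:`, `Summary:`) and a simple punctuation-based sentence split; it is not the logic of `_normalize_pattern_description()` itself.

```python
import re

# Hypothetical boilerplate prefixes; the real list is not given in the changelog.
_BOILERPLATE = re.compile(r"^(?:description|pattern|summary)\s*:\s*", re.IGNORECASE)
# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def normalize_description(text: str, max_len: int = 200) -> str:
    """Collapse whitespace, drop a boilerplate prefix, keep the first sentence."""
    cleaned = _BOILERPLATE.sub("", " ".join(text.split()))
    first = _SENTENCE_END.split(cleaned, maxsplit=1)[0]
    if len(first) > max_len:
        first = first[: max_len - 3].rstrip() + "..."
    return first
```

A naive split like this also breaks on abbreviations such as "e.g.", which a production version would need to handle.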
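The language priority entry amounts to a sort key. Here is a minimal sketch using the order from the entry (Python > Bash > JSON, everything else last); `example_language_rank` is a hypothetical name, and the sketch does not handle aliases such as `py` or `sh`.

```python
from typing import Optional

# Order taken from the changelog entry; any other language ranks after these.
_LANGUAGE_ORDER = ("python", "bash", "json")


def example_language_rank(lang: Optional[str]) -> int:
    """Lower rank means preferred; unknown or missing languages share the last rank."""
    normalized = (lang or "").strip().lower()
    if normalized in _LANGUAGE_ORDER:
        return _LANGUAGE_ORDER.index(normalized)
    return len(_LANGUAGE_ORDER)


examples = [("json", "{}"), ("text", "..."), ("python", "import x"), ("bash", "ls")]
# sorted() is stable, so examples with equal rank keep their original order.
ordered = sorted(examples, key=lambda ex: example_language_rank(ex[0]))
```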
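The unified config entry says fields from `sources[0]` are merged into the flat config. The precedence rule is not stated, so this sketch assumes keys already at the top level win and `sources[0]` only fills gaps; `flatten_first_source` is a hypothetical helper, not the code in `DocToSkillConverter.__init__`.

```python
from typing import Any, Dict


def flatten_first_source(config: Dict[str, Any]) -> Dict[str, Any]:
    """Copy fields from config["sources"][0] into a flat copy of config.

    Assumption: top-level keys take precedence over the first source's keys.
    """
    flat = dict(config)  # shallow copy; the caller's dict is left untouched
    sources = config.get("sources") or []
    if sources and isinstance(sources[0], dict):
        for key, value in sources[0].items():
            flat.setdefault(key, value)  # fill only keys missing at the top level
    return flat
```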
Changed
- `max_pages` default is now unlimited (`-1`): the scraper fetches all pages unless the user explicitly sets `--max-pages`. Previously defaulted to 500 (#356)
- `--no-rate-limit` flag now works; it was defined in the CLI arguments but never consumed by `ExecutionContext` (#356)
- `constants.py` reads from `defaults.json` and no longer contains hardcoded magic numbers (#356)
- `ExecutionContext.ScrapingSettings`: `rate_limit` and `max_pages` now use real defaults instead of `None`, preventing None-poisoning downstream (#356)
- SKILL.md frontmatter cleanup: empty `doc_version:` and `version:` fields are now omitted; placeholder sections removed (#360)
- Enhancement is now routed through platform adaptors instead of importing the nonexistent `enhance_skill_md` helper (#360)
- `quality_metrics.py` uses `rglob` so that nested reference directories in unified skills are included (#360)
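Several Changed entries fit together: settings carry real defaults instead of `None`, `--max-pages` overrides only when given, and `--no-rate-limit` is actually consumed. The sketch below illustrates that wiring under stated assumptions: `ScrapingSettingsSketch` and `settings_from_argv` are hypothetical, `0.5` is a placeholder rate limit (the real value lives in `defaults.json`), and mapping `--no-rate-limit` to a zero delay is an assumption about its semantics.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_RATE_LIMIT = 0.5  # hypothetical placeholder, seconds between requests
DEFAULT_MAX_PAGES = -1    # -1 means "no page limit", as in 3.5.1


@dataclass
class ScrapingSettingsSketch:
    # Real defaults instead of None, so `rate_limit > 0` never compares against None.
    rate_limit: float = DEFAULT_RATE_LIMIT
    max_pages: int = DEFAULT_MAX_PAGES


def settings_from_argv(argv: Optional[List[str]] = None) -> ScrapingSettingsSketch:
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-pages", type=int, default=None)
    parser.add_argument("--no-rate-limit", action="store_true")
    args = parser.parse_args(argv)

    settings = ScrapingSettingsSketch()
    if args.max_pages is not None:  # override only when the user set the flag
        settings.max_pages = args.max_pages
    if args.no_rate_limit:          # the flag is read, not just declared
        settings.rate_limit = 0.0
    return settings
```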
Fixed
- `TypeError: '>' not supported between instances of 'NoneType' and 'int'`: `rate_limit` defaulted to `None` in `ExecutionContext`, and that value flowed through `config.get("rate_limit", DEFAULT)`. `dict.get` returns the stored value whenever the key is present, even if that value is `None`, so the fallback was ignored. Fixed in `doc_scraper.py` (sync and async paths), `estimate_pages.py`, and `sync_config.py` (#356, #359)
- `discover_urls()` loop never executed with unlimited `max_pages`, because `len(discovered) < -1` is always False. Added an unlimited-mode guard (#356)
- `_run_scraping()` called the nonexistent `converter.scrape()`; changed to `converter.scrape_all()` (#360)
- None-safety for BeautifulSoup attributes: `link["href"]`, `sitemap.text`, and `meta_desc["content"]` are now guarded against None XML text nodes (#360)
- Python 3.10 compatibility: `quality_metrics.py` used a backslash inside an f-string expression, which is not supported before Python 3.12 (#360)
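The `TypeError` fix hinges on `dict.get` semantics: the default is used only when the key is absent, not when it maps to `None`. The sketch contrasts the two; `get_or_default` is a hypothetical helper and `0.5` a placeholder value. It tests `is None` rather than using `or`, because `config.get(key) or default` would also discard a legitimate `0` (for example, a disabled rate limit).

```python
from typing import Any, Mapping

DEFAULT_RATE_LIMIT = 0.5  # hypothetical placeholder


def get_or_default(config: Mapping[str, Any], key: str, default: Any) -> Any:
    """Like dict.get, but also falls back when the key is present with value None."""
    value = config.get(key)
    return default if value is None else value


config = {"rate_limit": None}  # key exists, value is None

broken = config.get("rate_limit", DEFAULT_RATE_LIMIT)            # None: fallback ignored
fixed = get_or_default(config, "rate_limit", DEFAULT_RATE_LIMIT)  # 0.5
```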
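The `discover_urls()` fix is a guard that treats the unlimited sentinel specially before comparing counts. Here is a minimal breadth-first sketch; `within_page_limit` and `discover_urls_sketch` are hypothetical names, `get_links` stands in for whatever fetches a page's links, and treating any negative `max_pages` as unlimited is an assumption (the changelog only mentions `-1`).

```python
from collections import deque
from typing import Callable, Iterable, List


def within_page_limit(count: int, max_pages: int) -> bool:
    """Assumption: any negative max_pages (the default is -1) means unlimited."""
    return max_pages < 0 or count < max_pages


def discover_urls_sketch(
    start: str, get_links: Callable[[str], Iterable[str]], max_pages: int
) -> List[str]:
    discovered: List[str] = []
    seen = {start}
    queue = deque([start])
    # A bare `len(discovered) < max_pages` is `0 < -1`, i.e. False, so the loop would never run.
    while queue and within_page_limit(len(discovered), max_pages):
        url = queue.popleft()
        discovered.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return discovered
```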
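The Python 3.10 entry concerns a parser restriction lifted in 3.12: a backslash may appear in the literal part of an f-string but not inside a `{...}` expression. The usual portable workaround, shown here on illustrative data, binds the character to a name first.

```python
lines = ["alpha", "beta"]

# Rejected by Python < 3.12 ("f-string expression part cannot include a backslash"):
#     report = f"Items:\n{'\n'.join(lines)}"

# Portable: keep the backslash out of the expression by naming it first.
newline = "\n"
report = f"Items:\n{newline.join(lines)}"  # "\n" in the literal part is fine
```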