[3.8.0] - 2026-06-15
Theme: The Grand Unification refactor (one build pipeline, one AI transport, one parser definition, in-process MCP tools; see docs/UNIFICATION_PLAN.md), with a 13-bug audit fix-up and a real-world end-to-end CLI testing pass, plus MiniMax-M3, model selection, registry-driven platform targets, a Windows subprocess fix, and codebase de-duplication.
Fixed
- Codebase skills built from a config no longer drop the API reference + dependency graph: the local source carried `api_reference` and `dependency_graph`, but the unified builder never wrote them out, so a skill built from a scan-emitted `*-codebase.json` shipped without its API reference (it was stranded in the scrape cache). Both are now promoted into `references/codebase_analysis/<source>/` and linked from the index.
- `create ./path` defaults to deep analysis: local codebase create defaulted to `surface` depth, producing an empty `code_analysis.json` and a misleading "analyzed 0 files" log; it now defaults to `deep` (matching the scraper default and the scan-config path), with an explicit `--depth` still taking precedence.
- `package` works non-interactively: the quality-gate prompt raised `EOFError` when stdin wasn't a TTY (CI/pipes); it now auto-proceeds on a non-TTY or with the new `--yes`/`-y` flag, while an interactive terminal still prompts.
- `quality` prints the score: the standalone command saved `quality_report.json` but printed nothing; it now shows the score/grade summary, and the report serializes metric levels as `"info"`/`"warning"` instead of the Python enum repr `"MetricLevel.INFO"`.
- `doctor` no longer miscounts `GITHUB_TOKEN` as an AI provider key: the API-keys check now names which keys are set, so a bare GitHub token isn't misread as a provider key being configured.
- `estimate <url>` gives an actionable error: passing a URL (which `create` accepts) printed a bare "Config file not found"; it now explains that `estimate` takes a config file, points at `create --dry-run`, and exits non-zero.
- Web sitemap discovery fails fast on unreachable hosts: the pre-crawl sitemap probes used a single scalar timeout; they now use a `(connect, read)` timeout so an unreachable host doesn't block the full window before the crawl starts.
- Fork-bomb guard covers the primary LOCAL enhance path: `_run_agent_command` now marks `SKILL_SEEKER_ENHANCE_ACTIVE` in the spawned agent's environment (including terminal-mode scripts) and `run()` refuses nested spawns, so a local agent enhancing a skill can no longer recursively launch enhancement.
- `--dry-run` and `--output` honored for unified configs: `create` skipped injecting both into `UnifiedScraper` (every other source type got them); dry-run now previews and returns without creating directories, for legacy and unified configs alike.
- Trailing-slash `--output` no longer leaks intermediates into the packaged skill: `SkillConverter` resolves `skill_dir` once and strips trailing separators, so `--output out/x/` can't place `_extracted.json` inside the skill directory.
- snake_case config classification: `config_extractor._path_has_word` used `\b`, which never matches next to an underscore because `_` counts as a word character, so words inside snake_case names (`app_db.yaml`) went undetected; explicit lookarounds fix detection while keeping dbeaver/blog false positives excluded.
- How-to guides no longer come out empty for control-flow-wrapped tests: step extraction recursively descends into `with`/`for`/`if`/`try` blocks in source order, and empty AST results fall back to the heuristic instead of emitting empty guides.
- Slack 429 retry exhaustion logs a truncation warning (matching the Discord path) instead of silently breaking pagination.
- Kimi CLI output parsing no longer swallows unknown records: the record boundary is any CamelCase constructor, so new record types (`ToolCallPart`, …) can't leak internals into extracted text.
- `scan` re-runs no longer churn `.archived/`: canonical-named files fetched into the out-dir are removed after copying to the slug target, eliminating phantom "removed" diffs on every re-scan.
- Dry-run page estimates dedupe `utm_*` variants: `_enqueue_url` normalizes tracking params the same way the real crawl does.
- MCP per-page diagnostics from worker threads are no longer dropped: contextvars are propagated into `ThreadPoolExecutor` workers (doc scraper, PDF extractor, enhancers, video visual), so the per-call log capture sees them.
- Gemini/OpenAI adaptor enhancement gained the truncation gate and atomic save: both adaptors previously accepted truncated AI output and used a destructive rename-then-write save (a failed write left no SKILL.md); all adaptors now share one enhance flow with central truncation detection and backup+atomic-replace (`enhance_skill`'s save path is fixed the same way).
- MCP `extract_config_patterns` tool works: it passed flags that `config_extractor`'s parser rejects, so it failed on every invocation; it is now mapped to the real flags and pinned by a regression test.
- Unified-CLI flags that were silently rejected now work: `estimate --unlimited`/`--timeout`, `update --generate-package`/`--apply-update`, `quality --output`, `stream --streaming-overlap-chars`/`--batch-size`/`--checkpoint`, `multilang --report`/`--export`, `install --target`, and `extract-test-examples --recursive` were accepted by the standalone modules but rejected with "unrecognized arguments" by `skill-seekers <cmd>` (central-parser drift). A programmatic drift-guard test now fails CI if any module flag is missing from its central parser.
- On Windows, large subprocess output no longer freezes MCP tools (#397): `run_subprocess_with_streaming` replaced its `select()`-based polling loop (unsupported on Windows pipes) with reader threads, and now bounds the timeout reliably. Previously `scrape_docs`/`scrape_github` and other tools could deadlock on a full (>64 KB) pipe buffer on Windows. The fix is applied to the single shared implementation, so all callers benefit.
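The snake_case classification fix above can be illustrated with a small sketch. This is not the project's `_path_has_word`; the function names, the alphanumeric lookaround character class, and the sample paths are assumptions chosen to show why `\b` misses `app_db.yaml` while lookarounds do not.

```python
import re


def path_has_word_boundary(path: str, word: str) -> bool:
    """Old-style check (illustrative): relies on \\b word boundaries."""
    return re.search(rf"\b{re.escape(word)}\b", path, flags=re.IGNORECASE) is not None


def path_has_word(path: str, word: str) -> bool:
    """Lookaround check (illustrative): the word must not touch a letter or digit.

    Underscores, dots, slashes, and string ends all count as separators here,
    which is an assumption of this sketch.
    """
    pattern = rf"(?<![A-Za-z0-9]){re.escape(word)}(?![A-Za-z0-9])"
    return re.search(pattern, path, flags=re.IGNORECASE) is not None


# "_" is a word character, so there is no \b between "_" and "d".
print(path_has_word_boundary("config/app_db.yaml", "db"))
# The lookaround version treats "_" as a separator.
print(path_has_word("config/app_db.yaml", "db"))
# Substrings of longer words are still excluded.
print(path_has_word("tools/dbeaver.ini", "db"), path_has_word("site/blog.yaml", "log"))
```

The lookbehind and lookahead only assert what may not sit next to the word, so the match itself stays exactly the word and no separator characters are consumed.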
Changed
- `quality --threshold` defaults to `None`: without it, `quality` is report-only and keeps the historical exit-0 contract; the quality gate (non-zero exit when the score is below the threshold) fires only when `--threshold` is explicitly given.
- All enhancement API calls go through `AgentClient`: one transport with a consistent truncation gate, timeout policy, and error classification. `ANTHROPIC_BASE_URL`, per-provider model overrides (`ANTHROPIC_MODEL`/`GOOGLE_MODEL`/`OPENAI_MODEL`/`MOONSHOT_MODEL`), and the global `SKILL_SEEKER_MODEL`/`SKILL_SEEKER_PROVIDER` overrides are now honored everywhere. API-key auto-detection follows the `API_PROVIDERS` registry order (Anthropic → Google → OpenAI → Moonshot). `video_visual` frame classification is the documented multimodal exception.
- MCP tools run in-process: `estimate_pages`, `detect_patterns`, `extract_test_examples`, `extract_config_patterns`, `build_how_to_guides`, `split_config`, `generate_router`, `package_skill`, and `upload_skill` call the real CLI `main()` via a shared `run_cli_main()` helper instead of spawning subprocesses (faster startup, identical output contract; former hard subprocess timeouts become advisory). `enhance_skill` (LOCAL agent) and `install_skill`'s enhancement step stay subprocess by design (fork-bomb-guard semantics).
- Platform `--target` choices are derived from the adaptor registry (#400): `enhance`, `upload`, `package`, and `install` now compute their choices from `get_enhancement_platforms()`/`get_upload_platforms()`/`list_platforms()` instead of hand-maintained lists, so newly registered adaptors appear automatically and the lists can no longer drift. Non-breaking (each new list is a superset of the old).
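As a rough picture of what running a CLI `main()` in-process involves, the sketch below captures output and maps `SystemExit` to an exit code. The signature and return shape of this `run_cli_main` are assumptions for illustration, not the project's actual helper, and `demo_main` is a hypothetical stand-in for a real command.

```python
import argparse
import contextlib
import io
from typing import Callable, Optional, Sequence, Tuple


def run_cli_main(main: Callable[[Optional[Sequence[str]]], Optional[int]],
                 argv: Sequence[str]) -> Tuple[int, str]:
    """Call a CLI entry point in-process; return (exit_code, captured_output)."""
    buffer = io.StringIO()
    exit_code = 0
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        try:
            result = main(list(argv))
            if isinstance(result, int):
                exit_code = result
        except SystemExit as exc:
            # argparse and sys.exit() signal termination this way.
            if exc.code is None:
                exit_code = 0
            elif isinstance(exc.code, int):
                exit_code = exc.code
            else:
                print(exc.code)  # sys.exit("message") carries text, not a code
                exit_code = 1
    return exit_code, buffer.getvalue()


def demo_main(args: Optional[Sequence[str]] = None) -> int:
    """Hypothetical command used only to exercise the helper."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--pages", type=int, default=1)
    namespace = parser.parse_args(args)
    print(f"estimated pages: {namespace.pages}")
    return 0


code, output = run_cli_main(demo_main, ["--pages", "3"])
print(code, output.strip())
```

One caveat with this approach: `contextlib.redirect_stdout` swaps the process-wide `sys.stdout`, so a server handling concurrent calls would need a different capture mechanism than this sketch uses.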
Added
- `stream --output`: collected chunks are written as JSON (the flag existed in the central parser but chunks were processed and dropped).
- `multilang --languages`: restricts `--detect`/`--export` to the given languages (previously a central-parser fiction).
- `skill_seekers.services` package: `marketplace_manager`, `marketplace_publisher`, `config_publisher`, `source_manager`, and `git_repo` moved out of `mcp/` so the CLI can import this domain logic without the optional `[mcp]` extra. Back-compat shims remain at the old `skill_seekers.mcp.*` paths.
- `get_converter("config", {...})`: `UnifiedScraper` now accepts the factory-shaped config dict, so unified configs construct through the same factory as every other source type (legacy positional construction still supported).
- `cli/exit_codes.py`: standard exit-code constants (`EXIT_SUCCESS`/`EXIT_ERROR`/`EXIT_VALIDATION`/`EXIT_INTERRUPT`).
- `--model` flag for `enhance` and `package` (#395, #398): override the platform's default model, e.g. `skill-seekers enhance output/react/ --target minimax --model MiniMax-M2.7` or `skill-seekers package output/react/ --target minimax --model MiniMax-M2.7`. Honored uniformly across all enhancement adaptors and recorded in package metadata. Resurrects the previously-dead `custom_model` config key.
- MiniMax-M3 is the new default MiniMax model (#395): fresh `--target minimax` runs use M3; the previous-generation M2.7 remains selectable via `--model`. Docs (`MINIMAX_INTEGRATION.md`, `MULTI_LLM_SUPPORT.md`) refreshed.
- More enhancement targets (#395): `enhance --target` now accepts every enhancement-capable adaptor (adds `minimax`, `deepseek`, `qwen`, `openrouter`, `together`, `fireworks`); previously only `claude`/`gemini`/`openai`/`kimi` were reachable.
- More upload targets + `supports_upload()` capability (#400): `upload --target` now accepts every adaptor with a real upload, adding `minimax`, `deepseek`, `qwen`, `openrouter`, `together`, `fireworks`, and `pinecone`. New `supports_upload()` adaptor method and `get_upload_platforms()` helper.
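The capability-driven `--target` list described above might look roughly like the following. Only the names `supports_upload()` and `get_upload_platforms()` come from this changelog; the registry dict, the adaptor classes, and the platform names used here are hypothetical placeholders.

```python
import argparse
from typing import Dict, List


class BaseAdaptor:
    """Hypothetical base class: adaptors opt in to upload explicitly."""

    def supports_upload(self) -> bool:
        return False


class UploadingAdaptor(BaseAdaptor):
    def supports_upload(self) -> bool:
        return True


# Illustrative registry; the real project keeps its own adaptor registry.
ADAPTORS: Dict[str, BaseAdaptor] = {
    "claude": UploadingAdaptor(),
    "pinecone": UploadingAdaptor(),
    "local_only": BaseAdaptor(),  # made-up export-only platform
}


def get_upload_platforms() -> List[str]:
    """Names of registered adaptors that report a real upload."""
    return sorted(name for name, adaptor in ADAPTORS.items() if adaptor.supports_upload())


def build_upload_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="upload")
    # Choices are computed at build time, so there is no hand-maintained list.
    parser.add_argument("--target", choices=get_upload_platforms(), default="claude")
    return parser


print(get_upload_platforms())
ADAPTORS["newcomer"] = UploadingAdaptor()  # registering is enough to expose it
print(build_upload_parser().parse_args(["--target", "newcomer"]).target)
```

Because the parser asks the registry each time it is built, a newly registered adaptor becomes a valid `--target` value without any parser change.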
Internal
- `DocumentSkillBuilder`: the build side of all 9 document scrapers (epub, word, pptx, html, pdf, jupyter, man, rss, chat) now lives once in `cli/document_skill_builder.py` (net −1,859 lines across the scrapers). Every port is byte-identical, proven by golden trees in `tests/golden/phase2/` (`UPDATE_GOLDENS=1` refreshes them; use it only on purpose).
- `UnifiedScraper` dispatch table + shared engine: `scrape_all_sources()` routes through a class-level `SOURCE_DISPATCH` table and `_scrape_with_converter()` handles the 13 mechanical source types through the public `get_converter()`/`extract()` interface (−280 lines); new converter types registered in `CONVERTER_REGISTRY` work in unified configs automatically.
- Single-definition CLI parsers: the central `SubcommandParser` classes in `cli/parsers/` are now the only definition of each command's flags; standalone `main(args=None)` paths build their parser from the central class, and a drift-guard test asserts identical dests/defaults/option strings.
- `ExecutionContext.override()` is contextvars-based: concurrent threads/asyncio tasks (the MCP server) can no longer clobber each other's overrides; nested overrides stack and unwind; exceptions restore.
- One home for agent/provider registries and batching: `AGENT_PRESETS` and `API_PROVIDERS` live only in `agent_client.py` (a silently-diverged duplicate kimi preset is gone); `cli/parallel_batches.py:run_batches_parallel()` replaces three duplicated `ThreadPoolExecutor` blocks.
- Import hygiene: all seven `sys.path.insert` hacks in `mcp/` removed in favor of absolute `skill_seekers.*` imports; no more dual module identities.
- Performance: incremental Kotlin brace-depth tracking (was O(n²) prefix scans) and index-based class-body scans in `code_analyzer`, per-build memoization of import resolution in `dependency_analyzer`, GitHub `per_page` restored to `min(max_count, 100)`.
- De-duplicated copy-pasted code into shared modules: `mcp/tools/subprocess_utils.py` (the streaming subprocess helper, #397), `mcp/tools/_common.py` (`TextContent` fallback + `CLI_DIR`, #401), and `cli/scraper_utils.py` (`score_code_quality` + `extract_table_from_html`, #402). Behavior preserved (parity-tested); hundreds of duplicated lines removed.
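The contextvars-based override behavior listed above (stacking, unwinding, restoring on exceptions) can be sketched with the standard library alone. This is a minimal illustration, not `ExecutionContext` itself; `_OVERRIDES`, `current_overrides`, and the keyword-argument interface are assumptions.

```python
import contextvars
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional

# Holds the active override mapping for the current context (thread or task).
_OVERRIDES: contextvars.ContextVar[Optional[Dict[str, Any]]] = contextvars.ContextVar(
    "overrides", default=None
)


def current_overrides() -> Dict[str, Any]:
    """Return a copy of the overrides visible in the current context."""
    return dict(_OVERRIDES.get() or {})


@contextmanager
def override(**values: Any) -> Iterator[Dict[str, Any]]:
    """Layer `values` on top of any active overrides for the duration of the block."""
    merged = {**(_OVERRIDES.get() or {}), **values}
    token = _OVERRIDES.set(merged)
    try:
        yield merged
    finally:
        # Runs on normal exit and on exceptions, restoring the previous layer.
        _OVERRIDES.reset(token)


with override(depth="deep"):
    with override(model="example-model"):
        print(current_overrides())  # both layers visible
    print(current_overrides())      # inner layer unwound
print(current_overrides())          # back to empty
```

Each `set()` returns a token that restores exactly the previous value, which is what makes nesting safe; and because the variable is context-local, code running in a separate `contextvars.Context` does not see overrides set elsewhere.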