yusufkaraaslan/Skill_Seekers v3.8.0 on GitHub

[3.8.0] - 2026-06-15

Theme: The Grand Unification refactor (one build pipeline, one AI transport, one parser definition, in-process MCP tools — see docs/UNIFICATION_PLAN.md) with a 13-bug audit fix-up and a real-world end-to-end CLI testing pass — plus MiniMax-M3, model selection, registry-driven platform targets, a Windows subprocess fix, and codebase de-duplication.

Fixed

Codebase skills built from a config no longer drop the API reference + dependency graph — the local source carried api_reference and dependency_graph, but the unified builder never wrote them out, so a skill built from a scan-emitted *-codebase.json shipped without its API reference (it was stranded in the scrape cache). Both are now promoted into references/codebase_analysis/<source>/ and linked from the index.
create ./path defaults to deep analysis — local codebase create defaulted to surface depth, producing an empty code_analysis.json and a misleading "analyzed 0 files" log; it now defaults to deep (matching the scraper default and the scan-config path), with an explicit --depth still taking precedence.
package works non-interactively — the quality-gate prompt raised EOFError when stdin wasn't a TTY (CI/pipes); it now auto-proceeds on a non-TTY or with the new --yes/-y flag, while an interactive terminal still prompts.
quality prints the score — the standalone command saved quality_report.json but printed nothing; it now shows the score/grade summary, and the report serializes metric levels as "info"/"warning" instead of the Python enum repr "MetricLevel.INFO".
doctor no longer miscounts GITHUB_TOKEN as an AI provider key — the API-keys check now names which keys are set, so a bare GitHub token isn't misread as a provider key being configured.
estimate <url> gives an actionable error — passing a URL (which create accepts) printed a bare "Config file not found"; it now explains that estimate takes a config file and points at create --dry-run, and exits non-zero.
Web sitemap discovery fails fast on unreachable hosts — the pre-crawl sitemap probes used a single scalar timeout; they now use a (connect, read) timeout so an unreachable host doesn't block the full window before the crawl starts.
Fork-bomb guard covers the primary LOCAL enhance path — _run_agent_command now marks SKILL_SEEKER_ENHANCE_ACTIVE in the spawned agent's environment (including terminal-mode scripts) and run() refuses nested spawns, so a local agent enhancing a skill can no longer recursively launch enhancement.
--dry-run and --output honored for unified configs — create skipped injecting both into UnifiedScraper (every other source type got them); dry-run now previews and returns without creating directories, for legacy and unified configs alike.
Trailing-slash --output no longer leaks intermediates into the packaged skill — SkillConverter resolves skill_dir once and strips trailing separators, so --output out/x/ can't place _extracted.json inside the skill directory.
snake_case config classification — config_extractor._path_has_word used \b, which never matches inside snake_case names (app_db.yaml); explicit lookarounds fix detection while keeping dbeaver/blog false positives excluded.
How-to guides no longer come out empty for control-flow-wrapped tests — step extraction recursively descends with/for/if/try blocks in source order, and empty AST results fall back to the heuristic instead of emitting empty guides.
Slack 429 retry exhaustion logs a truncation warning (matching the Discord path) instead of silently breaking pagination.
Kimi CLI output parsing no longer swallows unknown records — record boundary is any CamelCase constructor, so new record types (ToolCallPart, …) can't leak internals into extracted text.
scan re-runs no longer churn .archived/ — canonical-named files fetched into the out-dir are removed after copying to the slug target, eliminating phantom "removed" diffs on every re-scan.
Dry-run page estimates dedupe utm_* variants — _enqueue_url normalizes tracking params the same way the real crawl does.
MCP per-page diagnostics from worker threads are no longer dropped — contextvars are propagated into ThreadPoolExecutor workers (doc scraper, PDF extractor, enhancers, video visual), so the per-call log capture sees them.
Gemini/OpenAI adaptor enhancement gained the truncation gate and atomic save — both adaptors previously accepted truncated AI output and used a destructive rename-then-write save (a failed write left no SKILL.md); all adaptors now share one enhance flow with central truncation detection and backup+atomic-replace (enhance_skill's save path fixed the same way).
MCP extract_config_patterns tool works — it passed flags config_extractor's parser rejects, so it failed on every invocation; now mapped to the real flags and pinned by a regression test.
Unified-CLI flags that were silently rejected now work — estimate --unlimited/--timeout, update --generate-package/--apply-update, quality --output, stream --streaming-overlap-chars/--batch-size/--checkpoint, multilang --report/--export, install --target, and extract-test-examples --recursive were accepted by the standalone modules but rejected with "unrecognized arguments" by skill-seekers <cmd> (central-parser drift). A programmatic drift-guard test now fails CI if any module flag is missing from its central parser.
Windows: large subprocess output no longer freezes MCP tools (#397) — run_subprocess_with_streaming replaced its select()-based polling loop (unsupported on Windows pipes) with reader threads, and now bounds the timeout reliably. Previously scrape_docs/scrape_github and other tools could deadlock on a full (>64 KB) pipe buffer on Windows. The fix is applied to the single shared implementation, so all callers benefit.

Changed

quality --threshold defaults to None — without it, quality is report-only and keeps the historical exit-0 contract; the quality gate (non-zero exit below the score) fires only when --threshold is explicitly given.
All enhancement API calls go through AgentClient — one transport with a consistent truncation gate, timeout policy, and error classification. ANTHROPIC_BASE_URL, per-provider model overrides (ANTHROPIC_MODEL/GOOGLE_MODEL/OPENAI_MODEL/MOONSHOT_MODEL), and the global SKILL_SEEKER_MODEL / SKILL_SEEKER_PROVIDER overrides are now honored everywhere. API-key auto-detection follows the API_PROVIDERS registry order (Anthropic → Google → OpenAI → Moonshot). video_visual frame classification is the documented multimodal exception.
MCP tools run in-process — estimate_pages, detect_patterns, extract_test_examples, extract_config_patterns, build_how_to_guides, split_config, generate_router, package_skill, and upload_skill call the real CLI main() via a shared run_cli_main() helper instead of spawning subprocesses (faster startup, identical output contract; former hard subprocess timeouts become advisory). enhance_skill (LOCAL agent) and install_skill's enhancement step stay subprocess by design (fork-bomb-guard semantics).
Platform --target choices are derived from the adaptor registry (#400) — enhance, upload, package, and install now compute their choices from get_enhancement_platforms() / get_upload_platforms() / list_platforms() instead of hand-maintained lists, so newly registered adaptors appear automatically and the lists can no longer drift. Non-breaking (each new list is a superset of the old).

Added

stream --output — collected chunks are written as JSON (the flag existed in the central parser but chunks were processed and dropped).
multilang --languages — restricts --detect/--export to the given languages (previously a central-parser fiction).
skill_seekers.services package — marketplace_manager, marketplace_publisher, config_publisher, source_manager, and git_repo moved out of mcp/ so the CLI can import this domain logic without the optional [mcp] extra. Back-compat shims remain at the old skill_seekers.mcp.* paths.
get_converter("config", {...}) — UnifiedScraper now accepts the factory-shaped config dict, so unified configs construct through the same factory as every other source type (legacy positional construction still supported).
cli/exit_codes.py — standard exit-code constants (EXIT_SUCCESS/EXIT_ERROR/EXIT_VALIDATION/EXIT_INTERRUPT).
--model flag for enhance and package (#395, #398) — override the platform's default model, e.g. skill-seekers enhance output/react/ --target minimax --model MiniMax-M2.7 or skill-seekers package output/react/ --target minimax --model MiniMax-M2.7. Honored uniformly across all enhancement adaptors and recorded in package metadata. Resurrects the previously-dead custom_model config key.
MiniMax-M3 is the new default MiniMax model (#395) — fresh --target minimax runs use M3; the previous-generation M2.7 remains selectable via --model. Docs (MINIMAX_INTEGRATION.md, MULTI_LLM_SUPPORT.md) refreshed.
More enhancement targets (#395) — enhance --target now accepts every enhancement-capable adaptor (adds minimax, deepseek, qwen, openrouter, together, fireworks); previously only claude/gemini/openai/kimi were reachable.
More upload targets + supports_upload() capability (#400) — upload --target now accepts every adaptor with a real upload, adding minimax, deepseek, qwen, openrouter, together, fireworks, and pinecone. New supports_upload() adaptor method and get_upload_platforms() helper.

Internal

DocumentSkillBuilder — the build side of all 9 document scrapers (epub, word, pptx, html, pdf, jupyter, man, rss, chat) now lives once in cli/document_skill_builder.py (net −1,859 lines across the scrapers). Every port is byte-identical, proven by golden trees in tests/golden/phase2/ (UPDATE_GOLDENS=1 refreshes them — only on purpose).
UnifiedScraper dispatch table + shared engine — scrape_all_sources() routes through a class-level SOURCE_DISPATCH table and _scrape_with_converter() handles the 13 mechanical source types through the public get_converter()/extract() interface (−280 lines); new converter types registered in CONVERTER_REGISTRY work in unified configs automatically.
Single-definition CLI parsers — the central SubcommandParser classes in cli/parsers/ are now the only definition of each command's flags; standalone main(args=None) paths build their parser from the central class, and a drift-guard test asserts identical dests/defaults/option strings.
ExecutionContext.override() is contextvars-based — concurrent threads/asyncio tasks (the MCP server) can no longer clobber each other's overrides; nested overrides stack and unwind; exceptions restore.
One home for agent/provider registries and batching — AGENT_PRESETS and API_PROVIDERS live only in agent_client.py (a silently-diverged duplicate kimi preset is gone); cli/parallel_batches.py:run_batches_parallel() replaces three duplicated ThreadPoolExecutor blocks.
Import hygiene — all seven sys.path.insert hacks in mcp/ removed in favor of absolute skill_seekers.* imports; no more dual module identities.
Performance — incremental Kotlin brace-depth tracking (was O(n²) prefix scans) and index-based class-body scans in code_analyzer, per-build memoization of import resolution in dependency_analyzer, GitHub per_page restored to min(max_count, 100).
De-duplicated copy-pasted code into shared modules — mcp/tools/subprocess_utils.py (the streaming subprocess helper, #397), mcp/tools/_common.py (TextContent fallback + CLI_DIR, #401), and cli/scraper_utils.py (score_code_quality + extract_table_from_html, #402). Behavior preserved (parity-tested); ~hundreds of duplicated lines removed.