github yusufkaraaslan/Skill_Seekers v3.0.0

[3.0.0] - 2026-02-10

🚀 "Universal Intelligence Platform" - Major Release

Theme: Transform any documentation into structured knowledge for any AI system.

This is our biggest release ever! v3.0.0 establishes Skill Seekers as the universal documentation preprocessor for the entire AI ecosystem - from RAG pipelines to AI coding assistants to Claude skills.

Highlights

  • 🚀 16 platform adaptors (up from 4 in v2.x)
  • 🛠️ 26 MCP tools (up from 9)
  • 1,852 tests passing (up from 700+)
  • ☁️ Cloud storage support (S3, GCS, Azure)
  • 🔄 CI/CD ready (GitHub Action + Docker)
  • 📦 12 example projects for every integration
  • 📚 18 integration guides complete

Added - Platform Adaptors (16 Total)

RAG & Vector Databases (8)

  • LangChain (--format langchain) - Output LangChain Document objects
  • LlamaIndex (--format llama-index) - Output LlamaIndex TextNode objects
  • Chroma (--format chroma) - Direct ChromaDB integration
  • FAISS (--format faiss) - Facebook AI Similarity Search
  • Haystack (--format haystack) - Deepset Haystack pipelines
  • Qdrant (--format qdrant) - Qdrant vector database
  • Weaviate (--format weaviate) - Weaviate vector search
  • Pinecone-ready (--target markdown) - Markdown format ready for Pinecone

AI Platforms (3)

  • Claude (--target claude) - Claude AI skills (ZIP + YAML)
  • Gemini (--target gemini) - Google Gemini skills (tar.gz)
  • OpenAI (--target openai) - OpenAI ChatGPT (ZIP + Vector Store)

AI Coding Assistants (4)

  • Cursor (--target claude + .cursorrules) - Cursor IDE integration
  • Windsurf (--target claude + .windsurfrules) - Windsurf/Codeium
  • Cline (--target claude + .clinerules) - VS Code extension
  • Continue.dev (--target claude) - Universal IDE support

Generic (1)

  • Markdown (--target markdown) - Generic ZIP export

Added - MCP Tools (26 Total)

Config Tools (3)

  • generate_config - Generate scraping configuration
  • list_configs - List available preset configs
  • validate_config - Validate config JSON structure

Scraping Tools (8)

  • estimate_pages - Estimate page count before scraping
  • scrape_docs - Scrape documentation websites
  • scrape_github - Scrape GitHub repositories
  • scrape_pdf - Extract from PDF files
  • scrape_codebase - Analyze local codebases
  • detect_patterns - Detect design patterns in code
  • extract_test_examples - Extract usage examples from tests
  • build_how_to_guides - Build how-to guides from code

Packaging Tools (4)

  • package_skill - Package skill for target platform
  • upload_skill - Upload to LLM platform
  • enhance_skill - AI-powered enhancement
  • install_skill - One-command complete workflow

Source Tools (5)

  • fetch_config - Fetch config from remote source
  • submit_config - Submit config for approval
  • add_config_source - Add Git config source
  • list_config_sources - List config sources
  • remove_config_source - Remove config source

Splitting Tools (2)

  • split_config - Split large configs
  • generate_router - Generate router skills

Vector DB Tools (4)

  • export_to_weaviate - Export to Weaviate
  • export_to_chroma - Export to ChromaDB
  • export_to_faiss - Export to FAISS
  • export_to_qdrant - Export to Qdrant

Added - Cloud Storage

Upload skills directly to cloud storage:

  • AWS S3 - skill-seekers cloud upload --provider s3 --bucket my-bucket
  • Google Cloud Storage - skill-seekers cloud upload --provider gcs --bucket my-bucket
  • Azure Blob Storage - skill-seekers cloud upload --provider azure --container my-container

Features:

  • Upload/download directories
  • List files with metadata
  • Check file existence
  • Generate presigned URLs
  • Cloud-agnostic interface

Added - CI/CD Support

GitHub Action

- uses: skill-seekers/action@v1
  with:
    config: configs/react.json
    format: langchain

Features:

  • Auto-update on doc changes
  • Matrix builds for multiple frameworks
  • Scheduled updates
  • Caching for faster runs

Docker

docker run -v "$(pwd)":/data skill-seekers:latest scrape --config /data/config.json

Added - Production Infrastructure

  • Helm Charts - Kubernetes deployment
  • Docker Compose - Local vector DB stack
  • Monitoring - Sentry integration, sync monitoring
  • Benchmarking - Performance testing framework

Added - 12 Example Projects

Complete working examples for every integration:

  1. langchain-rag-pipeline - React docs → LangChain → Chroma
  2. llama-index-query-engine - Vue docs → LlamaIndex
  3. pinecone-upsert - Documentation → Pinecone
  4. chroma-example - Full ChromaDB workflow
  5. faiss-example - FAISS index building
  6. haystack-pipeline - Haystack RAG pipeline
  7. qdrant-example - Qdrant vector DB
  8. weaviate-example - Weaviate integration
  9. cursor-react-skill - React skill for Cursor
  10. windsurf-fastapi-context - FastAPI for Windsurf
  11. cline-django-assistant - Django assistant for Cline
  12. continue-dev-universal - Universal IDE context

Quality Metrics

  • 1,852 tests across 100 test files
  • 58,512 lines of Python code
  • 80+ documentation files
  • 100% test coverage for critical paths
  • CI/CD on every commit

Fixed

URL Conversion Bug with Anchor Fragments (Issue #277)

  • Critical Bug Fix: Fixed 404 errors when scraping documentation with anchor links
    • Problem: URLs with anchor fragments (e.g., #synchronous-initialization) were malformed
      • Incorrect: https://example.com/docs/api#method/index.html.md
      • Correct: https://example.com/docs/api/index.html.md
    • Root Cause: _convert_to_md_urls() didn't strip anchor fragments before appending /index.html.md
    • Solution: Parse URLs with urllib.parse to remove fragments and deduplicate base URLs
    • Impact: Prevents duplicate requests for the same page with different anchors
    • Additional Fix: Changed .md detection from ".md" in url to url.endswith('.md')
      • Prevents false matches on URLs like /cmd-line or /AMD-processors
  • Test Coverage: 12 comprehensive tests covering all edge cases
    • Anchor fragment stripping
    • Deduplication of multiple anchors on same URL
    • Query parameter preservation
    • Trailing slash handling
    • Real-world MikroORM case validation
    • 54/54 tests passing (42 existing + 12 new)
  • Reported by: @devjones via Issue #277
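The fragment-stripping fix described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual _convert_to_md_urls() code: the function name convert_to_md_urls_sketch and the exact handling of query strings and trailing slashes are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def convert_to_md_urls_sketch(urls):
    """Strip anchor fragments, deduplicate, and append /index.html.md.

    Hypothetical helper for illustration only.
    """
    seen = set()
    result = []
    for url in urls:
        parts = urlsplit(url)
        path = parts.path
        # endswith() avoids treating ".md" somewhere in the middle of a URL as a match.
        if not path.endswith(".md"):
            path = path.rstrip("/") + "/index.html.md"
        # Rebuild the URL with an empty fragment; the query string is kept.
        md_url = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
        if md_url not in seen:
            seen.add(md_url)
            result.append(md_url)
    return result
```

Two links to the same page with different anchors collapse into a single request, which is the deduplication effect the fix aims for.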

Added

Extended Language Detection (NEW)

  • 7 New Programming Languages: Dart, Scala, SCSS, SASS, Elixir, Lua, Perl
    • Pattern-based detection with confidence scoring (0.6-0.8+ thresholds)
    • 70 regex patterns prioritizing unique identifiers (weight 5)
    • Framework-specific patterns:
      • Dart: Flutter widgets (StatelessWidget, StatefulWidget, Widget build())
      • Scala: Pattern matching (case class, trait, match {})
      • SCSS: Preprocessor features ($variables, @mixin, @include, @extend)
      • SASS: Indented syntax (=mixin, +include, $variables)
      • Elixir: Functional patterns (defmodule, def ... do, pipe operator |>)
      • Lua: Game scripting (local, repeat...until, ~=, elseif)
      • Perl: Text processing (my $, use strict, sub, chomp, regex =~)
    • Comprehensive test coverage: 7 new tests, 30/30 passing (100%)
    • False positive prevention: Unique identifiers (weight 5) + confidence thresholds
    • No regressions: All existing language detection tests still pass
    • Total language support: Now 27+ programming languages
    • Credit: Contributed by @PaawanBarach via PR #275
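As a rough picture of weighted, pattern-based detection with a confidence threshold, consider the sketch below. The pattern table, the weights other than 5, and the scoring formula (matched weight divided by total weight) are assumptions made for illustration; the real detector may score differently.

```python
import re

# Hypothetical pattern table: (regex, weight). Weight 5 marks a unique identifier.
PATTERNS = {
    "dart": [(r"\bStatelessWidget\b", 5), (r"\bWidget\s+build\(", 5), (r"\bfinal\s+\w+", 1)],
    "elixir": [(r"\bdefmodule\s+\w+", 5), (r"\|>", 3), (r"\bdef\s+\w+.*\bdo\b", 2)],
    "lua": [(r"\blocal\s+\w+", 2), (r"~=", 3), (r"\brepeat\b[\s\S]*\buntil\b", 5)],
}


def detect_language(code, threshold=0.6):
    """Return (language, confidence); language is None below the threshold."""
    best_lang, best_conf = None, 0.0
    for lang, patterns in PATTERNS.items():
        total = sum(weight for _, weight in patterns)
        matched = sum(weight for regex, weight in patterns if re.search(regex, code))
        conf = matched / total
        if conf > best_conf:
            best_lang, best_conf = lang, conf
    if best_conf >= threshold:
        return best_lang, best_conf
    return None, best_conf
```

Giving unique identifiers a high weight means a single generic match (such as Lua's local) cannot clear the threshold on its own, which is one way to limit false positives.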

Multi-Agent Support for Local Enhancement (NEW)

  • Multiple Coding Agent Support: Choose your preferred local coding agent for SKILL.md enhancement
    • Claude Code (default): Claude Code CLI with --dangerously-skip-permissions
    • Codex CLI: OpenAI Codex CLI with --full-auto and --skip-git-repo-check
    • Copilot CLI: GitHub Copilot CLI (gh copilot chat)
    • OpenCode CLI: OpenCode CLI
    • Custom agents: Use any CLI tool with --agent custom --agent-cmd "command {prompt_file}"
  • CLI Arguments: New flags for agent selection
    • --agent: Choose agent (claude, codex, copilot, opencode, custom)
    • --agent-cmd: Override command template for custom agents
  • Environment Variables: CI/CD friendly configuration
    • SKILL_SEEKER_AGENT: Default agent to use
    • SKILL_SEEKER_AGENT_CMD: Default command template for custom agents
  • Security First: Custom command validation
    • Blocks dangerous shell characters (;, &, |, $, `, \n, \r)
    • Validates executable exists in PATH
    • Safe parsing with shlex.split()
  • Dual Input Modes: Supports both file-based and stdin-based agents
    • File-based: Uses {prompt_file} placeholder (Claude, custom agents)
    • Stdin-based: Pipes prompt via stdin (Codex CLI)
  • Backward Compatible: Claude Code remains the default, no breaking changes
  • Comprehensive Tests: 13 new tests covering all agent types and security validation
  • Agent Normalization: Smart alias handling (e.g., "claude-code" → "claude")
  • Credit: Contributed by @rovo79 (Robert Dean) via PR #270
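The custom command validation listed above (blocked characters, PATH lookup, shlex.split()) could look something like this minimal sketch. The function name validate_agent_command and the injectable which parameter are hypothetical; only the blocked character list and the use of shlex.split() come from the notes.

```python
import shlex
import shutil

# Characters the release notes list as blocked in custom agent commands.
DANGEROUS_CHARS = (";", "&", "|", "$", "`", "\n", "\r")


def validate_agent_command(template, which=shutil.which):
    """Return the parsed argv for a custom agent command or raise ValueError.

    Hypothetical helper; `which` is injectable so the PATH lookup can be stubbed.
    """
    for ch in DANGEROUS_CHARS:
        if ch in template:
            raise ValueError(f"dangerous character {ch!r} in agent command")
    argv = shlex.split(template)
    if not argv:
        raise ValueError("agent command is empty")
    if which(argv[0]) is None:
        raise ValueError(f"executable not found in PATH: {argv[0]}")
    return argv
```

Because the result is an argument list rather than a shell string, it can be passed to subprocess.run() without shell=True, after the {prompt_file} placeholder has been replaced with a real path.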

C3.10: Signal Flow Analysis for Godot Projects (NEW)

  • Complete Signal Flow Analysis System: Analyze event-driven architectures in Godot game projects

    • Signal declaration extraction (signal keyword detection)
    • Connection mapping (.connect() calls with targets and methods)
    • Emission tracking (.emit() and emit_signal() calls)
    • 208 signals, 634 connections, and 298 emissions detected in test project (Cosmic Idler)
    • Signal density metrics (signals per file)
    • Event chain detection (signals triggering other signals)
    • Output: signal_flow.json, signal_flow.mmd (Mermaid diagram), signal_reference.md
  • Signal Pattern Detection: Three major patterns identified

    • EventBus Pattern (0.90 confidence): Centralized signal hub in autoload
    • Observer Pattern (0.85 confidence): Multi-observer signals (3+ listeners)
    • Event Chains (0.80 confidence): Cascading signal propagation
  • Signal-Based How-To Guides (C3.10.1): AI-generated usage guides

    • Step-by-step guides (Connect → Emit → Handle)
    • Real code examples from project
    • Common usage locations
    • Parameter documentation
    • Output: signal_how_to_guides.md (10 guides for Cosmic Idler)
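To illustrate the three extraction steps (declarations, connections, emissions), here is a small regex-based sketch. The regular expressions and the analyze_signals name are assumptions for illustration and will miss cases the real analyzer may handle, such as multi-line calls.

```python
import re

# Illustrative patterns only; not the project's actual regexes.
SIGNAL_DECL = re.compile(r"^[ \t]*signal[ \t]+(\w+)", re.MULTILINE)
CONNECT_CALL = re.compile(r"(\w+)\.connect\(\s*([\w.]+)")
EMIT_CALL = re.compile(r"(\w+)\.emit\(|emit_signal\(\s*[\"'](\w+)[\"']")


def analyze_signals(source):
    """Collect declared signals, (signal, handler) connections, and emitted signals."""
    emissions = [new_style or old_style for new_style, old_style in EMIT_CALL.findall(source)]
    return {
        "signals": SIGNAL_DECL.findall(source),
        "connections": CONNECT_CALL.findall(source),
        "emissions": emissions,
    }


SAMPLE = """
extends Node
signal health_changed(new_value)
signal died

func _ready():
    health_changed.connect(_on_health_changed)
    died.connect(hud.show_game_over)

func hit():
    health_changed.emit(10)
    emit_signal("died")
"""

result = analyze_signals(SAMPLE)
```

For the sample script, result holds two declared signals, two connections, and two emissions, covering both the .emit() and emit_signal() styles.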

Godot Game Engine Support

  • Comprehensive Godot File Type Support: Full analysis of Godot 4.x projects

    • GDScript (.gd): 265 files analyzed in test project
    • Scene files (.tscn): 118 scene files
    • Resource files (.tres): 38 resource files
    • Shader files (.gdshader, .gdshaderinc): 9 shader files
    • C# integration: Phantom Camera addon (13 files)
  • GDScript Language Support: Complete GDScript parsing with regex-based extraction

    • Dependency extraction: preload(), load(), extends patterns
    • Test framework detection: GUT, gdUnit4, WAT
    • Test file patterns: test_*.gd, *_test.gd
    • Signal syntax: signal, .connect(), .emit()
    • Export decorators: @export, @onready
    • Test decorators: @test (gdUnit4)
  • Game Engine Framework Detection: Improved detection for Unity, Unreal, Godot

    • Godot markers: project.godot, .godot directory, .tscn, .tres, .gd files
    • Unity markers: Assembly-CSharp.csproj, UnityEngine.dll, ProjectSettings/ProjectVersion.txt
    • Unreal markers: .uproject, Source/, Config/DefaultEngine.ini
    • Fixed false positive Unity detection (was using generic "Assets" keyword)
  • GDScript Test Extraction: Extract usage examples from Godot test files

    • 396 test cases extracted from 20 GUT test files in test project
    • Patterns: instantiation (preload().new(), load().new()), assertions (assert_eq, assert_true), signals
    • GUT framework: extends GutTest, func test_*(), add_child_autofree()
    • Test categories: instantiation, assertions, signal connections, setup/teardown
    • Real code examples from production test files
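The marker-based engine detection described above can be pictured with the sketch below. The marker names come from the list; the matching logic, the detect_engine name, and the choice to work on a list of project-relative paths are assumptions.

```python
# Marker files taken from the list above; the matching logic is an assumption.
ENGINE_MARKERS = {
    "godot": ("project.godot",),
    "unity": ("Assembly-CSharp.csproj", "UnityEngine.dll", "ProjectSettings/ProjectVersion.txt"),
    "unreal": ("Config/DefaultEngine.ini",),
}


def detect_engine(relative_paths):
    """Return 'godot', 'unity', 'unreal', or None for project-relative paths."""
    paths = {p.replace("\\", "/") for p in relative_paths}
    for engine, markers in ENGINE_MARKERS.items():
        for marker in markers:
            # Match the marker at the project root or nested in a subdirectory.
            if any(p == marker or p.endswith("/" + marker) for p in paths):
                return engine
    # .uproject files are named after the project, so match on the extension.
    if any(p.endswith(".uproject") for p in paths):
        return "unreal"
    return None
```

Matching on specific files rather than a generic keyword means a Godot project with an Assets folder is no longer reported as Unity.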

C3.9: Project Documentation Extraction

  • Markdown Documentation Extraction: Automatically extracts and categorizes all .md files from projects
    • Smart categorization by folder/filename (overview, architecture, guides, workflows, features, etc.)
    • Processing depth control: surface (raw copy), deep (parse+summarize), full (AI-enhanced)
    • AI enhancement (level 2+) adds topic extraction and cross-references
    • New "📖 Project Documentation" section in SKILL.md
    • Output to references/documentation/ organized by category
    • Default ON, use --skip-docs to disable
    • 15 new tests for documentation extraction features

Granular AI Enhancement Control

  • --enhance-level Flag: Fine-grained control over AI enhancement (0-3)
    • Level 0: No AI enhancement (default)
    • Level 1: SKILL.md enhancement only (fast, high value)
    • Level 2: SKILL.md + Architecture + Config + Documentation
    • Level 3: Full enhancement (patterns, tests, config, architecture, docs)
  • Config Integration: default_enhance_level setting in ~/.config/skill-seekers/config.json
  • MCP Support: All MCP tools updated with enhance_level parameter
  • Independent from --comprehensive: Enhancement level is separate from feature depth
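One way to think about the four levels is as a lookup from level to the set of enhanced features, as in this sketch. The feature names, the assumption that level 3 is a superset of level 2, and the precedence of the CLI flag over default_enhance_level are all illustrative guesses, not confirmed behavior.

```python
# Hypothetical feature names; the levels mirror the list above.
ENHANCE_LEVELS = {
    0: frozenset(),
    1: frozenset({"skill_md"}),
    2: frozenset({"skill_md", "architecture", "config", "documentation"}),
    3: frozenset({"skill_md", "architecture", "config", "documentation", "patterns", "tests"}),
}


def resolve_enhance_level(cli_level=None, config=None):
    """Use the CLI value if given, else default_enhance_level from config, else 0."""
    level = cli_level
    if level is None:
        level = (config or {}).get("default_enhance_level", 0)
    if level not in ENHANCE_LEVELS:
        raise ValueError(f"enhance level must be 0-3, got {level!r}")
    return level


def should_enhance(feature, level):
    """Report whether a feature is AI-enhanced at the given level."""
    return feature in ENHANCE_LEVELS[level]
```

With this shape, each enhancement module only needs to ask should_enhance() for its own feature name.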

C# Language Support

  • C# Test Example Extraction: Full support for C# test frameworks
    • Language alias mapping (C# → csharp, C++ → cpp)
    • NUnit, xUnit, MSTest test framework patterns
    • Mock pattern support (NSubstitute, Moq)
    • Zenject dependency injection patterns
    • Setup/teardown method extraction
    • 2 new tests for C# extraction features

Performance Optimizations

  • Parallel LOCAL Mode AI Enhancement: 6-12x faster with ThreadPoolExecutor
    • Concurrent workers: 3 (configurable via local_parallel_workers)
    • Batch processing: 20 patterns per Claude CLI call (configurable via local_batch_size)
    • Significant speedup for large codebases
  • Config Settings: New ai_enhancement section in config
    • local_batch_size: Patterns per CLI call (default: 20)
    • local_parallel_workers: Concurrent workers (default: 3)
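The batching and worker settings above map naturally onto ThreadPoolExecutor, roughly as sketched here. The defaults (20 patterns per batch, 3 workers) come from the notes; enhance_batch is a hypothetical callable standing in for one CLI call, and the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def enhance_in_parallel(patterns, enhance_batch, batch_size=20, workers=3):
    """Run enhance_batch over batches concurrently and flatten the results.

    enhance_batch is a hypothetical callable: it takes one batch (a list)
    and returns a list of enhanced items.
    """
    batches = chunk(patterns, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in batch order even though batches run concurrently.
        results = pool.map(enhance_batch, batches)
        return [item for batch in results for item in batch]
```

Threads suit this workload because each batch mostly waits on an external CLI process, so the speedup depends on how many calls can overlap rather than on CPU cores.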

UX Improvements

  • Auto-Enhancement: SKILL.md is automatically enhanced when using --enhance or --comprehensive

    • No need for separate skill-seekers enhance command
    • Seamless one-command workflow
    • 10-minute timeout for large codebases
    • Graceful fallback with retry instructions on failure
  • LOCAL Mode Fallback: All AI enhancements now fall back to LOCAL mode when no API key is set

    • Applies to: pattern enhancement (C3.1), test examples (C3.2), architecture (C3.7)
    • Uses Claude Code CLI instead of failing silently
    • Better UX: "Using LOCAL mode (Claude Code CLI)" instead of "AI disabled"
  • Support for custom Claude-compatible API endpoints via ANTHROPIC_BASE_URL environment variable

  • Compatibility with GLM-4.7 and other Claude-compatible APIs across all AI enhancement features

Changed

  • All AI enhancement modules now respect ANTHROPIC_BASE_URL for custom endpoints
  • Updated documentation with GLM-4.7 configuration examples
  • Rewrote LOCAL mode in config_enhancer.py to use Claude CLI properly with explicit output file paths
  • Updated MCP scrape_codebase_tool with skip_docs and enhance_level parameters
  • Updated CLAUDE.md with C3.9 documentation extraction feature
  • Increased default batch size from 5 to 20 patterns for LOCAL mode

Fixed

  • C# Test Extraction: Fixed "Language C# not supported" error with language alias mapping
  • Config Type Field Mismatch: Fixed KeyError in config_enhancer.py by supporting both "type" and "config_type" fields
  • LocalSkillEnhancer Import: Fixed incorrect import and method call in main.py (SkillEnhancer → LocalSkillEnhancer)
  • Code Quality: Fixed 4 critical linter errors (unused imports, variables, arguments, import sorting)

Godot Game Engine Fixes

  • GDScript Dependency Extraction: Fixed 265+ "Syntax error in *.gd" warnings (commit 3e6c448)

    • GDScript files were incorrectly routed to Python AST parser
    • Created dedicated _extract_gdscript_imports() with regex patterns
    • Now correctly parses preload(), load(), extends patterns
    • Result: 377 dependencies extracted with 0 warnings
  • Framework Detection False Positive: Fixed Unity detection on Godot projects (commit 50b28fe)

    • Was detecting "Unity" due to generic "Assets" keyword in comments
    • Changed Unity markers to specific files: Assembly-CSharp.csproj, UnityEngine.dll, Library/
    • Now correctly detects Godot via project.godot, .godot directory
  • Circular Dependencies: Fixed self-referential cycles (commit 50b28fe)

    • Previously reported 3 self-loop warnings (files depending on themselves)
    • Added target != file_path check in dependency graph builder
    • Result: 0 circular dependencies detected
  • GDScript Test Discovery: Fixed 0 test files found in Godot projects (commit 50b28fe)

    • Added GDScript test patterns: test_*.gd, *_test.gd
    • Added GDScript to LANGUAGE_MAP
    • Result: 32 test files discovered (20 GUT files with 396 tests)
  • GDScript Test Extraction: Fixed "Language GDScript not supported" warning (commit c826690)

    • Added GDScript regex patterns to PATTERNS dictionary
    • Patterns: instantiation (preload().new()), assertions (assert_eq), signals (.connect())
    • Result: 22 test examples extracted successfully
  • Config Extractor Array Handling: Fixed JSON/YAML array parsing (commit fca0951)

    • Error: 'list' object has no attribute 'items' on root-level arrays
    • Added isinstance checks for dict/list/primitive at root
    • Result: No JSON array errors, save.json parsed correctly
  • Progress Indicators: Fixed missing progress for small batches (commit eec37f5)

    • Progress was only shown every 5 batches, so it was invisible for small jobs
    • Modified condition to always show for batches < 10
    • Result: "Progress: 1/2 batches completed" now visible
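The GDScript dependency fix and the self-loop check from the list above can be combined in one small sketch. The regexes and the extract_gdscript_dependencies name are assumptions; the project's _extract_gdscript_imports() may differ in detail.

```python
import re

# Illustrative patterns for preload("..."), load("..."), and extends lines.
PRELOAD_OR_LOAD = re.compile(r"""\b(?:preload|load)\(\s*["']([^"']+)["']\s*\)""")
EXTENDS = re.compile(r"""^[ \t]*extends[ \t]+["']?([\w:/.]+)["']?""", re.MULTILINE)


def extract_gdscript_dependencies(file_path, source):
    """Return unique dependency targets, skipping references to the file itself."""
    targets = PRELOAD_OR_LOAD.findall(source) + EXTENDS.findall(source)
    deps = []
    for target in targets:
        # The target != file_path check drops self-loops before graph building.
        if target != file_path and target not in deps:
            deps.append(target)
    return deps


SCRIPT = '''extends "res://actors/base.gd"
const Bullet = preload("res://weapons/bullet.gd")
var me = load("res://actors/player.gd")
'''

deps = extract_gdscript_dependencies("res://actors/player.gd", SCRIPT)
```

Here the script loads its own path, and the self-reference is filtered out so it never reaches the dependency graph as a cycle.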

Tests

  • GDScript Test Extraction Test: Added comprehensive test case for GDScript GUT/gdUnit4 framework
    • Tests player instantiation with preload() and load()
    • Tests signal connections and emissions
    • Tests gdUnit4 @test annotation syntax
    • Tests game state management patterns
    • 4 test functions with 60+ lines of GDScript code
    • Validates extraction of instantiations, assertions, and signal patterns

Removed

  • Removed client-specific documentation files from repository
