yusufkaraaslan/Skill_Seekers v3.0.0 on GitHub

[3.0.0] - 2026-02-10

🚀 "Universal Intelligence Platform" - Major Release

Theme: Transform any documentation into structured knowledge for any AI system.

This is our biggest release ever! v3.0.0 establishes Skill Seekers as the universal documentation preprocessor for the entire AI ecosystem - from RAG pipelines to AI coding assistants to Claude skills.

Highlights

🚀 16 platform adaptors (up from 4 in v2.x)
🛠️ 26 MCP tools (up from 9)
✅ 1,852 tests passing (up from 700+)
☁️ Cloud storage support (S3, GCS, Azure)
🔄 CI/CD ready (GitHub Action + Docker)
📦 12 example projects for every integration
📚 18 integration guides complete

Added - Platform Adaptors (16 Total)

RAG & Vector Databases (8)

LangChain (--format langchain) - Output LangChain Document objects
LlamaIndex (--format llama-index) - Output LlamaIndex TextNode objects
Chroma (--format chroma) - Direct ChromaDB integration
FAISS (--format faiss) - Facebook AI Similarity Search
Haystack (--format haystack) - Deepset Haystack pipelines
Qdrant (--format qdrant) - Qdrant vector database
Weaviate (--format weaviate) - Weaviate vector search
Pinecone-ready (--target markdown) - Markdown format ready for Pinecone

AI Platforms (3)

Claude (--target claude) - Claude AI skills (ZIP + YAML)
Gemini (--target gemini) - Google Gemini skills (tar.gz)
OpenAI (--target openai) - OpenAI ChatGPT (ZIP + Vector Store)

AI Coding Assistants (4)

Cursor (--target claude + .cursorrules) - Cursor IDE integration
Windsurf (--target claude + .windsurfrules) - Windsurf/Codeium
Cline (--target claude + .clinerules) - VS Code extension
Continue.dev (--target claude) - Universal IDE support

Generic (1)

Markdown (--target markdown) - Generic ZIP export

Added - MCP Tools (26 Total)

Config Tools (3)

generate_config - Generate scraping configuration
list_configs - List available preset configs
validate_config - Validate config JSON structure

Scraping Tools (8)

estimate_pages - Estimate page count before scraping
scrape_docs - Scrape documentation websites
scrape_github - Scrape GitHub repositories
scrape_pdf - Extract from PDF files
scrape_codebase - Analyze local codebases
detect_patterns - Detect design patterns in code
extract_test_examples - Extract usage examples from tests
build_how_to_guides - Build how-to guides from code

Packaging Tools (4)

package_skill - Package skill for target platform
upload_skill - Upload to LLM platform
enhance_skill - AI-powered enhancement
install_skill - One-command complete workflow

Source Tools (5)

fetch_config - Fetch config from remote source
submit_config - Submit config for approval
add_config_source - Add Git config source
list_config_sources - List config sources
remove_config_source - Remove config source

Splitting Tools (2)

split_config - Split large configs
generate_router - Generate router skills

Vector DB Tools (4)

export_to_weaviate - Export to Weaviate
export_to_chroma - Export to ChromaDB
export_to_faiss - Export to FAISS
export_to_qdrant - Export to Qdrant

Added - Cloud Storage

Upload skills directly to cloud storage:

AWS S3 - skill-seekers cloud upload --provider s3 --bucket my-bucket
Google Cloud Storage - skill-seekers cloud upload --provider gcs --bucket my-bucket
Azure Blob Storage - skill-seekers cloud upload --provider azure --container my-container

Features:

Upload/download directories
List files with metadata
Check file existence
Generate presigned URLs
Cloud-agnostic interface

Added - CI/CD Support

GitHub Action

- uses: skill-seekers/action@v1
  with:
    config: configs/react.json
    format: langchain

Features:

Auto-update on doc changes
Matrix builds for multiple frameworks
Scheduled updates
Caching for faster runs

Docker

docker run -v $(pwd):/data skill-seekers:latest scrape --config /data/config.json

Added - Production Infrastructure

Helm Charts - Kubernetes deployment
Docker Compose - Local vector DB stack
Monitoring - Sentry integration, sync monitoring
Benchmarking - Performance testing framework

Added - 12 Example Projects

Complete working examples for every integration:

langchain-rag-pipeline - React docs → LangChain → Chroma
llama-index-query-engine - Vue docs → LlamaIndex
pinecone-upsert - Documentation → Pinecone
chroma-example - Full ChromaDB workflow
faiss-example - FAISS index building
haystack-pipeline - Haystack RAG pipeline
qdrant-example - Qdrant vector DB
weaviate-example - Weaviate integration
cursor-react-skill - React skill for Cursor
windsurf-fastapi-context - FastAPI for Windsurf
cline-django-assistant - Django assistant for Cline
continue-dev-universal - Universal IDE context

Quality Metrics

✅ 1,852 tests across 100 test files
✅ 58,512 lines of Python code
✅ 80+ documentation files
✅ 100% test coverage for critical paths
✅ CI/CD on every commit

Fixed

URL Conversion Bug with Anchor Fragments (Issue #277)

Critical Bug Fix: Fixed 404 errors when scraping documentation with anchor links
- Problem: URLs with anchor fragments (e.g., #synchronous-initialization) were malformed
  - Incorrect: https://example.com/docs/api#method/index.html.md ❌
  - Correct: https://example.com/docs/api/index.html.md ✅
- Root Cause: _convert_to_md_urls() didn't strip anchor fragments before appending /index.html.md
- Solution: Parse URLs with urllib.parse to remove fragments and deduplicate base URLs
- Impact: Prevents duplicate requests for the same page with different anchors
- Additional Fix: Changed .md detection from ".md" in url to url.endswith('.md')
  - Prevents false matches on URLs that contain .md somewhere other than the end (e.g., /file.md.bak or /notes.mdx)
Test Coverage: 12 comprehensive tests covering all edge cases
- Anchor fragment stripping
- Deduplication of multiple anchors on same URL
- Query parameter preservation
- Trailing slash handling
- Real-world MikroORM case validation
- 54/54 tests passing (42 existing + 12 new)
Reported by: @devjones via Issue #277
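
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name convert_to_md_urls is a hypothetical stand-in for _convert_to_md_urls(), and the exact handling of query strings and trailing slashes is an assumption based on the test list.

```python
from urllib.parse import urlsplit, urlunsplit


def convert_to_md_urls(urls):
    """Hypothetical stand-in: strip fragments, dedupe, append /index.html.md."""
    seen = set()
    converted = []
    for url in urls:
        parts = urlsplit(url)
        # Drop the "#anchor" part; scheme, host, path and query are kept.
        base = urlunsplit(parts._replace(fragment=""))
        if base in seen:
            continue  # same page, different anchor: request it only once
        seen.add(base)
        if parts.path.endswith(".md"):
            # Already a Markdown URL (suffix check, not a substring check).
            converted.append(base)
        else:
            md_path = parts.path.rstrip("/") + "/index.html.md"
            converted.append(urlunsplit(parts._replace(path=md_path, fragment="")))
    return converted


print(convert_to_md_urls([
    "https://example.com/docs/api#method",
    "https://example.com/docs/api#other",
]))
# ['https://example.com/docs/api/index.html.md']
```

Splitting the URL first means the suffix is appended to the path only, so a query string such as ?v=2 stays at the end of the result.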

Added

Extended Language Detection (NEW)

7 New Programming Languages: Dart, Scala, SCSS, SASS, Elixir, Lua, Perl
- Pattern-based detection with confidence scoring (0.6-0.8+ thresholds)
- 70 regex patterns prioritizing unique identifiers (weight 5)
- Framework-specific patterns:
  - Dart: Flutter widgets (StatelessWidget, StatefulWidget, Widget build())
  - Scala: Pattern matching (case class, trait, match {})
  - SCSS: Preprocessor features ($variables, @mixin, @include, @extend)
  - SASS: Indented syntax (=mixin, +include, $variables)
  - Elixir: Functional patterns (defmodule, def ... do, pipe operator |>)
  - Lua: Game scripting (local, repeat...until, ~=, elseif)
  - Perl: Text processing (my $, use strict, sub, chomp, regex =~)
- Comprehensive test coverage: 7 new tests, 30/30 passing (100%)
- False positive prevention: Unique identifiers (weight 5) + confidence thresholds
- No regressions: All existing language detection tests still pass
- Total language support: Now 27+ programming languages
- Credit: Contributed by @PaawanBarach via PR #275
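
To illustrate the idea of weighted patterns with a confidence threshold, here is a small sketch. The patterns, weights, scoring formula, and the name detect_language are all assumptions made for this example; the real detector uses its own 70 patterns and scoring.

```python
import re

# Hypothetical patterns and weights, for illustration only.
# Weight 5 marks an identifier that is close to unique for the language.
PATTERNS = {
    "elixir": [(r"\bdefmodule\s+\w+", 5), (r"\|>", 3), (r"\bdef\s+\w+.*\bdo\b", 2)],
    "lua": [(r"\brepeat\b[\s\S]*\buntil\b", 5), (r"~=", 3), (r"\blocal\s+\w+", 2)],
}


def detect_language(code, threshold=0.6):
    """Return (language, confidence); language is None below the threshold."""
    best_lang, best_conf = None, 0.0
    for lang, patterns in PATTERNS.items():
        total = sum(weight for _, weight in patterns)
        matched = sum(weight for regex, weight in patterns if re.search(regex, code))
        confidence = matched / total  # share of the language's weight that matched
        if confidence > best_conf:
            best_lang, best_conf = lang, confidence
    if best_conf >= threshold:
        return best_lang, best_conf
    return None, best_conf
```

Requiring a minimum share of matched weight is one simple way to keep a single generic token (such as local) from triggering a detection on its own.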

Multi-Agent Support for Local Enhancement (NEW)

Multiple Coding Agent Support: Choose your preferred local coding agent for SKILL.md enhancement
- Claude Code (default): Claude Code CLI with --dangerously-skip-permissions
- Codex CLI: OpenAI Codex CLI with --full-auto and --skip-git-repo-check
- Copilot CLI: GitHub Copilot CLI (gh copilot chat)
- OpenCode CLI: OpenCode CLI
- Custom agents: Use any CLI tool with --agent custom --agent-cmd "command {prompt_file}"
CLI Arguments: New flags for agent selection
- --agent: Choose agent (claude, codex, copilot, opencode, custom)
- --agent-cmd: Override command template for custom agents
Environment Variables: CI/CD friendly configuration
- SKILL_SEEKER_AGENT: Default agent to use
- SKILL_SEEKER_AGENT_CMD: Default command template for custom agents
Security First: Custom command validation
- Blocks dangerous shell characters (;, &, |, $, `, \n, \r)
- Validates executable exists in PATH
- Safe parsing with shlex.split()
Dual Input Modes: Supports both file-based and stdin-based agents
- File-based: Uses {prompt_file} placeholder (Claude, custom agents)
- Stdin-based: Pipes prompt via stdin (Codex CLI)
Backward Compatible: Claude Code remains the default, no breaking changes
Comprehensive Tests: 13 new tests covering all agent types and security validation
Agent Normalization: Smart alias handling (e.g., "claude-code" → "claude")
Credit: Contributed by @rovo79 (Robert Dean) via PR #270
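
The custom command validation described above could look roughly like this. It is a sketch under assumptions: validate_agent_cmd and DANGEROUS_CHARS are hypothetical names, and only the three checks named in the list (character blocklist, shlex.split(), PATH lookup) are shown.

```python
import shlex
import shutil

# Characters listed in the release notes as blocked in custom commands.
DANGEROUS_CHARS = (";", "&", "|", "$", "`", "\n", "\r")


def validate_agent_cmd(template):
    """Hypothetical validator: return the argv list or raise ValueError."""
    for ch in DANGEROUS_CHARS:
        if ch in template:
            raise ValueError(f"Dangerous character {ch!r} in agent command")
    # shlex.split() tokenizes like a POSIX shell without invoking one.
    argv = shlex.split(template)
    if not argv:
        raise ValueError("Agent command is empty")
    # shutil.which() returns None when the executable cannot be found.
    if shutil.which(argv[0]) is None:
        raise ValueError(f"Executable not found in PATH: {argv[0]}")
    return argv
```

Returning an argv list rather than a string lets the caller run the agent without shell=True, so the {prompt_file} placeholder can be substituted per argument.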

C3.10: Signal Flow Analysis for Godot Projects (NEW)

Complete Signal Flow Analysis System: Analyze event-driven architectures in Godot game projects
- Signal declaration extraction (signal keyword detection)
- Connection mapping (.connect() calls with targets and methods)
- Emission tracking (.emit() and emit_signal() calls)
- 208 signals, 634 connections, and 298 emissions detected in test project (Cosmic Idler)
- Signal density metrics (signals per file)
- Event chain detection (signals triggering other signals)
- Output: signal_flow.json, signal_flow.mmd (Mermaid diagram), signal_reference.md
Signal Pattern Detection: Three major patterns identified
- EventBus Pattern (0.90 confidence): Centralized signal hub in autoload
- Observer Pattern (0.85 confidence): Multi-observer signals (3+ listeners)
- Event Chains (0.80 confidence): Cascading signal propagation
Signal-Based How-To Guides (C3.10.1): AI-generated usage guides
- Step-by-step guides (Connect → Emit → Handle)
- Real code examples from project
- Common usage locations
- Parameter documentation
- Output: signal_how_to_guides.md (10 guides for Cosmic Idler)
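
As a rough picture of the extraction step (declarations, connections, emissions), the sketch below runs three regular expressions over a GDScript snippet. The regexes, the analyze_signals name, and the sample script are assumptions for illustration; the actual analyzer may use different patterns and handles far more syntax.

```python
import re

# Simplified, hypothetical patterns for Godot 4.x signal syntax.
SIGNAL_DECL = re.compile(r"^\s*signal\s+(\w+)", re.MULTILINE)
CONNECT_CALL = re.compile(r"(\w+)\.connect\(\s*([\w.]+)")
EMIT_CALL = re.compile(r"(\w+)\.emit\(|emit_signal\(\s*[\"'](\w+)[\"']")

SAMPLE_GDSCRIPT = '''
signal health_changed(new_value)
signal died

func _ready():
    health_changed.connect(_on_health_changed)

func take_damage(amount):
    health_changed.emit(health - amount)
    emit_signal("died")
'''


def analyze_signals(source):
    """Collect declared signals, (signal, handler) connections, and emissions."""
    declared = SIGNAL_DECL.findall(source)
    connections = CONNECT_CALL.findall(source)
    # Each match fills exactly one of the two groups; keep the non-empty one.
    emissions = [new_style or old_style for new_style, old_style in EMIT_CALL.findall(source)]
    return {"signals": declared, "connections": connections, "emissions": emissions}


result = analyze_signals(SAMPLE_GDSCRIPT)
```

Counts such as signals per file follow directly from the lengths of these lists once the function is applied to every .gd file.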

Godot Game Engine Support

Comprehensive Godot File Type Support: Full analysis of Godot 4.x projects
- GDScript (.gd): 265 files analyzed in test project
- Scene files (.tscn): 118 scene files
- Resource files (.tres): 38 resource files
- Shader files (.gdshader, .gdshaderinc): 9 shader files
- C# integration: Phantom Camera addon (13 files)
GDScript Language Support: Complete GDScript parsing with regex-based extraction
- Dependency extraction: preload(), load(), extends patterns
- Test framework detection: GUT, gdUnit4, WAT
- Test file patterns: test_*.gd, *_test.gd
- Signal syntax: signal, .connect(), .emit()
- Export decorators: @export, @onready
- Test decorators: @test (gdUnit4)
Game Engine Framework Detection: Improved detection for Unity, Unreal, Godot
- Godot markers: project.godot, .godot directory, .tscn, .tres, .gd files
- Unity markers: Assembly-CSharp.csproj, UnityEngine.dll, ProjectSettings/ProjectVersion.txt
- Unreal markers: .uproject, Source/, Config/DefaultEngine.ini
- Fixed false positive Unity detection (was using generic "Assets" keyword)
GDScript Test Extraction: Extract usage examples from Godot test files
- 396 test cases extracted from 20 GUT test files in test project
- Patterns: instantiation (preload().new(), load().new()), assertions (assert_eq, assert_true), signals
- GUT framework: extends GutTest, func test_*(), add_child_autofree()
- Test categories: instantiation, assertions, signal connections, setup/teardown
- Real code examples from production test files
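
Marker-based engine detection, as described in the framework detection item above, can be sketched like this. The detect_engine function and the ENGINE_MARKERS table are hypothetical, and only a subset of the listed markers is checked; the point is that specific files are tested instead of a generic keyword such as "Assets".

```python
from pathlib import Path

# Subset of the markers named in the release notes (illustrative only).
ENGINE_MARKERS = {
    "Godot": ["project.godot", ".godot"],
    "Unity": ["Assembly-CSharp.csproj", "ProjectSettings/ProjectVersion.txt"],
    "Unreal": ["Config/DefaultEngine.ini"],
}


def detect_engine(project_dir):
    """Return the engine name if a known marker exists, else None."""
    root = Path(project_dir)
    for engine, markers in ENGINE_MARKERS.items():
        if any((root / marker).exists() for marker in markers):
            return engine
    # Unreal project files have arbitrary names, so match on the extension.
    if any(root.glob("*.uproject")):
        return "Unreal"
    return None
```

With this approach a Godot project that happens to contain an Assets folder is not reported as Unity, because no Unity-specific file is present.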

C3.9: Project Documentation Extraction

Markdown Documentation Extraction: Automatically extracts and categorizes all .md files from projects
- Smart categorization by folder/filename (overview, architecture, guides, workflows, features, etc.)
- Processing depth control: surface (raw copy), deep (parse+summarize), full (AI-enhanced)
- AI enhancement (level 2+) adds topic extraction and cross-references
- New "📖 Project Documentation" section in SKILL.md
- Output to references/documentation/ organized by category
- Default ON, use --skip-docs to disable
- 15 new tests for documentation extraction features

Granular AI Enhancement Control

--enhance-level Flag: Fine-grained control over AI enhancement (0-3)
- Level 0: No AI enhancement (default)
- Level 1: SKILL.md enhancement only (fast, high value)
- Level 2: SKILL.md + Architecture + Config + Documentation
- Level 3: Full enhancement (patterns, tests, config, architecture, docs)
Config Integration: default_enhance_level setting in ~/.config/skill-seekers/config.json
MCP Support: All MCP tools updated with enhance_level parameter
Independent from --comprehensive: Enhancement level is separate from feature depth

C# Language Support

C# Test Example Extraction: Full support for C# test frameworks
- Language alias mapping (C# → csharp, C++ → cpp)
- NUnit, xUnit, MSTest test framework patterns
- Mock pattern support (NSubstitute, Moq)
- Zenject dependency injection patterns
- Setup/teardown method extraction
- 2 new tests for C# extraction features

Performance Optimizations

Parallel LOCAL Mode AI Enhancement: 6-12x faster with ThreadPoolExecutor
- Concurrent workers: 3 (configurable via local_parallel_workers)
- Batch processing: 20 patterns per Claude CLI call (configurable via local_batch_size)
- Significant speedup for large codebases
Config Settings: New ai_enhancement section in config
- local_batch_size: Patterns per CLI call (default: 20)
- local_parallel_workers: Concurrent workers (default: 3)
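
The batching and worker settings above map naturally onto ThreadPoolExecutor. The following is a minimal sketch, assuming hypothetical helper names (chunked, enhance_batch, enhance_all); enhance_batch is a placeholder for the real per-batch CLI call.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def enhance_batch(batch):
    # Placeholder for one CLI call that enhances a whole batch of patterns.
    return [f"enhanced:{pattern}" for pattern in batch]


def enhance_all(patterns, batch_size=20, workers=3):
    """Enhance patterns in batches, running up to `workers` batches at once."""
    batches = chunked(patterns, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so output order is stable.
        results = list(pool.map(enhance_batch, batches))
    return [item for batch in results for item in batch]
```

Threads suit this workload because each worker spends most of its time waiting on an external process, so the GIL is not the bottleneck.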

UX Improvements

Auto-Enhancement: SKILL.md automatically enhanced when using --enhance or --comprehensive
- No need for separate skill-seekers enhance command
- Seamless one-command workflow
- 10-minute timeout for large codebases
- Graceful fallback with retry instructions on failure
LOCAL Mode Fallback: All AI enhancements now fall back to LOCAL mode when no API key is set
- Applies to: pattern enhancement (C3.1), test examples (C3.2), architecture (C3.7)
- Uses Claude Code CLI instead of failing silently
- Better UX: "Using LOCAL mode (Claude Code CLI)" instead of "AI disabled"
Support for custom Claude-compatible API endpoints via ANTHROPIC_BASE_URL environment variable
Compatibility with GLM-4.7 and other Claude-compatible APIs across all AI enhancement features

Changed

All AI enhancement modules now respect ANTHROPIC_BASE_URL for custom endpoints
Updated documentation with GLM-4.7 configuration examples
Rewrote LOCAL mode in config_enhancer.py to use Claude CLI properly with explicit output file paths
Updated MCP scrape_codebase_tool with skip_docs and enhance_level parameters
Updated CLAUDE.md with C3.9 documentation extraction feature
Increased default batch size from 5 to 20 patterns for LOCAL mode

Fixed

C# Test Extraction: Fixed "Language C# not supported" error with language alias mapping
Config Type Field Mismatch: Fixed KeyError in config_enhancer.py by supporting both "type" and "config_type" fields
LocalSkillEnhancer Import: Fixed incorrect import and method call in main.py (SkillEnhancer → LocalSkillEnhancer)
Code Quality: Fixed 4 critical linter errors (unused imports, variables, arguments, import sorting)

Godot Game Engine Fixes

GDScript Dependency Extraction: Fixed 265+ "Syntax error in *.gd" warnings (commit 3e6c448)
- GDScript files were incorrectly routed to Python AST parser
- Created dedicated _extract_gdscript_imports() with regex patterns
- Now correctly parses preload(), load(), extends patterns
- Result: 377 dependencies extracted with 0 warnings
Framework Detection False Positive: Fixed Unity detection on Godot projects (commit 50b28fe)
- Was detecting "Unity" due to generic "Assets" keyword in comments
- Changed Unity markers to specific files: Assembly-CSharp.csproj, UnityEngine.dll, Library/
- Now correctly detects Godot via project.godot, .godot directory
Circular Dependencies: Fixed self-referential cycles (commit 50b28fe)
- 3 self-loop warnings (files depending on themselves)
- Added target != file_path check in dependency graph builder
- Result: 0 circular dependencies detected
GDScript Test Discovery: Fixed test discovery finding 0 test files in Godot projects (commit 50b28fe)
- Added GDScript test patterns: test_*.gd, *_test.gd
- Added GDScript to LANGUAGE_MAP
- Result: 32 test files discovered (20 GUT files with 396 tests)
GDScript Test Extraction: Fixed "Language GDScript not supported" warning (commit c826690)
- Added GDScript regex patterns to PATTERNS dictionary
- Patterns: instantiation (preload().new()), assertions (assert_eq), signals (.connect())
- Result: 22 test examples extracted successfully
Config Extractor Array Handling: Fixed JSON/YAML array parsing (commit fca0951)
- Error: 'list' object has no attribute 'items' on root-level arrays
- Added isinstance checks for dict/list/primitive at root
- Result: No JSON array errors, save.json parsed correctly
Progress Indicators: Fixed missing progress for small batches (commit eec37f5)
- Progress only shown every 5 batches, invisible for small jobs
- Modified condition to always show for batches < 10
- Result: "Progress: 1/2 batches completed" now visible
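
For the Config Extractor Array Handling fix listed above, the isinstance-based dispatch can be pictured with the sketch below. The flatten_config helper and its dotted-path output format are assumptions for illustration; the point is only that a list or primitive at the root no longer reaches a .items() call.

```python
import json


def flatten_config(node, prefix=""):
    """Hypothetical helper: flatten parsed JSON/YAML into {path: value}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        # Lists have no .items(); enumerate them with index-style keys.
        items = ((f"[{index}]", value) for index, value in enumerate(node))
    else:
        # Primitive value (str, int, bool, None, ...), possibly at the root.
        return {prefix or "<root>": node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_config(value, path))
    return flat


print(flatten_config(json.loads('[{"slot": 1}, {"slot": 2}]')))
# {'[0].slot': 1, '[1].slot': 2}
```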

Tests

GDScript Test Extraction Test: Added comprehensive test case for GDScript GUT/gdUnit4 framework
- Tests player instantiation with preload() and load()
- Tests signal connections and emissions
- Tests gdUnit4 @test annotation syntax
- Tests game state management patterns
- 4 test functions with 60+ lines of GDScript code
- Validates extraction of instantiations, assertions, and signal patterns

Removed

Removed client-specific documentation files from repository