[3.0.0] - 2026-02-10
🚀 "Universal Intelligence Platform" - Major Release
Theme: Transform any documentation into structured knowledge for any AI system.
This is our biggest release ever! v3.0.0 establishes Skill Seekers as the universal documentation preprocessor for the entire AI ecosystem - from RAG pipelines to AI coding assistants to Claude skills.
Highlights
- 🚀 16 platform adaptors (up from 4 in v2.x)
- 🛠️ 26 MCP tools (up from 9)
- ✅ 1,852 tests passing (up from 700+)
- ☁️ Cloud storage support (S3, GCS, Azure)
- 🔄 CI/CD ready (GitHub Action + Docker)
- 📦 12 example projects for every integration
- 📚 18 integration guides complete
Added - Platform Adaptors (16 Total)
RAG & Vector Databases (8)
- LangChain (--format langchain) - Output LangChain Document objects
- LlamaIndex (--format llama-index) - Output LlamaIndex TextNode objects
- Chroma (--format chroma) - Direct ChromaDB integration
- FAISS (--format faiss) - Facebook AI Similarity Search
- Haystack (--format haystack) - Deepset Haystack pipelines
- Qdrant (--format qdrant) - Qdrant vector database
- Weaviate (--format weaviate) - Weaviate vector search
- Pinecone-ready (--target markdown) - Markdown format ready for Pinecone
AI Platforms (3)
- Claude (--target claude) - Claude AI skills (ZIP + YAML)
- Gemini (--target gemini) - Google Gemini skills (tar.gz)
- OpenAI (--target openai) - OpenAI ChatGPT (ZIP + Vector Store)
AI Coding Assistants (4)
- Cursor (--target claude + .cursorrules) - Cursor IDE integration
- Windsurf (--target claude + .windsurfrules) - Windsurf/Codeium
- Cline (--target claude + .clinerules) - VS Code extension
- Continue.dev (--target claude) - Universal IDE support
Generic (1)
- Markdown (--target markdown) - Generic ZIP export
Added - MCP Tools (26 Total)
Config Tools (3)
- generate_config - Generate scraping configuration
- list_configs - List available preset configs
- validate_config - Validate config JSON structure
Scraping Tools (8)
- estimate_pages - Estimate page count before scraping
- scrape_docs - Scrape documentation websites
- scrape_github - Scrape GitHub repositories
- scrape_pdf - Extract from PDF files
- scrape_codebase - Analyze local codebases
- detect_patterns - Detect design patterns in code
- extract_test_examples - Extract usage examples from tests
- build_how_to_guides - Build how-to guides from code
Packaging Tools (4)
- package_skill - Package skill for target platform
- upload_skill - Upload to LLM platform
- enhance_skill - AI-powered enhancement
- install_skill - One-command complete workflow
Source Tools (5)
- fetch_config - Fetch config from remote source
- submit_config - Submit config for approval
- add_config_source - Add Git config source
- list_config_sources - List config sources
- remove_config_source - Remove config source
Splitting Tools (2)
- split_config - Split large configs
- generate_router - Generate router skills
Vector DB Tools (4)
- export_to_weaviate - Export to Weaviate
- export_to_chroma - Export to ChromaDB
- export_to_faiss - Export to FAISS
- export_to_qdrant - Export to Qdrant
Added - Cloud Storage
Upload skills directly to cloud storage:
- AWS S3 - skill-seekers cloud upload --provider s3 --bucket my-bucket
- Google Cloud Storage - skill-seekers cloud upload --provider gcs --bucket my-bucket
- Azure Blob Storage - skill-seekers cloud upload --provider azure --container my-container
Features:
- Upload/download directories
- List files with metadata
- Check file existence
- Generate presigned URLs
- Cloud-agnostic interface
Added - CI/CD Support
GitHub Action
    - uses: skill-seekers/action@v1
      with:
        config: configs/react.json
        format: langchain

Features:
- Auto-update on doc changes
- Matrix builds for multiple frameworks
- Scheduled updates
- Caching for faster runs
Docker
    docker run -v $(pwd):/data skill-seekers:latest scrape --config /data/config.json

Added - Production Infrastructure
- Helm Charts - Kubernetes deployment
- Docker Compose - Local vector DB stack
- Monitoring - Sentry integration, sync monitoring
- Benchmarking - Performance testing framework
Added - 12 Example Projects
Complete working examples for every integration:
- langchain-rag-pipeline - React docs → LangChain → Chroma
- llama-index-query-engine - Vue docs → LlamaIndex
- pinecone-upsert - Documentation → Pinecone
- chroma-example - Full ChromaDB workflow
- faiss-example - FAISS index building
- haystack-pipeline - Haystack RAG pipeline
- qdrant-example - Qdrant vector DB
- weaviate-example - Weaviate integration
- cursor-react-skill - React skill for Cursor
- windsurf-fastapi-context - FastAPI for Windsurf
- cline-django-assistant - Django assistant for Cline
- continue-dev-universal - Universal IDE context
Quality Metrics
- ✅ 1,852 tests across 100 test files
- ✅ 58,512 lines of Python code
- ✅ 80+ documentation files
- ✅ 100% test coverage for critical paths
- ✅ CI/CD on every commit
Fixed
URL Conversion Bug with Anchor Fragments (Issue #277)
- Critical Bug Fix: Fixed 404 errors when scraping documentation with anchor links
  - Problem: URLs with anchor fragments (e.g., #synchronous-initialization) were malformed
    - Incorrect: https://example.com/docs/api#method/index.html.md ❌
    - Correct: https://example.com/docs/api/index.html.md ✅
  - Root Cause: _convert_to_md_urls() didn't strip anchor fragments before appending /index.html.md
  - Solution: Parse URLs with urllib.parse to remove fragments and deduplicate base URLs
  - Impact: Prevents duplicate requests for the same page with different anchors
  - Additional Fix: Changed .md detection from ".md" in url to url.endswith('.md')
    - Prevents false matches on URLs like /cmd-line or /AMD-processors
- Test Coverage: 12 comprehensive tests covering all edge cases
  - Anchor fragment stripping
  - Deduplication of multiple anchors on same URL
  - Query parameter preservation
  - Trailing slash handling
  - Real-world MikroORM case validation
  - 54/54 tests passing (42 existing + 12 new)
- Reported by: @devjones via Issue #277
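
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual _convert_to_md_urls(): the function name convert_to_md_urls, the placement of the query string, and the choice to deduplicate on the final URL are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def convert_to_md_urls(urls):
    """Hypothetical stand-in for _convert_to_md_urls(); not the project's code."""
    seen, result = set(), []
    for url in urls:
        parts = urlsplit(url)
        path = parts.path
        # Suffix check instead of substring check, so only real .md paths match.
        if not path.endswith(".md"):
            path = path.rstrip("/") + "/index.html.md"
        # Rebuild without the fragment; the query string is kept (assumption).
        md_url = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
        if md_url not in seen:  # several anchors on one page collapse to one URL
            seen.add(md_url)
            result.append(md_url)
    return result


print(convert_to_md_urls([
    "https://example.com/docs/api#method",
    "https://example.com/docs/api#other",
]))
# ['https://example.com/docs/api/index.html.md']
```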
Added
Extended Language Detection (NEW)
- 7 New Programming Languages: Dart, Scala, SCSS, SASS, Elixir, Lua, Perl
- Pattern-based detection with confidence scoring (0.6-0.8+ thresholds)
- 70 regex patterns prioritizing unique identifiers (weight 5)
- Framework-specific patterns:
  - Dart: Flutter widgets (StatelessWidget, StatefulWidget, Widget build())
  - Scala: Pattern matching (case class, trait, match {})
  - SCSS: Preprocessor features ($variables, @mixin, @include, @extend)
  - SASS: Indented syntax (=mixin, +include, $variables)
  - Elixir: Functional patterns (defmodule, def ... do, pipe operator |>)
  - Lua: Game scripting (local, repeat...until, ~=, elseif)
  - Perl: Text processing (my $, use strict, sub, chomp, regex =~)
- Comprehensive test coverage: 7 new tests, 30/30 passing (100%)
- False positive prevention: Unique identifiers (weight 5) + confidence thresholds
- No regressions: All existing language detection tests still pass
- Total language support: Now 27+ programming languages
- Credit: Contributed by @PaawanBarach via PR #275
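
As a rough illustration of weighted pattern matching with a confidence threshold, the sketch below scores a snippet against a few regexes per language. Only the weight of 5 for unique identifiers and the 0.6 threshold come from the notes above; the pattern lists, the lower weights, and the scoring formula (matched weight divided by total weight) are assumptions, and detect_language is a hypothetical name.

```python
import re

# Illustrative patterns only; the real detector ships 70 patterns.
PATTERNS = {
    "dart": [(r"\bStatelessWidget\b", 5), (r"\bWidget build\(", 5), (r"\bfinal\b", 1)],
    "elixir": [(r"\bdefmodule\b", 5), (r"\|>", 3), (r"\bdo\b", 1)],
    "lua": [(r"\belseif\b", 3), (r"~=", 3), (r"\blocal\b", 2)],
}


def detect_language(code, threshold=0.6):
    """Return (language, confidence), or (None, confidence) below the threshold."""
    best_lang, best_conf = None, 0.0
    for lang, patterns in PATTERNS.items():
        total = sum(weight for _, weight in patterns)
        matched = sum(weight for regex, weight in patterns if re.search(regex, code))
        conf = matched / total  # assumed formula: share of pattern weight matched
        if conf > best_conf:
            best_lang, best_conf = lang, conf
    if best_conf >= threshold:
        return best_lang, best_conf
    return None, best_conf


snippet = "defmodule Greeter do\n  def hello(name), do: name |> String.upcase()\nend"
print(detect_language(snippet))  # ('elixir', 1.0)
```

Giving unique identifiers a high weight means a single generic keyword cannot push a language over the threshold, which is the false-positive protection the notes describe.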
Multi-Agent Support for Local Enhancement (NEW)
- Multiple Coding Agent Support: Choose your preferred local coding agent for SKILL.md enhancement
  - Claude Code (default): Claude Code CLI with --dangerously-skip-permissions
  - Codex CLI: OpenAI Codex CLI with --full-auto and --skip-git-repo-check
  - Copilot CLI: GitHub Copilot CLI (gh copilot chat)
  - OpenCode CLI: OpenCode CLI
  - Custom agents: Use any CLI tool with --agent custom --agent-cmd "command {prompt_file}"
- CLI Arguments: New flags for agent selection
  - --agent: Choose agent (claude, codex, copilot, opencode, custom)
  - --agent-cmd: Override command template for custom agents
- Environment Variables: CI/CD friendly configuration
  - SKILL_SEEKER_AGENT: Default agent to use
  - SKILL_SEEKER_AGENT_CMD: Default command template for custom agents
- Security First: Custom command validation
  - Blocks dangerous shell characters (;, &, |, $, `, \n, \r)
  - Validates executable exists in PATH
  - Safe parsing with shlex.split()
- Dual Input Modes: Supports both file-based and stdin-based agents
  - File-based: Uses {prompt_file} placeholder (Claude, custom agents)
  - Stdin-based: Pipes prompt via stdin (Codex CLI)
- Backward Compatible: Claude Code remains the default, no breaking changes
- Comprehensive Tests: 13 new tests covering all agent types and security validation
- Agent Normalization: Smart alias handling (e.g., "claude-code" → "claude")
- Credit: Contributed by @rovo79 (Robert Dean) via PR #270
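
The custom command validation listed under "Security First" could look something like the sketch below. It is not the project's code: validate_agent_cmd and its which parameter are hypothetical, and only the blocked character set, the PATH check, and the use of shlex.split() come from the notes above.

```python
import shlex
import shutil

# Characters the release notes list as blocked in custom agent commands.
DANGEROUS_CHARS = (";", "&", "|", "$", "`", "\n", "\r")


def validate_agent_cmd(template, which=shutil.which):
    """Return the parsed argv for a command template, or raise ValueError.

    `which` is injectable so the PATH lookup can be replaced in tests.
    """
    for ch in DANGEROUS_CHARS:
        if ch in template:
            raise ValueError(f"dangerous character {ch!r} in agent command")
    argv = shlex.split(template)  # shell-style parsing without invoking a shell
    if not argv:
        raise ValueError("agent command is empty")
    if which(argv[0]) is None:
        raise ValueError(f"executable not found in PATH: {argv[0]}")
    return argv
```

Rejecting the characters before parsing keeps the check simple: the template is never handed to a shell, and the {prompt_file} placeholder survives as a plain argument to be substituted later.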
C3.10: Signal Flow Analysis for Godot Projects (NEW)
- Complete Signal Flow Analysis System: Analyze event-driven architectures in Godot game projects
  - Signal declaration extraction (signal keyword detection)
  - Connection mapping (.connect() calls with targets and methods)
  - Emission tracking (.emit() and emit_signal() calls)
  - 208 signals, 634 connections, and 298 emissions detected in test project (Cosmic Idler)
  - Signal density metrics (signals per file)
  - Event chain detection (signals triggering other signals)
  - Output: signal_flow.json, signal_flow.mmd (Mermaid diagram), signal_reference.md
- Signal Pattern Detection: Three major patterns identified
  - EventBus Pattern (0.90 confidence): Centralized signal hub in autoload
  - Observer Pattern (0.85 confidence): Multi-observer signals (3+ listeners)
  - Event Chains (0.80 confidence): Cascading signal propagation
- Signal-Based How-To Guides (C3.10.1): AI-generated usage guides
  - Step-by-step guides (Connect → Emit → Handle)
  - Real code examples from project
  - Common usage locations
  - Parameter documentation
  - Output: signal_how_to_guides.md (10 guides for Cosmic Idler)
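
To make the three extraction steps above concrete (declarations, connections, emissions), here is a small regex-based sketch. The regexes, the analyze_signals name, and the output shape are assumptions for illustration and will miss cases the real analyzer handles.

```python
import re
from collections import Counter

SIGNAL_DECL = re.compile(r"^[ \t]*signal\s+(\w+)", re.MULTILINE)
CONNECT = re.compile(r"(\w+)\.connect\(")
# Matches both name.emit(...) and emit_signal("name", ...).
EMIT = re.compile(r"(\w+)\.emit\(|emit_signal\(\s*[\"'](\w+)[\"']")


def analyze_signals(source):
    """Collect declared signals plus per-signal connection and emission counts."""
    emissions = Counter(a or b for a, b in EMIT.findall(source))
    return {
        "signals": SIGNAL_DECL.findall(source),
        "connections": dict(Counter(CONNECT.findall(source))),
        "emissions": dict(emissions),
    }


GDSCRIPT = '''
signal health_changed(value)
signal died

func _ready():
    health_changed.connect(_on_health_changed)

func hit():
    health_changed.emit(10)
    emit_signal("died")
'''

print(analyze_signals(GDSCRIPT)["signals"])  # ['health_changed', 'died']
```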
Godot Game Engine Support
- Comprehensive Godot File Type Support: Full analysis of Godot 4.x projects
  - GDScript (.gd): 265 files analyzed in test project
  - Scene files (.tscn): 118 scene files
  - Resource files (.tres): 38 resource files
  - Shader files (.gdshader, .gdshaderinc): 9 shader files
  - C# integration: Phantom Camera addon (13 files)
- GDScript Language Support: Complete GDScript parsing with regex-based extraction
  - Dependency extraction: preload(), load(), extends patterns
  - Test framework detection: GUT, gdUnit4, WAT
  - Test file patterns: test_*.gd, *_test.gd
  - Signal syntax: signal, .connect(), .emit()
  - Export decorators: @export, @onready
  - Test decorators: @test (gdUnit4)
- Game Engine Framework Detection: Improved detection for Unity, Unreal, Godot
  - Godot markers: project.godot, .godot directory, .tscn, .tres, .gd files
  - Unity markers: Assembly-CSharp.csproj, UnityEngine.dll, ProjectSettings/ProjectVersion.txt
  - Unreal markers: .uproject, Source/, Config/DefaultEngine.ini
  - Fixed false positive Unity detection (was using generic "Assets" keyword)
- GDScript Test Extraction: Extract usage examples from Godot test files
  - 396 test cases extracted from 20 GUT test files in test project
  - Patterns: instantiation (preload().new(), load().new()), assertions (assert_eq, assert_true), signals
  - GUT framework: extends GutTest, func test_*(), add_child_autofree()
  - Test categories: instantiation, assertions, signal connections, setup/teardown
  - Real code examples from production test files
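
The regex-based dependency extraction mentioned under GDScript Language Support might look roughly like this. The patterns and the extract_gdscript_deps name are illustrative assumptions, not the project's implementation.

```python
import re

# preload("res://...") and load("res://...") with a literal string argument.
PRELOAD_OR_LOAD = re.compile(r"\b(?:preload|load)\(\s*[\"']([^\"']+)[\"']\s*\)")
# extends "res://base.gd" or extends ClassName at the start of a line.
EXTENDS = re.compile(r"^[ \t]*extends\s+(?:[\"']([^\"']+)[\"']|(\w+))", re.MULTILINE)


def extract_gdscript_deps(source):
    """Return unique dependencies in first-seen order (loads first, then extends)."""
    deps = PRELOAD_OR_LOAD.findall(source)
    deps += [path or name for path, name in EXTENDS.findall(source)]
    return list(dict.fromkeys(deps))


GDSCRIPT = '''
extends Node2D
const Bullet = preload("res://bullet.gd")
var level = load("res://level.tscn")
'''

print(extract_gdscript_deps(GDSCRIPT))
# ['res://bullet.gd', 'res://level.tscn', 'Node2D']
```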
C3.9: Project Documentation Extraction
- Markdown Documentation Extraction: Automatically extracts and categorizes all .md files from projects
  - Smart categorization by folder/filename (overview, architecture, guides, workflows, features, etc.)
  - Processing depth control: surface (raw copy), deep (parse+summarize), full (AI-enhanced)
  - AI enhancement (level 2+) adds topic extraction and cross-references
  - New "📖 Project Documentation" section in SKILL.md
  - Output to references/documentation/ organized by category
  - Default ON, use --skip-docs to disable
  - 15 new tests for documentation extraction features
Granular AI Enhancement Control
- --enhance-level Flag: Fine-grained control over AI enhancement (0-3)
  - Level 0: No AI enhancement (default)
  - Level 1: SKILL.md enhancement only (fast, high value)
  - Level 2: SKILL.md + Architecture + Config + Documentation
  - Level 3: Full enhancement (patterns, tests, config, architecture, docs)
- Config Integration: default_enhance_level setting in ~/.config/skill-seekers/config.json
- MCP Support: All MCP tools updated with enhance_level parameter
- Independent from --comprehensive: Enhancement level is separate from feature depth
C# Language Support
- C# Test Example Extraction: Full support for C# test frameworks
- Language alias mapping (C# → csharp, C++ → cpp)
- NUnit, xUnit, MSTest test framework patterns
- Mock pattern support (NSubstitute, Moq)
- Zenject dependency injection patterns
- Setup/teardown method extraction
- 2 new tests for C# extraction features
Performance Optimizations
- Parallel LOCAL Mode AI Enhancement: 6-12x faster with ThreadPoolExecutor
  - Concurrent workers: 3 (configurable via local_parallel_workers)
  - Batch processing: 20 patterns per Claude CLI call (configurable via local_batch_size)
  - Significant speedup for large codebases
- Config Settings: New ai_enhancement section in config
  - local_batch_size: Patterns per CLI call (default: 20)
  - local_parallel_workers: Concurrent workers (default: 3)
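
The batching and worker settings above map naturally onto ThreadPoolExecutor. The sketch below shows the general shape under stated assumptions: enhance_batch is a placeholder for one CLI call, and enhance_all and chunked are hypothetical names, not the project's functions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def enhance_batch(batch):
    """Placeholder for one CLI call that enhances a whole batch of patterns."""
    return [f"enhanced:{pattern}" for pattern in batch]


def enhance_all(patterns, batch_size=20, workers=3, enhance=enhance_batch):
    """Run batches concurrently; defaults mirror local_batch_size and local_parallel_workers."""
    batches = chunked(patterns, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so output order matches input.
        results = list(pool.map(enhance, batches))
    return [item for batch in results for item in batch]


patterns = [f"p{i}" for i in range(45)]
print(len(chunked(patterns, 20)))  # 3
```

Threads are a reasonable fit here because each batch spends its time waiting on an external process rather than running Python code.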
UX Improvements
- Auto-Enhancement: SKILL.md automatically enhanced when using --enhance or --comprehensive
  - No need for separate skill-seekers enhance command
  - Seamless one-command workflow
  - 10-minute timeout for large codebases
  - Graceful fallback with retry instructions on failure
- LOCAL Mode Fallback: All AI enhancements now fall back to LOCAL mode when no API key is set
  - Applies to: pattern enhancement (C3.1), test examples (C3.2), architecture (C3.7)
  - Uses Claude Code CLI instead of failing silently
  - Better UX: "Using LOCAL mode (Claude Code CLI)" instead of "AI disabled"
- Support for custom Claude-compatible API endpoints via ANTHROPIC_BASE_URL environment variable
- Compatibility with GLM-4.7 and other Claude-compatible APIs across all AI enhancement features
Changed
- All AI enhancement modules now respect ANTHROPIC_BASE_URL for custom endpoints
- Updated documentation with GLM-4.7 configuration examples
- Rewrote LOCAL mode in config_enhancer.py to use Claude CLI properly with explicit output file paths
- Updated MCP scrape_codebase_tool with skip_docs and enhance_level parameters
- Updated CLAUDE.md with C3.9 documentation extraction feature
- Increased default batch size from 5 to 20 patterns for LOCAL mode
Fixed
- C# Test Extraction: Fixed "Language C# not supported" error with language alias mapping
- Config Type Field Mismatch: Fixed KeyError in config_enhancer.py by supporting both "type" and "config_type" fields
- LocalSkillEnhancer Import: Fixed incorrect import and method call in main.py (SkillEnhancer → LocalSkillEnhancer)
- Code Quality: Fixed 4 critical linter errors (unused imports, variables, arguments, import sorting)
Godot Game Engine Fixes
- GDScript Dependency Extraction: Fixed 265+ "Syntax error in *.gd" warnings (commit 3e6c448)
  - GDScript files were incorrectly routed to Python AST parser
  - Created dedicated _extract_gdscript_imports() with regex patterns
  - Now correctly parses preload(), load(), extends patterns
  - Result: 377 dependencies extracted with 0 warnings
- Framework Detection False Positive: Fixed Unity detection on Godot projects (commit 50b28fe)
  - Was detecting "Unity" due to generic "Assets" keyword in comments
  - Changed Unity markers to specific files: Assembly-CSharp.csproj, UnityEngine.dll, Library/
  - Now correctly detects Godot via project.godot, .godot directory
- Circular Dependencies: Fixed self-referential cycles (commit 50b28fe)
  - 3 self-loop warnings (files depending on themselves)
  - Added target != file_path check in dependency graph builder
  - Result: 0 circular dependencies detected
- GDScript Test Discovery: Fixed 0 test files found in Godot projects (commit 50b28fe)
  - Added GDScript test patterns: test_*.gd, *_test.gd
  - Added GDScript to LANGUAGE_MAP
  - Result: 32 test files discovered (20 GUT files with 396 tests)
- GDScript Test Extraction: Fixed "Language GDScript not supported" warning (commit c826690)
  - Added GDScript regex patterns to PATTERNS dictionary
  - Patterns: instantiation (preload().new()), assertions (assert_eq), signals (.connect())
  - Result: 22 test examples extracted successfully
- Config Extractor Array Handling: Fixed JSON/YAML array parsing (commit fca0951)
  - Error: 'list' object has no attribute 'items' on root-level arrays
  - Added isinstance checks for dict/list/primitive at root
  - Result: No JSON array errors, save.json parsed correctly
- Progress Indicators: Fixed missing progress for small batches (commit eec37f5)
  - Progress only shown every 5 batches, invisible for small jobs
  - Modified condition to always show for batches < 10
  - Result: "Progress: 1/2 batches completed" now visible
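
The Config Extractor fix above comes down to not assuming the parsed root is a dict. A minimal sketch of that isinstance dispatch follows; flatten_config and the key naming scheme are assumptions for illustration only.

```python
import json


def flatten_config(data, prefix=""):
    """Flatten parsed JSON/YAML into dotted keys, whatever the root type is."""
    settings = {}
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            settings.update(flatten_config(value, child))
    elif isinstance(data, list):
        # Calling .items() on a root-level list is what raised the original error.
        for index, value in enumerate(data):
            settings.update(flatten_config(value, f"{prefix}[{index}]"))
    else:
        settings[prefix or "<root>"] = data  # primitive value
    return settings


print(flatten_config(json.loads('[{"slot": 1}, {"slot": 2}]')))
# {'[0].slot': 1, '[1].slot': 2}
```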
Tests
- GDScript Test Extraction Test: Added comprehensive test case for GDScript GUT/gdUnit4 framework
  - Tests player instantiation with preload() and load()
  - Tests signal connections and emissions
  - Tests gdUnit4 @test annotation syntax
  - Tests game state management patterns
  - 4 test functions with 60+ lines of GDScript code
  - Validates extraction of instantiations, assertions, and signal patterns
Removed
- Removed client-specific documentation files from repository