github yusufkaraaslan/Skill_Seekers v3.2.0
v3.2.0 — Video Extraction, Word Support, Pinecone Adaptor

8 hours ago

Theme: Video source support, Word document support, Pinecone adaptor, and quality improvements. 94 files changed, +23,500 lines since v3.1.3. 2,540 tests passing.

🎬 Video Extraction Pipeline

Complete video extraction system that converts YouTube videos and local video files into AI-consumable skills.

  • skill-seekers video --url <youtube-url> — New CLI command for video scraping
  • skill-seekers create <youtube-url> — Auto-detects YouTube URLs
  • Transcript extraction — 3-tier fallback: YouTube API → yt-dlp → faster-whisper
  • Visual OCR — Multi-engine ensemble (EasyOCR + pytesseract) for code frames
  • Panel detection — Splits IDE screenshots into independent sub-sections
  • Code timeline — Tracks code evolution across frames with edit history
  • Two-pass AI enhancement — Cleans OCR noise using transcript context
  • GPU auto-detection — skill-seekers video --setup detects CUDA/ROCm/CPU and installs the correct PyTorch
  • 197 tests covering models, metadata, transcript, visual, OCR, and CLI
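
The 3-tier transcript fallback above can be pictured as an ordered chain of extractors, where the first one to return text wins. The sketch below is illustrative only and is not taken from the Skill Seekers source: `first_successful` and the three stand-in extractor functions are hypothetical names, and the real pipeline may handle errors and retries differently.

```python
from typing import Callable, Optional, Sequence

Extractor = Callable[[str], Optional[str]]


def first_successful(extractors: Sequence[Extractor], source: str) -> Optional[str]:
    """Try each extractor in order and return the first non-empty transcript."""
    for extract in extractors:
        try:
            text = extract(source)
        except Exception:
            # A failing tier is skipped so the next tier gets a chance.
            continue
        if text:
            return text
    return None


# Hypothetical stand-ins for the three tiers (YouTube API, yt-dlp, faster-whisper).
def from_youtube_api(url: str) -> Optional[str]:
    raise RuntimeError("no captions available")


def from_yt_dlp(url: str) -> Optional[str]:
    return None  # e.g. no subtitle track found


def from_whisper(url: str) -> Optional[str]:
    return "transcribed locally"


transcript = first_successful(
    [from_youtube_api, from_yt_dlp, from_whisper],
    "https://example.com/watch?v=demo",
)
print(transcript)
```

Ordering the tiers from cheapest to most expensive means local speech-to-text runs only when no existing caption source is usable.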

📄 Word Document (.docx) Support

  • skill-seekers word --docx <file> — Full pipeline: mammoth → HTML → sections → SKILL.md
  • skill-seekers create document.docx — Auto-detects .docx files
  • Smart code detection — Identifies monospace paragraphs as code blocks
  • Install: pip install skill-seekers[docx]

🌲 Pinecone Vector Database Adaptor

  • skill-seekers package output/ --format pinecone --upload — Direct Pinecone upload
  • Full CRUD operations with namespace support
  • OpenAI and Sentence Transformers embedding support
  • Batch upsert with configurable batch sizes
  • 764 tests for comprehensive coverage
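
Batch upsert with a configurable batch size generally comes down to slicing the vector list into fixed-size groups before sending each group to the database. The following is a minimal sketch under that assumption; `batched` and the sample records are hypothetical, and no Pinecone client call is shown because the adaptor's actual interface is not described here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical records; real ones would carry embeddings and metadata.
vectors = [{"id": f"chunk-{i}", "values": [0.0, 0.0, 0.0]} for i in range(250)]

batch_lengths = [len(batch) for batch in batched(vectors, 100)]
print(batch_lengths)  # [100, 100, 50]
```

Each yielded batch would then be passed to a single upsert request, which keeps request sizes bounded regardless of how many chunks a skill produces.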

🐛 Bug Fixes

  • 6 OCR quality fixes — Skip webcam frames, clean IDE decorations, fix duplicate lines, filter UI junk
  • 15 video pipeline fixes — Timeout handling, MCP integration, filename collisions, dependency management
  • Issue #300 — Selector fallback & dry-run link discovery (ReactFlow now yields 20+ pages, previously 1)
  • Issue #301 — setup.sh macOS fix
  • RAG chunking crash — Fixed AttributeError: output_dir
  • Chunk overlap auto-scaling — Scales to max(50, chunk_tokens // 10)
  • Reference file limits removed — No more caps on GitHub issues, releases, or code blocks
  • See CHANGELOG.md for full details
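
The chunk overlap rule quoted above, max(50, chunk_tokens // 10), works out to a floor of 50 tokens that grows to 10% of the chunk size for larger chunks. A small worked example follows; the function name `scaled_overlap` is hypothetical and only the formula comes from the release notes.

```python
def scaled_overlap(chunk_tokens: int) -> int:
    """Overlap in tokens: 10% of the chunk size, but never below 50."""
    return max(50, chunk_tokens // 10)


for size in (256, 512, 1000, 2000):
    print(size, scaled_overlap(size))
# 256 50
# 512 51
# 1000 100
# 2000 200
```

Chunks of 500 tokens or fewer keep the 50-token floor, and anything larger scales proportionally.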

📦 Install / Upgrade

pip install --upgrade skill-seekers

# With video support
pip install skill-seekers[video]
skill-seekers video --setup  # Auto-detect GPU, install deps

# With Word support
pip install skill-seekers[docx]

# With Pinecone
pip install skill-seekers[pinecone]

# Everything
pip install skill-seekers[all]

Full Changelog: https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/CHANGELOG.md
