URL: https://pypi.org/project/skill-seekers/


skill-seekers 3.8.0

pip install skill-seekers

Latest release

Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills. International support with Chinese (简体中文) documentation.

Verified details

These details have been verified by PyPI
Maintainers
yusyus

Unverified details

These details have not been verified by PyPI
Meta
  • License: MIT License (MIT)
  • Author: Yusuf Karaaslan
  • Tags: claude, ai, documentation, scraping, skills, llm, mcp, automation, i18n, chinese, international
  • Requires: Python >=3.10
  • Provides-Extra: mcp, gemini, openai, minimax, kimi, deepseek, qwen, openrouter, together, fireworks, all-llms, s3, gcs, azure, docx, epub, video, video-full, chroma, weaviate, sentence-transformers, pinecone, rag-upload, all-cloud, jupyter, asciidoc, pptx, confluence, notion, rss, chat, browser, embedding, all

Project description

Skill Seekers

English | 简体中文 | 日本語 | 한국어 | Español | Français | Deutsch | Português | Türkçe | العربية | हिन्दी | Русский

🧠 The data layer for AI systems. Skill Seekers turns documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and 10+ more source types into structured knowledge assets—ready to power AI Skills (Claude, Gemini, OpenAI), RAG pipelines (LangChain, LlamaIndex, Pinecone), and AI coding assistants (Cursor, Windsurf, Cline) in minutes, not hours.

🌐 Visit SkillSeekersWeb.com - Browse 24+ preset configs, share your configs, and access complete documentation!

📋 View Development Roadmap & Tasks - 134 tasks across 10 categories, pick any to contribute!

🌐 Ecosystem

Skill Seekers is a multi-repo project. Here's where everything lives:

| Repository | Description | Links |
|---|---|---|
| Skill_Seekers | Core CLI & MCP server (this repo) | PyPI |
| skillseekersweb | Website & documentation | Live |
| skill-seekers-configs | Community config repository | |
| skill-seekers-action | GitHub Action for CI/CD | |
| skill-seekers-plugin | Claude Code plugin | |
| homebrew-skill-seekers | Homebrew tap for macOS | |

Want to contribute? The website and configs repos are great starting points for new contributors!

🧠 The Data Layer for AI Systems

Skill Seekers is the universal preprocessing layer that sits between raw documentation and every AI system that consumes it. Whether you are building Claude skills, a LangChain RAG pipeline, or a Cursor .cursorrules file, the data preparation is identical: you do it once and export to every target.

# One command → structured knowledge asset
skill-seekers create https://docs.react.dev/
# or: skill-seekers create facebook/react
# or: skill-seekers create ./my-project

# Export to any AI system
skill-seekers package output/react --target claude       # → Claude AI Skill (ZIP)
skill-seekers package output/react --target langchain    # → LangChain Documents
skill-seekers package output/react --target llama-index  # → LlamaIndex TextNodes
skill-seekers package output/react --target cursor       # → .cursorrules
skill-seekers package output/react --target ibm-bob      # → IBM Bob skill directory

What gets built

| Output | Target | What it powers |
|---|---|---|
| Claude Skill (ZIP + YAML) | --target claude | Claude Code, Claude API |
| Gemini Skill (tar.gz) | --target gemini | Google Gemini |
| OpenAI / Custom GPT (ZIP) | --target openai | GPT-4o, custom assistants |
| LangChain Documents | --target langchain | QA chains, agents, retrievers |
| LlamaIndex TextNodes | --target llama-index | Query engines, chat engines |
| Haystack Documents | --target haystack | Enterprise RAG pipelines |
| Pinecone-ready (Markdown) | --target markdown | Vector upsert |
| ChromaDB / FAISS / Qdrant | --target chroma/faiss/qdrant | Local vector DBs |
| IBM Bob Skill (directory) | --target ibm-bob | IBM Bob project/global skills |
| Cursor .cursorrules | --target markdown → copy SKILL.md | Cursor IDE .cursorrules |
| Windsurf / Cline / Continue | --target claude → copy | VS Code, IntelliJ, Vim |

Why it matters

  • 99% faster — Days of manual data prep → 15–45 minutes
  • 🎯 AI Skill quality — 500+ line SKILL.md files with examples, patterns, and guides
  • 📊 RAG-ready chunks — Smart chunking preserves code blocks and maintains context
  • 🎬 Videos — Extract code, transcripts, and structured knowledge from YouTube and local videos
  • 🔄 Multi-source — Combine 18 source types (docs, GitHub, PDFs, videos, notebooks, wikis, and more) into one knowledge asset
  • 🌐 One prep, every target — Export the same asset to 21 platforms without re-scraping
  • Battle-tested — 3,700+ tests, 24+ framework presets, production-ready
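The "Smart chunking" claim above can be illustrated with a minimal sketch. It assumes Markdown input and a hypothetical character budget (max_chars): text is split on blank-line paragraph boundaries, but never inside a fenced code block. This shows the general technique only; it is not Skill Seekers' actual chunker.

```python
FENCE = "`" * 3  # the Markdown code-fence marker, built here to keep the example readable


def chunk_markdown(text: str, max_chars: int = 1500) -> list[str]:
    """Split Markdown into chunks of up to ~max_chars without breaking fenced code blocks."""
    # Pass 1: group lines into blocks; a fenced code block always stays in one block.
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        current.append(line)
        # A blank line outside a fence closes the current paragraph block.
        if not in_fence and not line.strip():
            blocks.append("\n".join(current).strip())
            current = []
    if current:
        blocks.append("\n".join(current).strip())
    blocks = [b for b in blocks if b]

    # Pass 2: pack whole blocks into chunks. A single oversized block becomes
    # its own chunk instead of being cut in half.
    chunks: list[str] = []
    buf = ""
    for block in blocks:
        candidate = f"{buf}\n\n{block}" if buf else block
        if buf and len(candidate) > max_chars:
            chunks.append(buf)
            buf = block
        else:
            buf = candidate
    if buf:
        chunks.append(buf)
    return chunks
```

Keeping fences intact matters for retrieval: a chunk that holds only half of a code sample hands the model syntactically broken context.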

🚀 Quick Start (3 Commands)

# 1. Install
pip install skill-seekers

# 2. Create skill from any source
skill-seekers create https://docs.djangoproject.com/

# 3. Package for your AI platform
skill-seekers package output/django --target claude

That's it! You now have output/django-claude.zip ready to use.

# Use a different AI agent for enhancement (default: claude)
skill-seekers create https://docs.djangoproject.com/ --agent kimi
skill-seekers create https://docs.djangoproject.com/ --agent codex
skill-seekers create https://docs.djangoproject.com/ --agent-cmd "my-custom-agent run"

🛰️ AI-driven project scan (new)

Point scan at any project and an AI agent reads its manifests, README, Dockerfile/CI files, and a sample of source imports, then emits one config per detected framework plus a <project>-codebase.json for your own code. Each config pins the detected version, so re-running the scan reports version bumps:

skill-seekers scan ./my-react-app --out ./configs/scanned/
# → react.json, vite.json, tailwind.json, jest.json, my-react-app-codebase.json

# Then build any of them
skill-seekers create ./configs/scanned/react.json

If a detection has no existing preset, the AI generates a fresh config; on exit you can optionally publish it back to the community registry.
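Because scan relies on an AI agent, its detection is not a fixed rule set. For intuition only, here is a deterministic sketch of the detect-and-pin idea for a Node project: it reads package.json dependencies, maps known packages to preset names (the KNOWN_PRESETS table is hypothetical), pins the concrete version, and compares two runs to report bumps.

```python
import json
import re

# Hypothetical mapping from npm package names to preset config names.
KNOWN_PRESETS = {"react": "react", "vite": "vite", "tailwindcss": "tailwind", "jest": "jest"}


def detect_frameworks(package_json: str) -> dict[str, str]:
    """Return {preset_name: pinned_version} for recognised dependencies."""
    manifest = json.loads(package_json)
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    detected = {}
    for package, spec in deps.items():
        preset = KNOWN_PRESETS.get(package)
        if preset is None:
            continue
        # Drop range operators such as ^ or ~ so the concrete version is pinned.
        match = re.search(r"\d+(?:\.\d+)*", spec)
        detected[preset] = match.group(0) if match else spec
    return detected


def version_bumps(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Report presets whose pinned version changed between two scans."""
    return {name: (old[name], ver) for name, ver in new.items() if name in old and old[name] != ver}
```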

Other Sources (18 Supported)

# GitHub repository
skill-seekers create facebook/react

# Local project
skill-seekers create ./my-project

# PDF document
skill-seekers create manual.pdf

# Word document
skill-seekers create report.docx

# EPUB e-book
skill-seekers create book.epub

# Jupyter Notebook
skill-seekers create notebook.ipynb

# OpenAPI spec
skill-seekers create openapi.yaml

# PowerPoint presentation
skill-seekers create presentation.pptx

# AsciiDoc document
skill-seekers create guide.adoc

# Local HTML file (auto-detected by extension)
skill-seekers create page.html

# Whole directory of HTML files (auto-detected for HTML-dominant dirs)
skill-seekers create ./mirror_output/site/

# Force HTML mode on a mixed/code-heavy directory
skill-seekers create ./repo/ --html-path ./repo/docs/build/html/

# RSS/Atom feed
skill-seekers create feed.rss

# Man page
skill-seekers create curl.1

# Video (YouTube, Vimeo, or local file; requires skill-seekers[video])
skill-seekers create --video-url https://www.youtube.com/watch?v=... --name mytutorial
# First time? Auto-install GPU-aware visual deps:
skill-seekers create --setup

# Confluence wiki
skill-seekers create --space-key TEAM --name wiki

# Notion pages
skill-seekers create --database-id ... --name docs

# Slack/Discord chat export
skill-seekers create --chat-export-path ./slack-export --name team-chat

Export Everywhere

# Package for multiple platforms
for platform in claude gemini openai langchain; do
  skill-seekers package output/django --target "$platform"
done
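If you drive packaging from a Python build script instead of a shell loop, the same loop can be sketched with the standard library's subprocess module. It uses only the package subcommand and --target flag shown above; package_commands and package_all are hypothetical helper names, and dry_run (on by default here) only prints the commands.

```python
import shlex
import subprocess


def package_commands(skill_dir: str, targets: list[str]) -> list[list[str]]:
    """Build one `skill-seekers package` argv per target, mirroring the shell loop."""
    return [["skill-seekers", "package", skill_dir, "--target", target] for target in targets]


def package_all(skill_dir: str, targets: list[str], dry_run: bool = True) -> None:
    for argv in package_commands(skill_dir, targets):
        if dry_run:
            print(shlex.join(argv))  # show the exact command without running it
        else:
            subprocess.run(argv, check=True)  # check=True stops at the first failing target


if __name__ == "__main__":
    package_all("output/django", ["claude", "gemini", "openai", "langchain"])
```

Passing an argv list rather than a shell string avoids quoting problems with paths that contain spaces.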

What is Skill Seekers?

Skill Seekers is the data layer for AI systems. It transforms 18 source types—documentation websites, GitHub repositories, PDFs, videos, Jupyter Notebooks, Word/EPUB/AsciiDoc documents, OpenAPI specs, PowerPoint presentations, RSS feeds, man pages, Confluence wikis, Notion pages, Slack/Discord exports, and more—into structured knowledge assets for every AI target:

| Use Case | What you get | Examples |
|---|---|---|
| AI Skills | Comprehensive SKILL.md + references | Claude Code, Gemini, GPT |
| RAG Pipelines | Chunked documents with rich metadata | LangChain, LlamaIndex, Haystack |
| Vector Databases | Pre-formatted data ready for upsert | Pinecone, Chroma, Weaviate, FAISS |
| AI Coding Assistants | Context files your IDE AI reads automatically | Cursor, Windsurf, Cline, Continue.dev |

📚 Documentation

| I want to... | Read this |
|---|---|
| Get started quickly | Quick Start - 3 commands to first skill |
| Understand concepts | Core Concepts - How it works |
| Scrape sources | Scraping Guide - All source types |
| Enhance skills | Enhancement Guide - AI enhancement |
| Export skills | Packaging Guide - Platform export |
| Look up commands | CLI Reference - All 20 commands |
| Configure | Config Format - JSON specification |
| Fix issues | Troubleshooting - Common problems |

Complete documentation: docs/README.md

Instead of spending days on manual preprocessing, Skill Seekers:

  1. Ingests — docs, GitHub repos, local codebases, PDFs, videos, notebooks, wikis, and 10+ more source types
  2. Analyzes — deep AST parsing, pattern detection, API extraction
  3. Structures — categorized reference files with metadata
  4. Enhances — AI-powered SKILL.md generation (Claude, Gemini, or local)
  5. Exports — 16 platform-specific formats from one asset

Why Use This?

For AI Skill Builders (Claude, Gemini, OpenAI)

  • 🎯 Production-grade Skills — 500+ line SKILL.md files with code examples, patterns, and guides
  • 🔄 Enhancement Workflows — Apply security-focus, architecture-comprehensive, or custom YAML presets
  • 🎮 Any Domain — Game engines (Godot, Unity), frameworks (React, Django), internal tools
  • 🔧 Teams — Combine internal docs + code into a single source of truth
  • 📚 Quality — AI-enhanced with examples, quick reference, and navigation guidance

For RAG Builders & AI Engineers

  • 🤖 RAG-ready data — Pre-chunked LangChain Documents, LlamaIndex TextNodes, Haystack Documents
  • 🚀 99% faster — Days of preprocessing → 15–45 minutes
  • 📊 Smart metadata — Categories, sources, types → better retrieval accuracy
  • 🔄 Multi-source — Combine docs + GitHub + PDFs + videos in one pipeline
  • 🌐 Platform-agnostic — Export to any vector DB or framework without re-scraping

For AI Coding Assistant Users

  • 💻 Cursor / Windsurf / Cline — Generate .cursorrules / .windsurfrules / .clinerules automatically
  • 🎯 Persistent context — AI "knows" your frameworks without repeated prompting
  • 📚 Always current — Update context in minutes when docs change

Key Features

🌐 Documentation Scraping

  • Smart SPA Discovery - Three-layer discovery for JavaScript SPA sites (sitemap.xml → llms.txt → headless browser rendering)
  • llms.txt Support - Automatically detects and uses LLM-ready documentation files (10x faster)
  • Universal Scraper - Works with ANY documentation website
  • Smart Categorization - Automatically organizes content by topic
  • Code Language Detection - Recognizes Python, JavaScript, C++, GDScript, etc.
  • 24+ Ready-to-Use Presets - Godot, React, Vue, Django, FastAPI, and more

📄 PDF Support

  • Basic PDF Extraction - Extract text, code, and images from PDF files
  • OCR for Scanned PDFs - Extract text from scanned documents
  • Password-Protected PDFs - Handle encrypted PDFs
  • Table Extraction - Extract complex tables from PDFs
  • Parallel Processing - 3x faster for large PDFs
  • Intelligent Caching - 50% faster on re-runs
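The "Parallel Processing" item can be pictured as a worker pool over page indices. A minimal sketch, assuming a hypothetical extract_page(index) callable that wraps whatever PDF library you use; it is not Skill Seekers' implementation. Executor.map returns results in input order, so the output stays in page order even when pages finish out of order.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable


def extract_pages_parallel(
    page_count: int,
    extract_page: Callable[[int], str],
    workers: int = 8,
    executor_cls: type[Executor] = ThreadPoolExecutor,
) -> list[str]:
    """Run extract_page over every page index with a pool of workers, keeping page order."""
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in the order of range(page_count), not completion order.
        return list(pool.map(extract_page, range(page_count)))
```

For CPU-bound extraction, pass ProcessPoolExecutor as executor_cls instead; extract_page must then be a picklable, module-level function.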

🎬 Video Extraction

  • YouTube & Local Videos - Extract transcripts, on-screen code, and structured knowledge from videos
  • Visual Frame Analysis - OCR extraction from code editors, terminals, slides, and diagrams
  • GPU Auto-Detection - Automatically installs the correct PyTorch build (CUDA/ROCm/MPS/CPU)
  • AI Enhancement - Two-pass: clean OCR artifacts + generate polished SKILL.md
  • Time Clipping - Extract specific sections with --start-time and --end-time
  • Playlist Support - Batch process all videos in a YouTube playlist
  • Vision API Fallback - Use Claude Vision for low-confidence OCR frames

🐙 GitHub Repository Analysis

  • Deep Code Analysis - AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
  • API Extraction - Functions, classes, methods with parameters and types
  • Repository Metadata - README, file tree, language breakdown, stars/forks
  • GitHub Issues & PRs - Fetch open/closed issues with labels and milestones
  • CHANGELOG & Releases - Automatically extract version history
  • Conflict Detection - Compare documented APIs vs actual code implementation
  • MCP Integration - Natural language: "Scrape GitHub repo facebook/react"
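To make "AST parsing" and "API Extraction" concrete for one language, here is a Python-only sketch using the standard library's ast module. It lists top-level functions and class methods with their parameter annotations and return types. The output shape is an assumption for illustration, not Skill Seekers' format, and only positional-or-keyword parameters are collected.

```python
from __future__ import annotations

import ast


def extract_api(source: str) -> list[dict]:
    """List top-level functions and class methods with parameter and return annotations."""
    tree = ast.parse(source)

    def describe(fn: ast.FunctionDef | ast.AsyncFunctionDef, owner: str | None) -> dict:
        # Only positional-or-keyword parameters (fn.args.args) are collected here.
        params = [
            {"name": a.arg, "type": ast.unparse(a.annotation) if a.annotation else None}
            for a in fn.args.args
        ]
        return {
            "name": f"{owner}.{fn.name}" if owner else fn.name,
            "params": params,
            "returns": ast.unparse(fn.returns) if fn.returns else None,
        }

    api = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            api.append(describe(node, None))
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    api.append(describe(item, node.name))
    return api
```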

🔄 Unified Multi-Source Scraping

  • Combine Multiple Sources - Mix documentation + GitHub + PDF in one skill
  • Conflict Detection - Automatically finds discrepancies between docs and code
  • Intelligent Merging - Rule-based or AI-powered conflict resolution
  • Transparent Reporting - Side-by-side comparison with ⚠️ warnings
  • Documentation Gap Analysis - Identifies outdated docs and undocumented features
  • Single Source of Truth - One skill showing both intent (docs) and reality (code)
  • Backward Compatible - Legacy single-source configs still work

🤖 Multi-LLM Platform Support

  • 12 LLM Platforms - Claude AI, Google Gemini, OpenAI ChatGPT, MiniMax AI, Generic Markdown, OpenCode, Kimi (Moonshot AI), DeepSeek AI, Qwen (Alibaba), OpenRouter, Together AI, Fireworks AI
  • Universal Scraping - Same documentation works for all platforms
  • Platform-Specific Packaging - Optimized formats for each LLM
  • One-Command Export - --target flag selects platform
  • Optional Dependencies - Install only what you need
  • 100% Backward Compatible - Existing Claude workflows unchanged

| Platform | Format | Upload | Enhancement | API Key | Custom Endpoint |
|---|---|---|---|---|---|
| Claude AI | ZIP + YAML | ✅ Auto | ✅ Yes | ANTHROPIC_API_KEY | ANTHROPIC_BASE_URL |
| Google Gemini | tar.gz | ✅ Auto | ✅ Yes | GOOGLE_API_KEY | - |
| OpenAI ChatGPT | ZIP + Vector Store | ✅ Auto | ✅ Yes | OPENAI_API_KEY | - |
| MiniMax AI | ZIP + Knowledge Files | ✅ Auto | ✅ Yes | MINIMAX_API_KEY | - |
| Generic Markdown | ZIP | ❌ Manual | ❌ No | - | - |

# Claude (default - no changes needed!)
skill-seekers package output/react/
skill-seekers upload react.zip

# Google Gemini
pip install skill-seekers[gemini]
skill-seekers package output/react/ --target gemini
skill-seekers upload react-gemini.tar.gz --target gemini

# OpenAI ChatGPT
pip install skill-seekers[openai]
skill-seekers package output/react/ --target openai
skill-seekers upload react-openai.zip --target openai

# MiniMax AI
pip install skill-seekers[minimax]
skill-seekers package output/react/ --target minimax
skill-seekers upload react-minimax.zip --target minimax

# Generic Markdown (universal export)
skill-seekers package output/react/ --target markdown
# Use the markdown files directly in any LLM
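Before an upload step in a script, it can help to check that the right key is present. A minimal sketch of that pre-flight check, based only on the environment variables in the platform table above; credentials_for and UploadCredentials are hypothetical names, and the sketch does not call Skill Seekers itself.

```python
from __future__ import annotations

import os
from dataclasses import dataclass

# API key variable per target, taken from the platform table (None = manual upload).
API_KEY_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "minimax": "MINIMAX_API_KEY",
    "markdown": None,
}


@dataclass
class UploadCredentials:
    api_key: str | None
    base_url: str | None = None


def credentials_for(target: str, env: dict[str, str] | None = None) -> UploadCredentials:
    """Look up the key (and Claude's optional custom endpoint) for an upload target."""
    env = dict(os.environ) if env is None else env
    if target not in API_KEY_VARS:
        raise ValueError(f"unknown target: {target}")
    var = API_KEY_VARS[target]
    if var is None:
        return UploadCredentials(api_key=None)
    key = env.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; it is required to upload with --target {target}")
    # Only Claude lists a custom endpoint variable (ANTHROPIC_BASE_URL) in the table.
    base_url = env.get("ANTHROPIC_BASE_URL") if target == "claude" else None
    return UploadCredentials(api_key=key, base_url=base_url)
```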

Installation:

# Install with Gemini support
pip install skill-seekers[gemini]

# Install with OpenAI support
pip install skill-seekers[openai]

# Install with MiniMax support
pip install skill-seekers[minimax]

# Install with all LLM platforms
pip install skill-seekers[all-llms]

🔗 RAG Framework Integrations

Quick Export:

# LangChain Documents (JSON)
skill-seekers package output/django --target langchain
# → output/django-langchain.json

# LlamaIndex TextNodes (JSON)
skill-seekers package output/django --target llama-index
# → output/django-llama-index.json

# Markdown (Universal)
skill-seekers package output/django --target markdown
# → output/django-markdown/SKILL.md + references/

Complete RAG Pipeline Guide: RAG Pipelines Documentation


🧠 AI Coding Assistant Integrations

Transform any framework documentation into expert coding context for 4+ AI assistants:

  • Cursor IDE - Generate .cursorrules for AI-powered code suggestions

  • Windsurf - Customize Windsurf's AI assistant context with .windsurfrules

  • Cline (VS Code) - System prompts + MCP for VS Code agent

  • Continue.dev - Context servers for IDE-agnostic AI

Quick Export for AI Coding Tools:

# For any AI coding assistant (Cursor, Windsurf, Cline, Continue.dev)
skill-seekers create --config configs/django.json
skill-seekers package output/django --target claude  # or --target markdown

# Copy to your project (example for Cursor)
cp output/django-claude/SKILL.md my-project/.cursorrules

# Or for Windsurf (create the rules directory first if it does not exist)
mkdir -p my-project/.windsurf/rules
cp output/django-claude/SKILL.md my-project/.windsurf/rules/django.md

# Or for Cline
cp output/django-claude/SKILL.md my-project/.clinerules

# Or for Continue.dev (HTTP server)
python examples/continue-dev-universal/context_server.py
# Configure in ~/.continue/config.json

Integration Hub: All AI System Integrations


🌊 Three-Stream GitHub Architecture

  • Triple-Stream Analysis - Split GitHub repos into Code, Docs, and Insights streams
  • Unified Codebase Analyzer - Works with GitHub URLs AND local paths
  • C3.x as Analysis Depth - Choose 'basic' (1-2 min) or 'c3x' (20-60 min) analysis
  • Enhanced Router Generation - GitHub metadata, README quick start, common issues
  • Issue Integration - Top problems and solutions from GitHub issues
  • Smart Routing Keywords - GitHub labels weighted 2x for better topic detection

Three Streams Explained:

  • Stream 1: Code - Deep C3.x analysis (patterns, examples, guides, configs, architecture)
  • Stream 2: Docs - Repository documentation (README, CONTRIBUTING, docs/*.md)
  • Stream 3: Insights - Community knowledge (issues, labels, stars, forks)

from skill_seekers.cli.unified_codebase_analyzer import UnifiedCodebaseAnalyzer

# Analyze GitHub repo with all three streams
analyzer = UnifiedCodebaseAnalyzer()
result = analyzer.analyze(
    source="https://github.com/facebook/react",
    depth="c3x",  # or "basic" for fast analysis
    fetch_github_metadata=True,
)

# Access code stream (C3.x analysis)
print(f"Design patterns: {len(result.code_analysis['c3_1_patterns'])}")
print(f"Test examples: {result.code_analysis['c3_2_examples_count']}")

# Access docs stream (repository docs)
print(f"README: {result.github_docs['readme'][:100]}")

# Access insights stream (GitHub metadata)
print(f"Stars: {result.github_insights['metadata']['stars']}")
print(f"Common issues: {len(result.github_insights['common_problems'])}")

See complete documentation: Three-Stream Implementation Summary

🔐 Smart Rate Limit Management & Configuration

  • Multi-Token Configuration System - Manage multiple GitHub accounts (personal, work, OSS)
    • Secure config storage at ~/.config/skill-seekers/config.json (file mode 600: readable and writable only by the owner)
    • Per-profile rate limit strategies: prompt, wait, switch, fail
    • Configurable timeout per profile (default: 30 min, prevents indefinite waits)
    • Smart fallback chain: CLI arg → Env var → Config file → Prompt
    • API key management for Claude, Gemini, OpenAI
  • Interactive Configuration Wizard - Beautiful terminal UI for easy setup
    • Browser integration for token creation (auto-opens GitHub, etc.)
    • Token validation and connection testing
    • Visual status display with color coding
  • Intelligent Rate Limit Handler - No more indefinite waits!
    • Upfront warning about rate limits (60/hour vs 5000/hour)
    • Real-time detection from GitHub API responses
    • Live countdown timers with progress
    • Automatic profile switching when rate limited
    • Four strategies: prompt (ask), wait (countdown), switch (try another), fail (abort)
  • Resume Capability - Continue interrupted jobs
    • Auto-save progress at configurable intervals (default: 60 sec)
    • List all resumable jobs with progress details
    • Auto-cleanup of old jobs (default: 7 days)
  • CI/CD Support - Non-interactive mode for automation
    • --non-interactive flag fails fast without prompts
    • --profile flag to select specific GitHub account
    • Clear error messages for pipeline logs
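The "CLI arg → Env var → Config file → Prompt" fallback chain can be sketched as an ordered lookup. The config schema used here (a github section with default_profile and per-profile tokens) is a hypothetical stand-in, not the real layout of ~/.config/skill-seekers/config.json, and the function names are illustrative.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable

CONFIG_PATH = Path.home() / ".config" / "skill-seekers" / "config.json"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read the config file if it exists; otherwise return an empty dict."""
    return json.loads(path.read_text()) if path.exists() else {}


def resolve_github_token(
    cli_token: str | None,
    env: dict[str, str],
    config: dict,
    profile: str | None = None,
    prompt: Callable[[], str | None] | None = None,
) -> str | None:
    """Return the first token found: CLI argument, GITHUB_TOKEN, config profile, then prompt."""
    if cli_token:
        return cli_token
    if env.get("GITHUB_TOKEN"):
        return env["GITHUB_TOKEN"]
    # Hypothetical schema: {"github": {"default_profile": ..., "profiles": {name: {"token": ...}}}}
    github = config.get("github", {})
    name = profile or github.get("default_profile")
    token = github.get("profiles", {}).get(name, {}).get("token")
    if token:
        return token
    # Non-interactive callers pass no prompt, get None back, and can fail fast.
    return prompt() if prompt else None
```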

Quick Setup:

# One-time configuration (5 minutes)
skill-seekers config --github

# Use specific profile for private repos
skill-seekers create mycompany/private-repo --profile work

# CI/CD mode (fail fast, no prompts)
skill-seekers create owner/repo --non-interactive

# Resume interrupted job
skill-seekers resume --list
skill-seekers resume github_react_20260117_143022

Rate Limit Strategies Explained:

  • prompt (default) - Ask what to do when rate limited (wait, switch, setup token, cancel)
  • wait - Automatically wait with countdown timer (respects timeout)
  • switch - Automatically try next available profile (for multi-account setups)
  • fail - Fail immediately with clear error (perfect for CI/CD)
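The four strategies can be modelled as a small decision function driven by GitHub's rate-limit response headers (X-RateLimit-Remaining, and X-RateLimit-Reset as a Unix timestamp). A minimal sketch; the returned action strings and the "abort when no spare profile is left" fallback for switch are assumptions, not Skill Seekers' behaviour. The 1800-second default mirrors the 30-minute profile timeout mentioned above.

```python
from __future__ import annotations

import time
from enum import Enum


class Strategy(Enum):
    PROMPT = "prompt"
    WAIT = "wait"
    SWITCH = "switch"
    FAIL = "fail"


def seconds_until_reset(headers: dict[str, str], now: float | None = None) -> int:
    """Seconds until X-RateLimit-Reset (a Unix timestamp), never negative."""
    now = time.time() if now is None else now
    return max(0, int(headers.get("X-RateLimit-Reset", now)) - int(now))


def decide(
    strategy: Strategy,
    headers: dict[str, str],
    spare_profiles: list[str],
    timeout_s: int = 1800,
    now: float | None = None,
) -> str:
    """Choose an action once GitHub reports that no requests remain."""
    # Plain dicts are case-sensitive; real HTTP header mappings usually are not.
    if headers.get("X-RateLimit-Remaining") != "0":
        return "continue"
    if strategy is Strategy.FAIL:
        return "abort"
    if strategy is Strategy.SWITCH:
        return f"switch:{spare_profiles[0]}" if spare_profiles else "abort"
    if strategy is Strategy.WAIT:
        wait_s = seconds_until_reset(headers, now)
        # Respect the timeout instead of waiting indefinitely.
        return f"wait:{wait_s}" if wait_s <= timeout_s else "abort"
    return "ask-user"  # Strategy.PROMPT
```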

🎯 Bootstrap Skill - Self-Hosting

Generate skill-seekers as a skill to use within your AI agent (Claude Code, Kimi, Codex, etc.):

# Generate the skill
./scripts/bootstrap_skill.sh

# Install to Claude Code
cp-routput/skill-seekers~/.claude/skills/

What you get:

  • Complete skill documentation - All CLI commands and usage patterns
  • CLI command reference - Every tool and its options documented
  • Quick start examples - Common workflows and best practices
  • Auto-generated API docs - Code analysis, patterns, and examples

🔐 Private Config Repositories

  • Git-Based Config Sources - Fetch configs from private/team git repositories
  • Multi-Source Management - Register unlimited GitHub, GitLab, Bitbucket repos
  • Team Collaboration - Share custom configs across 3-5 person teams
  • Enterprise Support - Scale to 500+ developers with priority-based resolution
  • Secure Authentication - Environment variable tokens (GITHUB_TOKEN, GITLAB_TOKEN)
  • Intelligent Caching - Clone once, pull updates automatically
  • Offline Mode - Work with cached configs when offline
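Priority-based resolution can be sketched as: walk the registered sources in priority order and take the first one that has the requested config. In this sketch each source's cached clone is stood in for by an in-memory dict, and "lower number wins" is an assumed convention; ConfigSource and resolve_config are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConfigSource:
    name: str
    priority: int  # assumption: lower number is checked first
    configs: dict[str, dict] = field(default_factory=dict)  # stands in for a cached git clone


def resolve_config(config_name: str, sources: list[ConfigSource]) -> tuple[str, dict]:
    """Return (source_name, config) from the highest-priority source that has the config."""
    for source in sorted(sources, key=lambda s: s.priority):
        if config_name in source.configs:
            return source.name, source.configs[config_name]
    raise LookupError(f"config {config_name!r} not found in any registered source")
```

Under this convention a team repository registered with a lower priority number shadows a same-named config from a broader source.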

🤖 Codebase Analysis (C3.x)

C3.4: Configuration Pattern Extraction with AI Enhancement

  • 9 Config Formats - JSON, YAML, TOML, ENV, INI, Python, JavaScript, Dockerfile, Docker Compose
  • 7 Pattern Types - Database, API, logging, cache, email, auth, server configurations
  • AI Enhancement - Optional dual-mode AI analysis (API + LOCAL)
    • Explains what each config does
    • Suggests best practices and improvements
    • Security analysis - Finds hardcoded secrets, exposed credentials
  • Auto-Documentation - Generates JSON + Markdown documentation of all configs
  • MCP Integration - extract_config_patterns tool with enhancement support
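A rough idea of what "finds hardcoded secrets" involves: scan each config line against secret-shaped patterns. The rules below are illustrative assumptions (the AWS access key ID prefix, the classic GitHub personal token prefix, and a generic key = value check that skips $VAR and ${VAR} references); real scanners use much larger rule sets, and this is not Skill Seekers' detector.

```python
import re

# Illustrative rules only; production secret scanners ship far more patterns.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # key = value where the value is a literal of 8+ chars (not a $VAR or ${VAR} reference)
    "generic_secret": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\s*[:=]\s*['\"]?[^\s'\"#$]{8,}"
    ),
}


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```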

C3.3: AI-Enhanced How-To Guides

  • Comprehensive AI Enhancement - Transforms basic guides into professional tutorials
  • 5 Automatic Improvements - Step descriptions, troubleshooting, prerequisites, next steps, use cases
  • Dual-Mode Support - API mode (Claude API) or LOCAL mode (Claude Code CLI)
  • No API Costs with LOCAL Mode - FREE enhancement using your Claude Code Max plan
  • Quality Transformation - 75-line templates → 500+ line comprehensive guides

Usage:

# Quick analysis (1-2 min, basic features only)
skill-seekers scan tests/ --quick

# Comprehensive analysis with AI (20-60 min, all features)
skill-seekers scan tests/ --comprehensive

# With AI enhancement
skill-seekers scan tests/ --enhance

Full Documentation: docs/features/HOW_TO_GUIDES.md

🔄 Enhancement Workflow Presets

Reusable YAML-defined enhancement pipelines that control how AI transforms your raw documentation into a polished skill.

  • 5 Bundled Presets: default, minimal, security-focus, architecture-comprehensive, api-documentation
  • User-Defined Presets — add custom workflows to ~/.config/skill-seekers/workflows/
  • Multiple Workflows — chain two or more workflows in one command
  • Fully Managed CLI — list, inspect, copy, add, remove, and validate workflows

# Apply a single workflow
skill-seekers create ./my-project --enhance-workflow security-focus

# Chain multiple workflows (applied in order)
skill-seekers create ./my-project \
  --enhance-workflow security-focus \
  --enhance-workflow minimal

# Manage presets
skill-seekers workflows list                      # List all (bundled + user)
skill-seekers workflows show security-focus       # Print YAML content
skill-seekers workflows copy security-focus       # Copy to user dir for editing
skill-seekers workflows add ./my-workflow.yaml    # Install a custom preset
skill-seekers workflows remove my-workflow        # Remove a user preset
skill-seekers workflows validate security-focus   # Validate preset structure

# Copy multiple at once
skill-seekers workflows copy security-focus minimal api-documentation

# Add multiple files at once
skill-seekers workflows add ./wf-a.yaml ./wf-b.yaml

# Remove multiple at once
skill-seekers workflows remove my-wf-a my-wf-b

YAML preset format:

name: security-focus
description: "Security-focused review: vulnerabilities, auth, data handling"
version: "1.0"
stages:
  - name: vulnerabilities
    type: custom
    prompt: "Review for OWASP top 10 and common security vulnerabilities..."
  - name: auth-review
    type: custom
    prompt: "Examine authentication and authorisation patterns..."
    uses_history: true
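Once a preset like this is loaded (for example with PyYAML's yaml.safe_load), a structural check is straightforward. A minimal sketch of the kind of checks a validator might run; the required keys are inferred from the example above, and this is not the logic behind skill-seekers workflows validate.

```python
from __future__ import annotations

REQUIRED_STAGE_KEYS = {"name", "type", "prompt"}


def validate_workflow(preset: dict) -> list[str]:
    """Return a list of problems in a parsed workflow preset; an empty list means it looks valid."""
    errors = []
    if "name" not in preset:
        errors.append("missing top-level key: name")
    stages = preset.get("stages")
    if not isinstance(stages, list) or not stages:
        errors.append("stages must be a non-empty list")
        return errors
    seen_names = set()
    for i, stage in enumerate(stages):
        if not isinstance(stage, dict):
            errors.append(f"stage {i}: expected a mapping")
            continue
        missing = REQUIRED_STAGE_KEYS - set(stage)
        if missing:
            errors.append(f"stage {i}: missing {sorted(missing)}")
        name = stage.get("name")
        # Stage names should be unique so later stages can refer back to them.
        if name in seen_names:
            errors.append(f"stage {i}: duplicate name {name!r}")
        seen_names.add(name)
    return errors
```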

⚡ Performance & Scale

  • Async Mode - 2-3x faster scraping with async/await (use --async flag)
  • Large Documentation Support - Handle 10K-40K+ page docs with intelligent splitting
  • Router/Hub Skills - Intelligent routing to specialized sub-skills
  • Parallel Scraping - Process multiple skills simultaneously
  • Checkpoint/Resume - Never lose progress on long scrapes
  • Caching System - Scrape once, rebuild instantly
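The --async and --workers flags suggest the familiar asyncio pattern of bounded concurrency. A minimal sketch, assuming a hypothetical fetch coroutine (in practice something like an aiohttp or httpx async client call); an asyncio.Semaphore caps how many requests are in flight at once. This shows the pattern, not Skill Seekers' scraper.

```python
import asyncio
from typing import Awaitable, Callable


async def scrape_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    workers: int = 8,
) -> dict[str, str]:
    """Fetch every URL concurrently with at most `workers` requests in flight."""
    semaphore = asyncio.Semaphore(workers)

    async def bounded(url: str) -> tuple[str, str]:
        async with semaphore:
            return url, await fetch(url)

    # gather() preserves input order; dict() maps each URL to its page body.
    return dict(await asyncio.gather(*(bounded(u) for u in urls)))
```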

🤖 Agent-Agnostic Skill Generation

  • Multi-Agent Support - Generate skills for Claude, Kimi, Codex, Copilot, OpenCode, or any custom agent via --agent flag
  • Custom Agent Commands - Use --agent-cmd to specify a custom agent CLI command for enhancement
  • Universal Flags - --agent and --agent-cmd available on all commands (create, scrape, github, pdf, etc.)

📦 Marketplace Pipeline

  • Publish to Marketplace - Publish skills to Claude Code plugin marketplace repos
  • End-to-End Pipeline - From documentation source to published marketplace entry

✅ Quality Assurance

  • Fully Tested - 3,700+ tests with comprehensive coverage

📦 Installation

# Basic install (documentation scraping, GitHub analysis, PDF, packaging)
pip install skill-seekers

# With all LLM platform support
pip install skill-seekers[all-llms]

# With MCP server
pip install skill-seekers[mcp]

# Everything
pip install skill-seekers[all]

Need help choosing? Run the setup wizard:

skill-seekers-setup

Installation Options

| Install | Features |
|---|---|
| pip install skill-seekers | Scraping, GitHub analysis, PDF, all platforms |
| pip install skill-seekers[gemini] | + Google Gemini support |
| pip install skill-seekers[openai] | + OpenAI ChatGPT support |
| pip install skill-seekers[all-llms] | + All LLM platforms |
| pip install skill-seekers[mcp] | + MCP server for Claude Code, Cursor, etc. |
| pip install skill-seekers[video] | + YouTube/Vimeo transcript & metadata extraction |
| pip install skill-seekers[video-full] | + Whisper transcription & visual frame extraction |
| pip install skill-seekers[jupyter] | + Jupyter Notebook support |
| pip install skill-seekers[pptx] | + PowerPoint support |
| pip install skill-seekers[confluence] | + Confluence wiki support |
| pip install skill-seekers[notion] | + Notion pages support |
| pip install skill-seekers[rss] | + RSS/Atom feed support |
| pip install skill-seekers[chat] | + Slack/Discord chat export support |
| pip install skill-seekers[asciidoc] | + AsciiDoc document support |
| pip install skill-seekers[all] | Everything enabled |

Video visual deps (GPU-aware): After installing skill-seekers[video-full], run skill-seekers create --setup to auto-detect your GPU and install the correct PyTorch variant + easyocr. This is the recommended way to install visual extraction dependencies.


🚀 One-Command Install Workflow

The fastest way to go from config to uploaded skill - complete automation:

# Install React skill from official configs (auto-uploads to Claude)
skill-seekers install --config react

# Install from local config file
skill-seekers install --config configs/custom.json

# Install without uploading (package only)
skill-seekers install --config django --no-upload

# Preview workflow without executing
skill-seekers install --config react --dry-run

Time: 20-45 minutes total | Quality: Production-ready (9/10) | Cost: Free

Phases executed:

📥 PHASE 1: Fetch Config (if config name provided)
📖 PHASE 2: Scrape Documentation
✨ PHASE 3: AI Enhancement (MANDATORY - no skip option)
📦 PHASE 4: Package Skill
☁️ PHASE 5: Upload to Claude (optional, requires API key)

Requirements:

  • ANTHROPIC_API_KEY environment variable (for auto-upload)
  • Claude Code Max plan (for local AI enhancement), or use --agent to select a different AI agent

📊 Feature Matrix

Skill Seekers supports 12 LLM platforms, 8 RAG/vector targets, 18 source types, and full feature parity across all targets.

Platforms: Claude AI, Google Gemini, OpenAI ChatGPT, MiniMax AI, Generic Markdown, OpenCode, Kimi (Moonshot AI), DeepSeek AI, Qwen (Alibaba), OpenRouter, Together AI, Fireworks AI

Source Types: Documentation websites, GitHub repos, PDFs, Word (.docx), EPUB, Video, Local codebases, Jupyter Notebooks, Local HTML, OpenAPI/Swagger, AsciiDoc, PowerPoint (.pptx), RSS/Atom feeds, Man pages, Confluence wikis, Notion pages, Slack/Discord chat exports

See Complete Feature Matrix for detailed platform and feature support.

Quick Platform Comparison

| Feature | Claude | Gemini | OpenAI | MiniMax | Markdown |
|---|---|---|---|---|---|
| Format | ZIP + YAML | tar.gz | ZIP + Vector | ZIP + Knowledge | ZIP |
| Upload | ✅ API | ✅ API | ✅ API | ✅ API | ❌ Manual |
| Enhancement | ✅ Sonnet 4 | ✅ 2.0 Flash | ✅ GPT-4o | ✅ M3 | ❌ None |

Usage Examples

Documentation Scraping

# Scrape documentation website
skill-seekers create --config configs/react.json

# Quick scrape without config
skill-seekers create https://react.dev --name react

# With async mode (3x faster)
skill-seekers create --config configs/godot.json --async --workers 8

# Use a specific AI agent for enhancement
skill-seekers create --config configs/react.json --agent kimi

PDF Extraction

# Basic PDF extraction
skill-seekers create --pdf docs/manual.pdf --name myskill

# Advanced features:
#   --extract-tables  extract tables
#   --parallel        fast parallel processing
#   --workers 8       use 8 CPU cores
skill-seekers create --pdf docs/manual.pdf --name myskill \
  --extract-tables \
  --parallel \
  --workers 8

# Scanned PDFs (requires: pip install pytesseract Pillow, plus the Tesseract OCR engine)
skill-seekers create --pdf docs/scanned.pdf --name myskill --ocr

Video Extraction

# Install video support
pip install skill-seekers[video]       # Transcripts + metadata
pip install skill-seekers[video-full]  # + Whisper + visual frame extraction

# Auto-detect GPU and install visual deps (PyTorch + easyocr)
skill-seekers create --setup

# Extract from YouTube video
skill-seekers create --video-url https://www.youtube.com/watch?v=dQw4w9WgXcQ --name mytutorial

# Extract from a YouTube playlist
skill-seekers create --video-playlist https://www.youtube.com/playlist?list=... --name myplaylist

# Extract from a local video file
skill-seekers create --video-file recording.mp4 --name myrecording

# Extract with visual frame analysis (requires video-full deps)
skill-seekers create --video-url https://www.youtube.com/watch?v=... --name mytutorial --visual

# With AI enhancement (cleans OCR + generates polished SKILL.md)
skill-seekers create --video-url https://www.youtube.com/watch?v=... --visual --enhance-level 2

# Clip a specific section of a video (supports seconds, MM:SS, HH:MM:SS)
skill-seekers create --video-url https://www.youtube.com/watch?v=... --start-time 1:30 --end-time 5:00

# Use Vision API for low-confidence OCR frames (requires ANTHROPIC_API_KEY)
skill-seekers create --video-url https://www.youtube.com/watch?v=... --visual --vision-ocr

# Re-build skill from previously extracted data (skip download)
skill-seekers create --from-json output/mytutorial/video_data/extracted_data.json --name mytutorial

Full guide: See docs/VIDEO_GUIDE.md for complete CLI reference, visual pipeline details, AI enhancement options, and troubleshooting.
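The clip flags accept seconds, MM:SS, or HH:MM:SS. A minimal sketch of how such values can be normalized to seconds and checked against the video length; parse_timestamp and clip_window are hypothetical helpers, and the validation rules are assumptions rather than the tool's documented behaviour.

```python
from __future__ import annotations


def parse_timestamp(value: str) -> float:
    """Convert '90', '1:30', or '1:01:30' into seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported time format: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each field is worth 60x the next one to its right (hours -> minutes -> seconds).
        seconds = seconds * 60 + float(part)
    return seconds


def clip_window(start: str | None, end: str | None, duration: float) -> tuple[float, float]:
    """Resolve optional start/end strings into a validated (start, end) pair in seconds."""
    lo = parse_timestamp(start) if start else 0.0
    hi = parse_timestamp(end) if end else duration
    if not 0 <= lo < hi <= duration:
        raise ValueError(f"invalid clip {lo}-{hi} for a {duration}s video")
    return lo, hi
```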

GitHub Repository Analysis

# Basic repository scraping
skill-seekers create facebook/react

# With authentication (higher rate limits)
export GITHUB_TOKEN=ghp_your_token_here
skill-seekers create facebook/react

# Customize what to include:
#   --include-issues     extract GitHub Issues
#   --max-issues 100     limit issue count
#   --include-changelog  extract CHANGELOG.md
skill-seekers create django/django \
  --include-issues \
  --max-issues 100 \
  --include-changelog

Unified Multi-Source Scraping

Combine documentation + GitHub + PDF into one unified skill with conflict detection:

# Use existing unified configs
skill-seekers create --config configs/react_unified.json
skill-seekers create --config configs/django_unified.json

# Or create unified config
cat > configs/myframework_unified.json << 'EOF'
{
  "name": "myframework",
  "merge_mode": "rule-based",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.myframework.com/",
      "max_pages": 200
    },
    {
      "type": "github",
      "repo": "owner/myframework",
      "code_analysis_depth": "surface"
    }
  ]
}
EOF

skill-seekers create --config configs/myframework_unified.json

Conflict Detection automatically finds:

  • 🔴 Missing in code (high): Documented but not implemented
  • 🟡 Missing in docs (medium): Implemented but not documented
  • ⚠️ Signature mismatch: Different parameters/types
  • ℹ️ Description mismatch: Different explanations
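Conceptually, conflict detection compares the API surface described in the docs with the one found in code. The following minimal Python sketch shows that comparison. It assumes each side has already been reduced to a mapping from function name to a signature string and a one-line description. The data shapes and the `detect_conflicts_sketch` name are illustrative assumptions, not the tool's actual implementation. The exact-string description check is deliberately naive; a real detector would need fuzzier matching.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ApiEntry:
    signature: str    # e.g. "render(element, container)"
    description: str  # one-line summary


@dataclass(frozen=True)
class Conflict:
    kind: str
    name: str
    severity: Optional[str] = None  # only the two "missing" kinds have a stated severity


def detect_conflicts_sketch(docs: Dict[str, ApiEntry], code: Dict[str, ApiEntry]) -> List[Conflict]:
    conflicts = []
    for name in sorted(docs.keys() - code.keys()):
        conflicts.append(Conflict("missing_in_code", name, "high"))    # documented, not implemented
    for name in sorted(code.keys() - docs.keys()):
        conflicts.append(Conflict("missing_in_docs", name, "medium"))  # implemented, not documented
    for name in sorted(docs.keys() & code.keys()):
        if docs[name].signature != code[name].signature:
            conflicts.append(Conflict("signature_mismatch", name))
        if docs[name].description.strip().lower() != code[name].description.strip().lower():
            conflicts.append(Conflict("description_mismatch", name))
    return conflicts
```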

Full Guide: See docs/features/UNIFIED_SCRAPING.md for complete documentation.

Private Config Repositories

Share custom configs across teams using private git repositories:

# Option 1: Using MCP tools (recommended)
# Register your team's private repo
add_config_source(
    name="team",
    git_url="https://github.com/mycompany/skill-configs.git",
    token_env="GITHUB_TOKEN"
)

# Fetch config from team repo
fetch_config(source="team", config_name="internal-api")

Supported Platforms:

  • GitHub (GITHUB_TOKEN), GitLab (GITLAB_TOKEN), Gitea (GITEA_TOKEN), Bitbucket (BITBUCKET_TOKEN)

Full Guide: See docs/reference/GIT_CONFIG_SOURCES.md for complete documentation.

How It Works

graph LR
 A[Documentation Website] --> B[Skill Seekers]
 B --> C[Scraper]
 B --> D[AI Enhancement]
 B --> E[Packager]
 C --> F[Organized References]
 D --> F
 F --> E
 E --> G[AI Skill .zip]
 G --> H[Upload to AI Platform]

  1. Detect llms.txt: Checks for llms-full.txt, llms.txt, and llms-small.txt first (part of Smart SPA Discovery)
  2. Scrape: Extracts all pages from documentation
  3. Categorize: Organizes content into topics (API, guides, tutorials, etc.)
  4. Enhance: AI analyzes docs and creates comprehensive SKILL.md with examples (supports multiple agents via --agent)
  5. Package: Bundles everything into a platform-ready .zip file
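Step 1 checks for llms.txt variants in a fixed order before falling back to crawling. The minimal sketch below illustrates that preference order under two assumptions: the files are looked up relative to the base URL, and the caller supplies an `exists(url)` callable. `find_llms_txt` is a hypothetical name, and the real discovery logic may probe other locations as well.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# Preference order from the step list: the fullest variant first.
LLMS_TXT_CANDIDATES = ("llms-full.txt", "llms.txt", "llms-small.txt")


def find_llms_txt(base_url: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first llms*.txt URL that exists under base_url, or None to fall back to crawling."""
    if not base_url.endswith("/"):
        base_url += "/"  # make urljoin treat the last path segment as a directory
    for name in LLMS_TXT_CANDIDATES:
        url = urljoin(base_url, name)
        if exists(url):
            return url
    return None
```

In practice, `exists` could wrap `urllib.request.urlopen(urllib.request.Request(url, method="HEAD"))` and treat an HTTP error as "missing". Injecting the check this way also keeps the ordering logic testable offline.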

Architecture

The system is organized into 8 core modules and 5 utility modules (~200 classes total):

👁 Package Overview

| Module | Purpose | Key Classes |
|---|---|---|
| CLICore | Git-style command dispatcher | CLIDispatcher, SourceDetector, CreateCommand |
| Scrapers | 18 source-type extractors | DocToSkillConverter, DocumentSkillBuilder (shared build layer), UnifiedScraper |
| Adaptors | 20+ output platform formats | SkillAdaptor (ABC), ClaudeAdaptor, LangChainAdaptor |
| Analysis | C3.x codebase analysis pipeline | UnifiedCodebaseAnalyzer, PatternRecognizer, 10 GoF detectors |
| Enhancement | AI-powered skill improvement via AgentClient | AgentClient, AIEnhancer, UnifiedEnhancer, WorkflowEngine |
| Packaging | Package, upload, install skills | PackageSkill, InstallAgent |
| MCP | FastMCP server (40 tools) | SkillSeekerMCPServer, 10 tool modules |
| Sync | Doc change detection | ChangeDetector, SyncMonitor, Notifier |

Utility modules: Parsers (28 CLI parsers), Storage (S3/GCS/Azure), Embedding (multi-provider vectors), Benchmark (performance), Utilities (16 shared helpers).

Full UML diagrams: docs/UML_ARCHITECTURE.md | StarUML project: docs/UML/skill_seekers.mdj | HTML API reference: docs/UML/html/
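The Adaptors row follows a familiar pattern: an abstract base class with one subclass per output platform. The minimal sketch below illustrates that pattern. The class names (`AdaptorSketch`, `MarkdownBundleAdaptor`, `DocumentListAdaptor`) and the simplified output formats are hypothetical stand-ins for the real `SkillAdaptor`, `ClaudeAdaptor`, and `LangChainAdaptor` classes, whose interfaces may differ.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SkillDoc:
    """Hypothetical in-memory skill: a name, the SKILL.md body, and reference files."""
    name: str
    skill_md: str
    references: Dict[str, str] = field(default_factory=dict)


class AdaptorSketch(ABC):
    """Hypothetical base class: each subclass renders a skill for one target platform."""

    target = ""

    @abstractmethod
    def render(self, skill: SkillDoc) -> Dict[str, str]:
        """Return a mapping of output file path -> file contents."""


class MarkdownBundleAdaptor(AdaptorSketch):
    # Keeps SKILL.md plus a references/ folder of Markdown files.
    target = "claude"

    def render(self, skill):
        files = {"SKILL.md": skill.skill_md}
        for filename, body in skill.references.items():
            files[f"references/{filename}"] = body
        return files


class DocumentListAdaptor(AdaptorSketch):
    # Flattens everything into one JSON list of {"page_content", "metadata"} records,
    # loosely modeled on LangChain's Document fields.
    target = "langchain"

    def render(self, skill):
        records = [{"page_content": skill.skill_md, "metadata": {"source": "SKILL.md"}}]
        for filename, body in skill.references.items():
            records.append({"page_content": body, "metadata": {"source": f"references/{filename}"}})
        return {f"{skill.name}_documents.json": json.dumps(records, indent=2)}


# Registry keyed by target name, so callers select a platform by string.
ADAPTORS = {cls.target: cls for cls in (MarkdownBundleAdaptor, DocumentListAdaptor)}


def export_skill(skill: SkillDoc, target: str) -> Dict[str, str]:
    return ADAPTORS[target]().render(skill)
```

Adding a platform in this design means adding one subclass and one registry entry. The scraping and enhancement stages never need to know about output formats.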

📋 Prerequisites

Before you start, make sure you have:

  1. Python 3.10 or higher - Download | Check: python3 --version
  2. Git - Download | Check: git --version
  3. 15-30 minutes for first-time setup

First-time user? Start here: Bulletproof Quick Start Guide 🎯


📤 Uploading Skills to Claude

Once your skill is packaged, you need to upload it to Claude:

Option 1: Automatic Upload (API-based)

# Set your API key (one-time)
export ANTHROPIC_API_KEY=sk-ant-...

# Package and upload automatically
skill-seekers package output/react/ --upload

# OR upload existing .zip
skill-seekers upload output/react.zip

Option 2: Manual Upload (No API Key)

# Package skill
skill-seekers package output/react/
# → Creates output/react.zip

# Then manually upload:
# - Go to https://claude.ai/skills
# - Click "Upload Skill"
# - Select output/react.zip
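`skill-seekers package` builds this archive for you. Purely to illustrate what the bundling step involves, here is a minimal sketch using Python's standard `zipfile` module. It assumes the archive simply mirrors the skill directory with SKILL.md at its root, which may not match the exact layout the real packager produces. `zip_skill` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def zip_skill(skill_dir: Path, zip_path: Path) -> Path:
    """Bundle every file under skill_dir into zip_path (hypothetical helper)."""
    skill_dir = Path(skill_dir).resolve()
    zip_path = Path(zip_path).resolve()
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")
    # Collect files before opening the archive so the zip never includes itself.
    files = sorted(p for p in skill_dir.rglob("*") if p.is_file() and p != zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Archive names are relative to the skill folder, so SKILL.md sits at the root.
            # Empty folders such as scripts/ and assets/ are not stored; only files are added.
            zf.write(path, arcname=path.relative_to(skill_dir).as_posix())
    return zip_path
```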

Option 3: MCP (Claude Code)

In Claude Code, just ask:
"Package and upload the React skill"

🤖 Installing to AI Agents

Skill Seekers can automatically install skills to 19 AI coding agents.

# Install to specific agent
skill-seekers install-agent output/react/ --agent cursor

# Install to IBM Bob (project-local .bob/skills/)
skill-seekers install-agent output/react/ --agent bob

# Install to all agents at once
skill-seekers install-agent output/react/ --agent all

# Preview without installing
skill-seekers install-agent output/react/ --agent cursor --dry-run

Supported Agents

| Agent | Path | Type |
|---|---|---|
| Claude Code | ~/.claude/skills/ | Global |
| Cursor | .cursor/skills/ | Project |
| VS Code / Copilot | .github/skills/ | Project |
| Amp | ~/.amp/skills/ | Global |
| Goose | ~/.config/goose/skills/ | Global |
| OpenCode | ~/.opencode/skills/ | Global |
| Windsurf | ~/.windsurf/skills/ | Global |
| Roo Code | .roo/skills/ | Project |
| Cline | .cline/skills/ | Project |
| Aider | ~/.aider/skills/ | Global |
| Bolt | .bolt/skills/ | Project |
| Kilo Code | .kilo/skills/ | Project |
| Continue | ~/.continue/skills/ | Global |
| Kimi Code | ~/.kimi/skills/ | Global |
| IBM Bob | .bob/skills/ | Project |
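The Type column determines where a skill lands. Global paths live under your home directory, and Project paths are relative to the current project. The minimal sketch below shows that resolution for a few rows of the table. It assumes each skill is copied into a subfolder named after the skill. `AGENT_DIRS`, `resolve_install_dir`, and the agent keys other than `cursor` and `bob` are hypothetical, not the `install-agent` implementation.

```python
from pathlib import Path
from typing import Dict, Tuple

# A subset of the table: agent key -> (skills directory, scope).
AGENT_DIRS: Dict[str, Tuple[str, str]] = {
    "claude": ("~/.claude/skills", "global"),
    "cursor": (".cursor/skills", "project"),
    "copilot": (".github/skills", "project"),
    "goose": ("~/.config/goose/skills", "global"),
    "bob": (".bob/skills", "project"),
}


def resolve_install_dir(agent: str, skill_name: str, project_root: Path, home: Path) -> Path:
    """Return the folder a skill would be copied into (hypothetical helper)."""
    try:
        base, scope = AGENT_DIRS[agent]
    except KeyError:
        raise ValueError(f"unknown agent: {agent}") from None
    if scope == "global":
        root = home / base.removeprefix("~/")  # "~/..." resolved against the given home
    else:
        root = project_root / base             # project-local, relative to the repo root
    return root / skill_name
```

A `--dry-run` style preview would only print these resolved paths instead of copying files.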

🔌 MCP Integration (40 Tools)

Skill Seekers ships an MCP server for use from Claude Code, Cursor, Windsurf, VS Code + Cline, or IntelliJ IDEA.

# stdio mode (Claude Code, VS Code + Cline)
python -m skill_seekers.mcp.server_fastmcp

# HTTP mode (Cursor, Windsurf, IntelliJ)
python -m skill_seekers.mcp.server_fastmcp --transport http --port 8765

# Auto-configure all agents at once
./setup_mcp.sh

The 40 tools include:

  • Core (9): list_configs, generate_config, validate_config, estimate_pages, scrape_docs, package_skill, upload_skill, enhance_skill, install_skill
  • Extended (10): scrape_github, scrape_pdf, unified_scrape, merge_sources, detect_conflicts, add_config_source, fetch_config, list_config_sources, remove_config_source, split_config
  • Vector DB (4): export_to_chroma, export_to_weaviate, export_to_faiss, export_to_qdrant
  • Cloud (3): cloud_upload, cloud_download, cloud_list

Full Guide: docs/guides/MCP_SETUP.md


⚙️ Configuration

Available Presets (24+)

# List all presets
# skill-seekers list-configs # Not available in v3.7.0

| Category | Presets |
|---|---|
| Web Frameworks | react, vue, angular, svelte, nextjs |
| Python | django, flask, fastapi, sqlalchemy, pytest |
| Game Development | godot, pygame, unity |
| Tools & DevOps | docker, kubernetes, terraform, ansible |
| Unified (Docs + GitHub) | react-unified, vue-unified, nextjs-unified, and more |

Creating Your Own Config

# Option 1: Interactive
skill-seekers create --interactive

# Option 2: Copy and edit a preset
cp configs/react.json configs/myframework.json
nano configs/myframework.json
skill-seekers create --config configs/myframework.json

Config File Structure

{
  "name": "myframework",
  "description": "When to use this skill",
  "base_url": "https://docs.myframework.com/",
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/docs", "/guide"],
    "exclude": ["/blog", "/about"]
  },
  "categories": {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}
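The url_patterns and categories sections decide which pages are kept and where they are filed. The minimal sketch below shows one plausible interpretation. It assumes simple substring matching on the URL path (plus the page title, for categories), first-match-wins category order, and an "other" fallback. The real scraper's matching rules may differ, and `url_allowed` and `categorize` are hypothetical names.

```python
from typing import Dict, List
from urllib.parse import urlparse

# The relevant parts of the example config.
CONFIG = {
    "url_patterns": {"include": ["/docs", "/guide"], "exclude": ["/blog", "/about"]},
    "categories": {"getting_started": ["intro", "quickstart"], "api": ["api", "reference"]},
}


def url_allowed(url: str, patterns: Dict[str, List[str]]) -> bool:
    """Exclude patterns win; otherwise the path must match an include pattern (if any are given)."""
    path = urlparse(url).path
    if any(p in path for p in patterns.get("exclude", [])):
        return False
    include = patterns.get("include", [])
    return not include or any(p in path for p in include)


def categorize(url: str, title: str, categories: Dict[str, List[str]]) -> str:
    """Return the first category whose keyword appears in the URL path or page title."""
    haystack = f"{urlparse(url).path} {title}".lower()
    for category, keywords in categories.items():
        if any(k.lower() in haystack for k in keywords):
            return category
    return "other"  # note: naive substring matching means "api" would also match "rapid"
```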

Where to Store Configs

The tool searches in this order:

  1. Exact path as provided
  2. ./configs/ (current directory)
  3. ~/.config/skill-seekers/configs/ (user config directory)
  4. SkillSeekersWeb.com API (preset configs)
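The lookup stops at the first location that has the file. The minimal sketch below shows that order for the three local steps. It assumes the name is passed as a file name such as `godot.json`; whether the real tool also appends `.json` or normalizes names is unknown. `find_config` is a hypothetical helper, and the remote step is deliberately left unmodeled.

```python
from pathlib import Path
from typing import Optional


def find_config(name: str, cwd: Path, home: Path) -> Optional[Path]:
    """Return the first local match, or None to signal a fallback to the preset API."""
    candidates = [
        cwd / name,                                             # 1. exact path (an absolute name overrides cwd)
        cwd / "configs" / name,                                 # 2. ./configs/
        home / ".config" / "skill-seekers" / "configs" / name,  # 3. user config directory
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # 4. a real client would now query SkillSeekersWeb.com (not modeled here)
```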

📊 What Gets Created

output/
├── godot_data/            # Scraped raw data
│   ├── pages/             # JSON files (one per page)
│   └── summary.json       # Overview
│
└── godot/                 # The skill
    ├── SKILL.md           # Enhanced with real examples
    ├── references/        # Categorized docs
    │   ├── index.md
    │   ├── getting_started.md
    │   ├── scripting.md
    │   └── ...
    ├── scripts/           # Empty (add your own)
    └── assets/            # Empty (add your own)

🐛 Troubleshooting

No Content Extracted?

  • Check your main_content selector
  • Try: article, main, div[role="main"]

Data Exists But Won't Use It?

# Force re-scrape
rm -rf output/myframework_data/
skill-seekers create --config configs/myframework.json

Categories Not Good?

Edit the config categories section with better keywords.

Want to Update Docs?

# Delete old data and re-scrape
rm -rf output/godot_data/
skill-seekers create --config configs/godot.json

Enhancement Not Working?

# Check if API key is set
echo $ANTHROPIC_API_KEY

# Try LOCAL mode instead (uses Claude Code Max, no API key needed)
skill-seekers enhance output/react/ --mode LOCAL

# Monitor background enhancement status
skill-seekers enhance-status output/react/ --watch

GitHub Rate Limit Issues?

# Set a GitHub token (5000 req/hour vs 60/hour anonymous)
export GITHUB_TOKEN=ghp_your_token_here

# Or configure multiple profiles
skill-seekers config --github
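To confirm that a token is actually being picked up, you can query GitHub's REST `/rate_limit` endpoint, which reports your current quota without counting against it. The minimal sketch below uses only the standard library. It talks to GitHub directly rather than through Skill Seekers, and the helper names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional


def build_rate_limit_request(token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET for GitHub's /rate_limit endpoint, authenticated if a token is given."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def summarize_core_limit(payload: dict) -> str:
    """Format the core REST quota from a /rate_limit response body."""
    core = payload["resources"]["core"]
    return f"{core['remaining']}/{core['limit']} core requests remaining"


def check_rate_limit() -> str:
    """Perform the network call, reading GITHUB_TOKEN from the environment if it is set."""
    request = build_rate_limit_request(os.environ.get("GITHUB_TOKEN"))
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_core_limit(json.load(response))
```

Calling `print(check_rate_limit())` should report a limit of 5000 when the token is in effect and 60 when it is not.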

📈 Performance

| Task | Time | Notes |
|---|---|---|
| Scraping (sync) | 15-45 min | First time only, thread-based |
| Scraping (async) | 5-15 min | 2-3x faster with --async flag |
| Building | 1-3 min | Fast rebuild from cache |
| Re-building | <1 min | With --skip-scrape |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Video (transcript) | 1-3 min | YouTube/local, transcript only |
| Video (visual) | 5-15 min | + OCR frame extraction |
| Packaging | 5-10 sec | Final .zip creation |
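The async speedup comes from keeping several page requests in flight at once instead of waiting on each one in turn. The minimal asyncio sketch below shows bounded-concurrency fetching, a general technique and not Skill Seekers' actual async scraper. `fetch` is any coroutine you supply (a real one would make an HTTP request), and `max_concurrency` is a hypothetical knob.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Fetch every URL with at most max_concurrency requests in flight; results keep input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:  # wait for a free slot before starting the request
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```

As rough arithmetic, 10 pages that each take 0.1 s finish in about 0.2 s with 5 slots, compared with about 1 s one at a time. The bound keeps the crawler polite to the documentation server.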

🆕 New in v3.6.0

Workflow Presets

Control analysis depth with --preset:

skill-seekers create https://docs.react.dev/ --preset quick          # Fast, surface-level
skill-seekers create https://docs.react.dev/ --preset standard       # Balanced (default)
skill-seekers create https://docs.react.dev/ --preset comprehensive  # Deep, exhaustive

Lifecycle Flags

skill-seekers create https://docs.react.dev/ --dry-run      # Preview without scraping
skill-seekers create https://docs.react.dev/ --fresh        # Ignore cache, full re-scrape
skill-seekers create https://docs.react.dev/ --resume       # Resume interrupted job
skill-seekers create https://docs.react.dev/ --skip-scrape  # Re-package existing output

Health Check & Utilities

skill-seekers doctor                 # Diagnose installation & environment
skill-seekers sync-config            # Detect config drift
skill-seekers stream <source>        # Streaming ingestion for large docs
skill-seekers update output/react/   # Incremental update
skill-seekers multilang <source>     # Multi-language skill generation
skill-seekers quality output/react/  # Quality report (add --threshold 7 to gate: non-zero exit below 7/10)

RAG Chunking Options (package)

skill-seekers package output/react/ --chunk-for-rag --chunk-tokens 512 --chunk-overlap-tokens 50
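In a typical sliding-window scheme, --chunk-tokens 512 --chunk-overlap-tokens 50 means consecutive chunks share 50 tokens, so each new chunk starts 462 tokens after the previous one. The minimal sketch below uses whitespace-split words as a stand-in for a real tokenizer. That is an assumption: the packager's actual tokenizer and boundary handling may differ, and `chunk_tokens` is a hypothetical name.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 512, overlap: int = 50) -> List[List[str]]:
    """Split tokens into windows of chunk_size that overlap by `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # 512 - 50 = 462 with the flags above
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace "tokens" stand in for a real tokenizer here.
words = " ".join(f"w{i}" for i in range(1000)).split()
chunks = chunk_tokens(words, chunk_size=512, overlap=50)
print([len(c) for c in chunks])  # [512, 512, 76]
```

The overlap lets a sentence that straddles a boundary appear whole in at least one chunk, at the cost of storing about 10% more tokens.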

Marketplace Publishing

skill-seekers package output/react/ --marketplace --marketplace-category frontend

Additional Optional Dependencies

| Extra | Install | Purpose |
|---|---|---|
| browser | pip install "skill-seekers[browser]" | Headless Playwright for SPA sites |
| embedding | pip install "skill-seekers[embedding]" | Embedding server support |
| s3 / gcs / azure | pip install "skill-seekers[s3]" etc. | Cloud storage upload |
| rag-upload | pip install "skill-seekers[rag-upload]" | Combined vector DB upload deps |

📚 Documentation

Getting Started

Architecture

Guides

Integration Guides


📝 License

MIT License - see LICENSE file for details


Happy skill building! 🚀


🔒 Security

👁 MseeP.ai Security Assessment Badge


💛 Sponsors


Atlas Cloud — a full-modal, OpenAI-compatible AI inference platform. Skill Seekers supports it as a packaging/enhancement target via --target atlas with ATLAS_API_KEY.



Download files

Download the file for your platform.

Source Distribution

skill_seekers-3.8.0.tar.gz (1.4 MB)

Uploaded Source

Built Distribution


skill_seekers-3.8.0-py3-none-any.whl (1.1 MB)

Uploaded Python 3

File details

Details for the file skill_seekers-3.8.0.tar.gz.

File metadata

  • Download URL: skill_seekers-3.8.0.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.21 {"installer":{"name":"uv","version":"0.11.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for skill_seekers-3.8.0.tar.gz
Algorithm Hash digest
SHA256 7cb55c691da77f6ab3f94859c050169766d6276cd7f49cc0b732d8087a5fb00e
MD5 cd556ee86ec5a1f90c549d06c5a3526f
BLAKE2b-256 43ed49b7eaf288c8aadbcdba186dae8d9a9bb7bb4f2304eec8df40b3c069fe00


File details

Details for the file skill_seekers-3.8.0-py3-none-any.whl.

File metadata

  • Download URL: skill_seekers-3.8.0-py3-none-any.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.21 {"installer":{"name":"uv","version":"0.11.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for skill_seekers-3.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bbbbb3a2eb47e9665835d882d5cdda656aefc30fc62b42e183e898f904de760a
MD5 277474cd61da2458715e3dc91d5207a9
BLAKE2b-256 95f73e873484bc9509ae343bd96ee475fbd70664347c52d8ed65ce40c03855bf

