article-extractor

📁 jrajasekera/claude-skills 📅 9 days ago


Site-wide rank: #51661

Install command

npx skills add https://github.com/jrajasekera/claude-skills --skill article-extractor

Installs by agent

openclaw 2

opencode 2

cursor 2

claude-code 2

Skill documentation

Article Extractor

Extract clean article content from URLs, removing ads, navigation, and clutter. Multi-tool fallback ensures reliability.

Workflow

When the user provides a URL to download or extract:

Call the extraction script directly with the URL (do NOT fetch the URL first with web_fetch).
The script handles fetching, extraction, and saving automatically.
It returns a clean markdown file with frontmatter.

Usage

# Basic extraction
scripts/extract-article.sh "https://example.com/article"

# Specify output location
scripts/extract-article.sh "https://example.com/article" -o my-article.md -d ~/Documents

# Try Wayback Machine if original fails
scripts/extract-article.sh "https://example.com/article" --wayback

Make the script executable if needed: chmod +x scripts/extract-article.sh

Key Options

-o <file> – Output filename
-d <dir> – Output directory
-w, --wayback – Try Wayback Machine if extraction fails
-t <tool> – Force tool: jina, trafilatura, readability, fallback
-q – Quiet mode

For complete options, exit codes, tool details, and examples, see references/tools-and-options.md.
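As a rough sketch of how the -d and -q options could be combined, the helper below runs the extractor once per URL listed in a text file. The function name extract_all, the EXTRACT variable, and the one-URL-per-line file format are assumptions made for illustration and are not part of the skill; only the -d and -q flags come from the options listed above.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper; not part of the skill itself.
# EXTRACT points at the extraction script and can be overridden.
EXTRACT="${EXTRACT:-scripts/extract-article.sh}"

# extract_all <url-file> <output-dir>
# Reads one URL per line, skipping blank lines and lines starting with '#'.
# Returns non-zero if any extraction failed.
extract_all() {
  local url_file="$1" out_dir="$2" failed=0 url
  # Create the output directory up front (harmless if it already exists).
  mkdir -p "$out_dir"
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in ''|'#'*) continue ;; esac
    # -d sets the output directory, -q keeps the run quiet.
    # stdin is redirected so the extractor cannot consume the URL list.
    if ! "$EXTRACT" "$url" -d "$out_dir" -q < /dev/null; then
      echo "failed: $url" >&2
      failed=$((failed + 1))
    fi
  done < "$url_file"
  [ "$failed" -eq 0 ]
}
```

A call such as extract_all urls.txt ~/Documents/articles would then process every listed URL and report the failures on stderr.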

Common Failures

Exit 3 (access denied): Paywall or login required – try --wayback
Exit 4 (no content): Heavy JavaScript – try a different tool with -t <tool>
Exit 2 (network): Connection issue – check URL
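The failure table above can be turned into a small retry wrapper. This is only a sketch under stated assumptions: the function name extract_with_retry and the EXTRACT variable are hypothetical, and the order in which tools are tried after exit 4 is arbitrary. The exit codes, the --wayback flag, and the tool names are taken from this document.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper; not part of the skill itself.
EXTRACT="${EXTRACT:-scripts/extract-article.sh}"

# extract_with_retry <url>
# Runs the extractor once, then reacts to the documented exit codes.
extract_with_retry() {
  local url="$1" rc=0 tool
  "$EXTRACT" "$url" || rc=$?
  case "$rc" in
    0) return 0 ;;
    2) # Network error: nothing to retry automatically.
       echo "network error, check the URL: $url" >&2
       return 2 ;;
    3) # Access denied (paywall or login): try the Wayback Machine.
       "$EXTRACT" "$url" --wayback ;;
    4) # No content: force each tool in turn (order is an assumption).
       for tool in jina trafilatura readability fallback; do
         "$EXTRACT" "$url" -t "$tool" && return 0
       done
       return 4 ;;
    *) return "$rc" ;;
  esac
}
```

The wrapper retries at most once per failure class, so a URL that is both paywalled and missing from the Wayback Machine still ends with a non-zero exit status.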

Local Tools (Optional)

For offline extraction, install the local tools: scripts/install-deps.sh
