spider-cli-extraction
Install command
npx skills add https://github.com/spider-rs/spider_skills --skill spider-cli-extraction
Skill documentation
Spider CLI Extraction
Overview
Use this skill to run Spider CLI workflows with explicit runtime mode control.
Canonical source for cross-agent behavior: skills/core/spider-cli-extraction.md
Load references/cli-workflows.md when you need exact command patterns or mode-selection rules.
Workflow
- Confirm CLI availability.
  - Prefer `cargo run -p spider_cli -- ...` from the Spider repo root.
  - If `spider` is globally installed, use `spider ...` for quick checks.
- Choose the task mode.
  - Use `crawl` to collect links.
  - Use `scrape` to emit per-page JSON records and optionally include HTML.
  - Use `download` to persist page markup to disk.
- Select runtime execution mode.
  - Use `--headless` for browser-rendered mode.
  - Use `--http` to force HTTP-only mode.
  - Omit both for default HTTP behavior.
- Add scope controls.
  - Set `--limit`, `--depth`, `--budget`, and `--blacklist-url`.
  - Add `--respect-robots-txt` when policy compliance is required.
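The workflow steps can be sketched as a small dry-run helper that assembles a command line from the three choices (task mode, runtime mode, scope controls) and prints it instead of running it. `build_spider_cmd` is a hypothetical name, not part of Spider, and placing the scope flags before the subcommand is an assumption that mirrors where `--headless` appears in the Quick Commands section.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints a Spider CLI command, does not execute it.
# Usage: build_spider_cmd <url> <crawl|scrape|download> <default|headless|http> [scope flags...]
build_spider_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: build_spider_cmd <url> <task> <runtime> [scope flags...]" >&2
    return 2
  fi
  local url="$1" task="$2" runtime="$3"
  shift 3

  # Step 2: only the three task modes named in the workflow are accepted.
  case "$task" in
    crawl|scrape|download) ;;
    *) echo "unknown task: $task" >&2; return 1 ;;
  esac

  local cmd=(cargo run -p spider_cli -- --url "$url")

  # Step 3: runtime mode; "default" omits both flags (HTTP behavior).
  case "$runtime" in
    headless) cmd+=(--headless) ;;
    http)     cmd+=(--http) ;;
    default)  ;;
    *) echo "unknown runtime: $runtime" >&2; return 1 ;;
  esac

  # Step 4: scope controls are passed through unchanged.
  # Assumption: they belong before the subcommand, like --headless.
  cmd+=("$@")
  cmd+=("$task")

  printf '%s\n' "${cmd[*]}"
}

build_spider_cmd https://example.com crawl headless --limit 20 --depth 2 --respect-robots-txt
# prints: cargo run -p spider_cli -- --url https://example.com --headless --limit 20 --depth 2 --respect-robots-txt crawl
```

Subcommand options such as `--output-links` are not handled here; they would follow the task name, as in the Quick Commands.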
Quick Commands
```shell
# Crawl links (default HTTP mode)
cargo run -p spider_cli -- --url https://example.com crawl --output-links

# Browser mode on demand
cargo run -p spider_cli -- --url https://example.com --headless crawl --output-links

# Scrape with HTML output
cargo run -p spider_cli -- --url https://example.com scrape --output-html
```
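The commands above all use the `cargo run` prefix, while the Workflow section also allows a globally installed `spider` binary. A minimal sketch of choosing between the two follows. `spider_prefix` is a hypothetical helper, and treating a `Cargo.toml` in the current directory as "the Spider repo root" is a rough assumption; it does not check that the workspace actually contains `spider_cli`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command prefix to use for Spider CLI calls.
# Assumption: a Cargo.toml in the current directory stands in for
# "running from the Spider repo root"; the workspace itself is not verified.
spider_prefix() {
  if [ -f Cargo.toml ] && command -v cargo >/dev/null 2>&1; then
    # Preferred path per the workflow: run through cargo.
    echo "cargo run -p spider_cli --"
  elif command -v spider >/dev/null 2>&1; then
    # Fallback: a globally installed binary, suited to quick checks.
    echo "spider"
  else
    echo "Spider CLI not found: install it or run from the Spider repo root" >&2
    return 1
  fi
}

# Dry run: show the command that would be executed, if a prefix was found.
if prefix="$(spider_prefix)"; then
  echo "would run: $prefix --url https://example.com crawl --output-links"
fi
```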
Script
Use scripts/spider_cli_helper.sh for wrappers:
```shell
./scripts/spider_cli_helper.sh verify-headless
./scripts/spider_cli_helper.sh crawl https://example.com --limit 20 --depth 2
./scripts/spider_cli_helper.sh scrape https://example.com --output-html --output-links
```