firecrawl
npx skills add https://github.com/developersdigest/test-agent-skill-fc-1 --skill firecrawl
Firecrawl CLI
Use the firecrawl CLI for web scraping, search, and browser automation. It returns clean markdown optimized for LLM context windows and handles JavaScript rendering.
Prerequisites
The firecrawl-cli package must already be installed and authenticated. Check status:
firecrawl --status
Expected output when ready:
🔥 firecrawl cli v1.4.1
✓ Authenticated via FIRECRAWL_API_KEY
Concurrency: 0/100 jobs (parallel scrape limit)
Credits: 500,000 remaining
- Concurrency: Max parallel jobs. Stay within this limit.
- Credits: Remaining API credits. Each operation consumes credits.
If not installed or not authenticated, refer to rules/install.md.
Organization
Create a .firecrawl/ folder in the working directory to store results. Add .firecrawl/ to .gitignore. Use -o to write output to files (avoids flooding the context window):
firecrawl search "your query" -o .firecrawl/search-{query}.json
firecrawl scrape https://example.com -o .firecrawl/{site}-{path}.md
Always quote URLs in shell commands.
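The setup above can be done idempotently in one pass (a minimal sketch; it only creates the folder and the ignore entry described here):

```shell
# one-time setup: results folder, kept out of version control
mkdir -p .firecrawl
touch .gitignore
grep -qx '.firecrawl/' .gitignore || echo '.firecrawl/' >> .gitignore
```

Safe to re-run: the grep guard prevents duplicate .gitignore entries.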
Commands
Search
# Basic search
firecrawl search "your query" -o .firecrawl/search-query.json --json
# Limit results
firecrawl search "AI news" --limit 10 -o .firecrawl/search-ai-news.json --json
# Search specific sources
firecrawl search "tech startups" --sources news -o .firecrawl/search-news.json --json
firecrawl search "landscapes" --sources images -o .firecrawl/search-images.json --json
# Time-based search
firecrawl search "AI announcements" --tbs qdr:d -o .firecrawl/search-today.json --json # Past day
firecrawl search "tech news" --tbs qdr:w -o .firecrawl/search-week.json --json # Past week
# Search AND scrape content from results
firecrawl search "API docs" --scrape -o .firecrawl/search-docs.json --json
Options: --limit <n>, --sources <web,images,news>, --categories <github,research,pdf>, --tbs <qdr:h|d|w|m|y>, --location, --country <code>, --scrape, -o <path>
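These flags compose. For example, a time-bounded news search that also scrapes each hit (query and filename are illustrative; the flags are those listed above):

```shell
firecrawl search "vector databases" --sources news --tbs qdr:w --limit 5 --scrape \
  -o .firecrawl/search-vectordb.json --json
```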
Scrape
# Markdown output
firecrawl scrape https://example.com -o .firecrawl/example.md
# Main content only (removes nav, footer, ads)
firecrawl scrape https://example.com --only-main-content -o .firecrawl/example.md
# Multiple formats (JSON output)
firecrawl scrape https://example.com --format markdown,links -o .firecrawl/example.json
# Wait for JS to render
firecrawl scrape https://spa-app.com --wait-for 3000 -o .firecrawl/spa.md
# Include/exclude specific HTML tags
firecrawl scrape https://example.com --include-tags article,main -o .firecrawl/article.md
Options: -f <markdown,html,rawHtml,links,screenshot,json>, --only-main-content, --wait-for <ms>, --include-tags, --exclude-tags, -o <path>
Map
Discover all URLs on a site:
firecrawl map https://example.com -o .firecrawl/urls.txt
firecrawl map https://example.com --search "blog" -o .firecrawl/blog-urls.txt
firecrawl map https://example.com --limit 500 --json -o .firecrawl/urls.json
Options: --limit <n>, --search <query>, --include-subdomains, --json, -o <path>
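Map pairs naturally with scrape: discover URLs first, then fetch each one. A sketch, assuming the text output is one URL per line (the basename-derived filenames are a shell convenience, not a CLI feature):

```shell
# collect matching URLs, then scrape each one
firecrawl map https://example.com --search "docs" -o .firecrawl/doc-urls.txt
while IFS= read -r url; do
  firecrawl scrape "$url" --only-main-content -o ".firecrawl/$(basename "$url").md"
done < .firecrawl/doc-urls.txt
```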
Browser
Launch cloud Chromium sessions for interactive browsing. Sessions auto-launch on first use:
firecrawl browser "open https://example.com"
firecrawl browser "snapshot"
firecrawl browser "click @e5"
firecrawl browser "fill @e3 'search query'"
firecrawl browser "scrape" -o .firecrawl/browser-scrape.md
firecrawl browser close
Core commands: open <url>, snapshot, screenshot, click <@ref>, type <@ref> <text>, fill <@ref> <text>, scrape, scroll <direction>, wait <seconds>
Session management:
firecrawl browser launch-session --ttl 600
firecrawl browser list
firecrawl browser close
Options: --ttl <seconds>, --ttl-inactivity <seconds>, --stream, --session <id>, -o <path>
Crawl
# Start and wait for completion
firecrawl crawl https://example.com --wait -o .firecrawl/crawl-result.json
# Limit scope
firecrawl crawl https://example.com --limit 100 --max-depth 3 --wait -o .firecrawl/crawl-result.json
# Crawl specific sections
firecrawl crawl https://example.com --include-paths /blog,/docs --wait -o .firecrawl/crawl-result.json
Options: --wait, --progress, --limit <n>, --max-depth <n>, --include-paths, --exclude-paths, --delay <ms>, --max-concurrency <n>, -o <path>
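The scoping flags can also be combined to keep a crawl small and polite (paths and values here are illustrative):

```shell
firecrawl crawl https://example.com --exclude-paths /tag,/archive --delay 500 \
  --limit 50 --wait -o .firecrawl/crawl-result.json
```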
Agent
AI-powered autonomous web data extraction (takes 2-5 minutes):
firecrawl agent "Find the pricing plans for Firecrawl" --wait -o .firecrawl/agent-pricing.json
firecrawl agent "Extract product data" --urls https://example.com --wait -o .firecrawl/agent-products.json
firecrawl agent "Extract company info" --schema '{"type":"object","properties":{"name":{"type":"string"}}}' --wait -o .firecrawl/agent-info.json
Options: --urls <urls>, --model <spark-1-mini|spark-1-pro>, --schema <json>, --schema-file <path>, --max-credits <n>, --wait, -o <path>
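For anything beyond a one-liner, --schema-file keeps the command readable. A sketch (the schema contents are illustrative):

```shell
# write the schema once, reference it by path
cat > .firecrawl/company-schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "founded": { "type": "string" }
  }
}
EOF
firecrawl agent "Extract company info" --schema-file .firecrawl/company-schema.json \
  --max-credits 100 --wait -o .firecrawl/agent-info.json
```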
Reading Output Files
Firecrawl output files can be large. Use incremental reads:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
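To page through the middle of a large file after the initial head, a plain sed range works (line numbers are illustrative):

```shell
sed -n '51,150p' .firecrawl/file.md        # read the next chunk
grep -c "keyword" .firecrawl/file.md       # count matches before printing them all
```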
Parallelization
Run multiple scrapes in parallel when possible:
firecrawl scrape https://site1.com -o .firecrawl/1.md &
firecrawl scrape https://site2.com -o .firecrawl/2.md &
firecrawl scrape https://site3.com -o .firecrawl/3.md &
wait
Check firecrawl --status for your concurrency limit before parallelizing.
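With many URLs, bound the parallelism instead of backgrounding everything at once. A sketch using xargs (-P 5 is an assumed safe bound below the concurrency limit; the basename-derived filenames are a shell convenience):

```shell
# at most 5 scrapes in flight at a time, one URL per line in urls.txt
xargs -n 1 -P 5 sh -c \
  'firecrawl scrape "$0" -o ".firecrawl/$(basename "$0").md"' \
  < .firecrawl/urls.txt
```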