firecrawl
npx skills add https://github.com/developersdigest/test-agent-skill-fc-1 --skill firecrawl
Firecrawl CLI
Use the firecrawl CLI for web scraping, search, and browser automation. It returns clean markdown optimized for LLM context windows and handles JavaScript rendering.
Prerequisites
The firecrawl-cli package must already be installed and authenticated. Check status:
firecrawl --status
Expected output when ready:
🔥 firecrawl cli v1.4.1
✓ Authenticated via FIRECRAWL_API_KEY
Concurrency: 0/100 jobs (parallel scrape limit)
Credits: 500,000 remaining
- Concurrency: Max parallel jobs. Stay within this limit.
- Credits: Remaining API credits. Each operation consumes credits.
If not installed or not authenticated, refer to rules/install.md.
Organization
Create a .firecrawl/ folder in the working directory to store results. Add .firecrawl/ to .gitignore. Use -o to write output to files (avoids flooding the context window):
firecrawl search "your query" -o .firecrawl/search-{query}.json
firecrawl scrape https://example.com -o .firecrawl/{site}-{path}.md
Always quote URLs in shell commands.
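The setup above can be done idempotently in one pass (a minimal sketch; it only creates the folder and the ignore entry described here):

```shell
# one-time setup: results folder, kept out of version control
mkdir -p .firecrawl
touch .gitignore
grep -qx '.firecrawl/' .gitignore || echo '.firecrawl/' >> .gitignore
```

Safe to re-run: the grep guard prevents duplicate .gitignore entries.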
Commands
Search
# Basic search
firecrawl search "your query" -o .firecrawl/search-query.json --json
# Limit results
firecrawl search "AI news" --limit 10 -o .firecrawl/search-ai-news.json --json
# Search specific sources
firecrawl search "tech startups" --sources news -o .firecrawl/search-news.json --json
firecrawl search "landscapes" --sources images -o .firecrawl/search-images.json --json
# Time-based search
firecrawl search "AI announcements" --tbs qdr:d -o .firecrawl/search-today.json --json # Past day
firecrawl search "tech news" --tbs qdr:w -o .firecrawl/search-week.json --json # Past week
# Search AND scrape content from results
firecrawl search "API docs" --scrape -o .firecrawl/search-docs.json --json
Options: --limit <n>, --sources <web,images,news>, --categories <github,research,pdf>, --tbs <qdr:h|d|w|m|y>, --location, --country <code>, --scrape, -o <path>
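These flags compose. For example, a time-bounded news search that also scrapes each hit (query and filename are illustrative; the flags are those listed above):

```shell
firecrawl search "vector databases" --sources news --tbs qdr:w --limit 5 --scrape \
  -o .firecrawl/search-vectordb.json --json
```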
Scrape
# Markdown output
firecrawl scrape https://example.com -o .firecrawl/example.md
# Main content only (removes nav, footer, ads)
firecrawl scrape https://example.com --only-main-content -o .firecrawl/example.md
# Multiple formats (JSON output)
firecrawl scrape https://example.com --format markdown,links -o .firecrawl/example.json
# Wait for JS to render
firecrawl scrape https://spa-app.com --wait-for 3000 -o .firecrawl/spa.md
# Include/exclude specific HTML tags
firecrawl scrape https://example.com --include-tags article,main -o .firecrawl/article.md
Options: -f <markdown,html,rawHtml,links,screenshot,json>, --only-main-content, --wait-for <ms>, --include-tags, --exclude-tags, -o <path>
Map
Discover all URLs on a site:
firecrawl map https://example.com -o .firecrawl/urls.txt
firecrawl map https://example.com --search "blog" -o .firecrawl/blog-urls.txt
firecrawl map https://example.com --limit 500 --json -o .firecrawl/urls.json
Options: --limit <n>, --search <query>, --include-subdomains, --json, -o <path>
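Map pairs naturally with scrape: discover URLs first, then fetch each one. A sketch, assuming the text output is one URL per line (the basename-derived filenames are a shell convenience, not a CLI feature):

```shell
# collect matching URLs, then scrape each one
firecrawl map https://example.com --search "docs" -o .firecrawl/doc-urls.txt
while IFS= read -r url; do
  firecrawl scrape "$url" --only-main-content -o ".firecrawl/$(basename "$url").md"
done < .firecrawl/doc-urls.txt
```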
Browser
Launch cloud Chromium sessions for interactive browsing. Sessions auto-launch on first use:
firecrawl browser "open https://example.com"
firecrawl browser "snapshot"
firecrawl browser "click @e5"
firecrawl browser "fill @e3 'search query'"
firecrawl browser "scrape" -o .firecrawl/browser-scrape.md
firecrawl browser close
Core commands: open <url>, snapshot, screenshot, click <@ref>, type <@ref> <text>, fill <@ref> <text>, scrape, scroll <direction>, wait <seconds>
Session management:
firecrawl browser launch-session --ttl 600
firecrawl browser list
firecrawl browser close
Options: --ttl <seconds>, --ttl-inactivity <seconds>, --stream, --session <id>, -o <path>
Crawl
# Start and wait for completion
firecrawl crawl https://example.com --wait -o .firecrawl/crawl-result.json
# Limit scope
firecrawl crawl https://example.com --limit 100 --max-depth 3 --wait -o .firecrawl/crawl-result.json
# Crawl specific sections
firecrawl crawl https://example.com --include-paths /blog,/docs --wait -o .firecrawl/crawl-result.json
Options: --wait, --progress, --limit <n>, --max-depth <n>, --include-paths, --exclude-paths, --delay <ms>, --max-concurrency <n>, -o <path>
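The scoping flags can also be combined to keep a crawl small and polite (paths and values here are illustrative):

```shell
firecrawl crawl https://example.com --exclude-paths /tag,/archive --delay 500 \
  --limit 50 --wait -o .firecrawl/crawl-result.json
```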
Agent
AI-powered autonomous web data extraction (takes 2-5 minutes):
firecrawl agent "Find the pricing plans for Firecrawl" --wait -o .firecrawl/agent-pricing.json
firecrawl agent "Extract product data" --urls https://example.com --wait -o .firecrawl/agent-products.json
firecrawl agent "Extract company info" --schema '{"type":"object","properties":{"name":{"type":"string"}}}' --wait -o .firecrawl/agent-info.json
Options: --urls <urls>, --model <spark-1-mini|spark-1-pro>, --schema <json>, --schema-file <path>, --max-credits <n>, --wait, -o <path>
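For anything beyond a one-liner, --schema-file keeps the command readable. A sketch (the schema contents are illustrative):

```shell
# write the schema once, reference it by path
cat > .firecrawl/company-schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "founded": { "type": "string" }
  }
}
EOF
firecrawl agent "Extract company info" --schema-file .firecrawl/company-schema.json \
  --max-credits 100 --wait -o .firecrawl/agent-info.json
```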
Reading Output Files
Firecrawl output files can be large. Use incremental reads:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
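To page through the middle of a large file after the initial head, a plain sed range works (line numbers are illustrative):

```shell
sed -n '51,150p' .firecrawl/file.md        # read the next chunk
grep -c "keyword" .firecrawl/file.md       # count matches before printing them all
```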
Parallelization
Run multiple scrapes in parallel when possible:
firecrawl scrape https://site1.com -o .firecrawl/1.md &
firecrawl scrape https://site2.com -o .firecrawl/2.md &
firecrawl scrape https://site3.com -o .firecrawl/3.md &
wait
Check firecrawl --status for your concurrency limit before parallelizing.
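With many URLs, bound the parallelism instead of backgrounding everything at once. A sketch using xargs (-P 5 is an assumed safe bound below the concurrency limit; the basename-derived filenames are a shell convenience):

```shell
# at most 5 scrapes in flight at a time, one URL per line in urls.txt
xargs -n 1 -P 5 sh -c \
  'firecrawl scrape "$0" -o ".firecrawl/$(basename "$0").md"' \
  < .firecrawl/urls.txt
```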