brightdata
31
总安装量
12
周安装量
#11943
全站排名
安装命令
npx skills add https://github.com/danielmiessler/personal_ai_infrastructure --skill brightdata
Agent 安装分布
claude-code
8
gemini-cli
8
codex
6
github-copilot
4
antigravity
3
Skill 文档
BrightData – Progressive URL Scraping
Intelligent URL content retrieval with automatic fallback strategy.
Activation Triggers
Direct Scraping Requests
- “scrape this URL”, “scrape [URL]”, “scrape this page”
- “fetch this URL”, “fetch [URL]”, “fetch this page”
- “pull content from [URL]”, “get content from [URL]”
- “retrieve [URL]”, “download this page content”
Access Issues
- “can’t access this site”, “site is blocking me”
- “bot detection”, “CAPTCHA”, “403 error”
- “this URL won’t load”
Explicit Tier Requests
- “use Bright Data to fetch [URL]” – Skip to Tier 4
- “use browser to scrape [URL]” – Skip to Tier 3
Core Capability
Four-Tier Progressive Escalation:
START
|
Tier 1 (WebFetch) --> Success? --> Return content
|
No
|
Tier 2 (Curl) --> Success? --> Return content
|
No
|
Tier 3 (Browser) --> Success? --> Return content
|
No
|
Tier 4 (Bright Data) --> Success? --> Return content
|
No
|
Report failure + alternatives
Tier Details
Tier 1: WebFetch (Built-in)
- Tool: Claude Code’s WebFetch
- Speed: ~2-5 seconds
- Cost: Free
- Works for: Public sites, no bot detection
Tier 2: Curl with Chrome Headers
- Tool: Bash curl with comprehensive browser headers
- Speed: ~3-7 seconds
- Cost: Free
- Works for: Sites with basic user-agent filtering
Tier 3: Browser Automation (Playwright)
- Tool: Browser skill’s Playwright automation
- Speed: ~10-20 seconds
- Cost: Free
- Works for: JavaScript SPAs, dynamic content
Tier 4: Bright Data MCP
- Tool:
mcp__Brightdata__scrape_as_markdown - Speed: ~5-15 seconds
- Cost: Bright Data credits
- Works for: CAPTCHA, advanced bot detection, residential IPs
Workflow Routing
Default workflow for all URL scraping:
- Route to:
Workflows/FourTierScrape.md - Output: URL content in markdown format
Examples
Example 1: Simple Site (Tier 1 Success)
User: “Scrape https://example.com“
Process:
- Attempt Tier 1 (WebFetch)
- Success in 3 seconds
- Return markdown content
Example 2: JavaScript Site (Tier 3 Success)
User: “Fetch https://spa-app.com“
Process:
- Tier 1 fails (blocked)
- Tier 2 fails (JavaScript required)
- Tier 3 succeeds with Playwright
- Return markdown content
Example 3: Protected Site (Tier 4 Success)
User: “Can’t access https://protected-site.com“
Process:
- Tier 1 fails (403)
- Tier 2 fails (bot detection)
- Tier 3 fails (CAPTCHA)
- Tier 4 succeeds with Bright Data
- Return markdown content
Example 4: Direct Tier Request
User: “Use Bright Data to fetch https://any-site.com“
Process:
- User explicitly requested Bright Data
- Skip directly to Tier 4
- Return markdown content
Integration Points
- WebFetch Tool – Built-in Claude Code tool
- Bash Tool – For curl commands
- Browser Skill – For Playwright automation (requires pai-browser-skill)
- Bright Data MCP – For professional scraping
Related Documentation
Workflows/FourTierScrape.md– Complete workflow implementationINSTALL.md– Installation guideVERIFY.md– Verification checklist
Last Updated: 2026-01-14