cf-browser
npx skills add https://github.com/rarestg/rarestg-skills --skill cf-browser
Cloudflare Browser Rendering
Browse and scrape the web via Cloudflare's Browser Rendering REST API. Every call is a single POST request - no browser setup, no Puppeteer scripts.
Prerequisites
Requires two env vars (confirm they’re set before making calls):
- CF_ACCOUNT_ID - Cloudflare account ID
- CF_API_TOKEN - API token with Browser Rendering - Edit permission
Helper script
Use cfbr.sh for all API calls. It handles auth headers and the base URL:
# JSON endpoints
cfbr.sh <endpoint> '<json_body>'
# Screenshot (binary) - optional third arg for output filename
cfbr.sh screenshot '<json_body>' output.png
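The script's contents aren't shown here, but a minimal wrapper along these lines would match the usage above. The base URL is Cloudflare's documented Browser Rendering REST endpoint; everything else is an assumption about how cfbr.sh is written:

```shell
#!/bin/sh
# cfbr.sh (sketch) - wraps curl with auth headers and the base URL.
# Usage: cfbr.sh <endpoint> '<json_body>' [output_file]
endpoint="$1"; body="$2"; out="$3"
base="https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering"
if [ -n "$out" ]; then
  # Binary endpoints (e.g. screenshot): write the response body to a file
  curl -s -X POST "${base}/${endpoint}" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body" -o "$out"
else
  # JSON endpoints: print the response to stdout
  curl -s -X POST "${base}/${endpoint}" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body"
fi
```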
Choosing an endpoint
| Goal | Endpoint | When to use |
|---|---|---|
| Read page content for analysis | markdown | Default choice - clean, token-efficient |
| Extract specific elements | scrape | Know the CSS selectors for what you need |
| Extract structured data with AI | json | Need typed objects, don't know exact selectors |
| Get full rendered DOM | content | Need raw HTML for parsing or debugging |
| Discover pages / crawl | links | Building a sitemap or finding subpages |
| Visual inspection | screenshot | Need to see the page layout or debug visually |
| DOM + visual in one shot | snapshot | Need both HTML and a screenshot |
For full endpoint details and parameters, see api.md.
Scraping workflow
Follow this sequence when scraping a site for structured data (e.g. rental listings, product catalogs, job boards):
1. Reconnaissance - understand the page
Start with markdown to see what content is on the page and how it’s structured:
cfbr.sh markdown '{"url":"https://target-site.com/listings", "gotoOptions":{"waitUntil":"networkidle0"}}'
If the page is an SPA or loads content dynamically, networkidle0 ensures JS finishes executing. If you know a specific element that signals content is ready, use waitForSelector instead - it's faster:
{"url":"...", "waitForSelector": ".listing-card"}
2. Discover structure - find the selectors
From the markdown/HTML, identify repeating patterns (listing cards, table rows, etc.) and their CSS selectors. If unclear from markdown alone, use screenshot to visually inspect:
cfbr.sh screenshot '{"url":"https://target-site.com/listings", "screenshotOptions":{"fullPage":true}, "gotoOptions":{"waitUntil":"networkidle0"}}' listings.png
3. Extract - pull structured data
Option A: CSS selectors (when you know the DOM structure)
cfbr.sh scrape '{
"url": "https://target-site.com/listings",
"gotoOptions": {"waitUntil": "networkidle0"},
"elements": [
{"selector": ".listing-card .title"},
{"selector": ".listing-card .price"},
{"selector": ".listing-card .address"},
{"selector": ".listing-card a"}
]
}'
The scrape endpoint returns text, html, attributes (including href), and position/dimensions for each match. Correlate results across selectors by index (first title matches first price, etc.).
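Assuming the response nests each selector's matches under `result[i].results` with a `text` field (the exact shape may differ - check a real response first), index-based correlation can be done with jq:

```shell
# Zip the first selector's matches (titles) with the second's (prices) by index.
cfbr.sh scrape '{"url":"https://target-site.com/listings","elements":[{"selector":".listing-card .title"},{"selector":".listing-card .price"}]}' \
  | jq -r '.result[0].results as $titles
           | .result[1].results as $prices
           | range($titles | length) as $i
           | "\($titles[$i].text)\t\($prices[$i].text)"'
```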
Option B: AI extraction (when structure is complex or unknown)
cfbr.sh json '{
"url": "https://target-site.com/listings",
"gotoOptions": {"waitUntil": "networkidle0"},
"prompt": "Extract all rental listings with title, price, address, bedrooms, and link",
"response_format": {
"type": "json_schema",
"schema": {
"type": "object",
"properties": {
"listings": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": {"type": "string"},
"price": {"type": "string"},
"address": {"type": "string"},
"bedrooms": {"type": "string"},
"url": {"type": "string"}
},
"required": ["title", "price"]
}
}
}
}
}
}'
Prefer scrape when selectors are clear - it's deterministic and free. Use json when the page structure is messy or you need semantic interpretation (incurs Workers AI charges).
4. Paginate - get all results
Use links to find pagination URLs:
cfbr.sh links '{"url":"https://target-site.com/listings"}'
Look for ?page=2, next, or load-more patterns. Repeat extraction for each page.
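Once the pagination pattern is known, a simple loop covers the pages. The page count and URL pattern here are hypothetical:

```shell
# Fetch pages 1-5 as markdown, pausing between requests to stay within rate limits.
for page in 1 2 3 4 5; do
  cfbr.sh markdown "{\"url\":\"https://target-site.com/listings?page=${page}\",\"gotoOptions\":{\"waitUntil\":\"networkidle0\"}}" \
    > "listings-page-${page}.md"
  sleep 2
done
```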
Infinite-scroll pages are a limitation - the API is stateless (one request = one browser session), so there's no way to scroll, wait for new content to load, and then extract in a single call. For these pages, look for an underlying API or URL parameters (e.g. ?page=2, ?offset=20) that serve paginated data directly.
5. Handle obstacles
SPA / empty results - Add "gotoOptions": {"waitUntil": "networkidle0"} or "waitForSelector": "<selector>".
Slow pages - Increase timeout: "gotoOptions": {"timeout": 60000}.
Heavy pages - Strip unnecessary resources:
{"rejectResourceTypes": ["image", "stylesheet", "font", "media"]}
Auth-gated pages - Pass session cookies:
{"cookies": [{"name": "session", "value": "abc123", "domain": "target-site.com", "path": "/"}]}
Bot detection - Cloudflare Browser Rendering is always identified as a bot. The userAgent field changes what the site sees but will not bypass bot protection. If a site blocks the request, there is no workaround via this API.
Tips
- markdown is the best default for content extraction - it's clean, compact, and LLM-ready.
- Always use networkidle0 or waitForSelector on any modern site. Without it you'll get incomplete content.
- rejectResourceTypes dramatically speeds up text-only operations. Always strip images/fonts/stylesheets when you only need text.
- scrape results are ordered by DOM position - correlate across selectors by array index.
- For large scraping jobs, process pages sequentially to stay within rate limits.