cf-browser

📁 rarestg/rarestg-skills 📅 3 days ago

Install command

npx skills add https://github.com/rarestg/rarestg-skills --skill cf-browser

Skill documentation

Cloudflare Browser Rendering

Browse and scrape the web via Cloudflare’s Browser Rendering REST API. Every call is a single POST request: no browser setup, no Puppeteer scripts.

Prerequisites

Requires two env vars (confirm they’re set before making calls):

CF_ACCOUNT_ID: Cloudflare account ID
CF_API_TOKEN: API token with Browser Rendering – Edit permission
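
A pre-flight check along these lines can confirm both variables before the first call. This is a minimal sketch; the function name cfbr_check_env is hypothetical and not part of the skill.

```shell
# Hypothetical helper (not part of the skill): verify the two required env vars.
# Prints the name of each missing variable to stderr and returns non-zero if any is missing.
cfbr_check_env() {
  status=0
  if [ -z "${CF_ACCOUNT_ID:-}" ]; then
    echo "CF_ACCOUNT_ID is not set" >&2
    status=1
  fi
  if [ -z "${CF_API_TOKEN:-}" ]; then
    echo "CF_API_TOKEN is not set" >&2
    status=1
  fi
  return "$status"
}
```

Calling cfbr_check_env || exit 1 at the top of a scraping script stops early with a readable message instead of failing on the first request.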

Helper script

Use cfbr.sh for all API calls. It handles auth headers and the base URL:

# JSON endpoints
cfbr.sh <endpoint> '<json_body>'

# Screenshot (binary): optional third arg for output filename
cfbr.sh screenshot '<json_body>' output.png
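
The script itself is not reproduced here. As a rough sketch of what such a wrapper might look like, the functions below assume the REST base URL https://api.cloudflare.com/client/v4/accounts/<account_id>/browser-rendering/<endpoint> and Bearer-token authentication. The names cfbr_url and cfbr_call are hypothetical, and the real cfbr.sh may differ.

```shell
# Hypothetical stand-in for cfbr.sh; the real script may differ.

# Print the REST URL for an endpoint such as "markdown" or "screenshot".
# $1 = account ID, $2 = endpoint name.
cfbr_url() {
  printf 'https://api.cloudflare.com/client/v4/accounts/%s/browser-rendering/%s\n' "$1" "$2"
}

# POST a JSON body to an endpoint.
# $1 = endpoint, $2 = JSON body, $3 = optional output file (for binary screenshots).
# When $3 is omitted, "-" makes curl write the response to stdout.
cfbr_call() {
  curl -sS -X POST "$(cfbr_url "$CF_ACCOUNT_ID" "$1")" \
    -H "Authorization: Bearer $CF_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$2" \
    -o "${3:--}"
}
```

With this sketch, cfbr_call markdown '{"url":"https://target-site.com/listings"}' would mirror the first usage form above, and a third argument such as output.png would mirror the screenshot form.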

Choosing an endpoint

Goal	Endpoint	When to use
Read page content for analysis	`markdown`	Default choice: clean, token-efficient
Extract specific elements	`scrape`	Know the CSS selectors for what you need
Extract structured data with AI	`json`	Need typed objects, don’t know exact selectors
Get full rendered DOM	`content`	Need raw HTML for parsing or debugging
Discover pages / crawl	`links`	Building a sitemap or finding subpages
Visual inspection	`screenshot`	Need to see the page layout or debug visually
DOM + visual in one shot	`snapshot`	Need both HTML and a screenshot

For full endpoint details and parameters, see api.md.

Scraping workflow

Follow this sequence when scraping a site for structured data (e.g. rental listings, product catalogs, job boards):

1. Reconnaissance: understand the page

Start with markdown to see what content is on the page and how it’s structured:

cfbr.sh markdown '{"url":"https://target-site.com/listings", "gotoOptions":{"waitUntil":"networkidle0"}}'

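If the page is an SPA or loads content dynamically, networkidle0 waits until there have been no network connections for at least 500 ms, which usually means client-side JS has finished loading content. If you know a specific element that signals content is ready, use waitForSelector instead; it’s faster: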

{"url":"...", "waitForSelector": ".listing-card"}

2. Discover structure: find the selectors

From the markdown/HTML, identify repeating patterns (listing cards, table rows, etc.) and their CSS selectors. If unclear from markdown alone, use screenshot to visually inspect:

cfbr.sh screenshot '{"url":"https://target-site.com/listings", "screenshotOptions":{"fullPage":true}, "gotoOptions":{"waitUntil":"networkidle0"}}' listings.png

3. Extract: pull structured data

Option A: CSS selectors (when you know the DOM structure)

cfbr.sh scrape '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "elements": [
    {"selector": ".listing-card .title"},
    {"selector": ".listing-card .price"},
    {"selector": ".listing-card .address"},
    {"selector": ".listing-card a"}
  ]
}'

The scrape endpoint returns text, html, attributes (including href), and position/dimensions for each match. Correlate results across selectors by index (first title matches first price, etc.).
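
Correlating by index can be done with jq. The sketch below assumes jq is installed and that the response has the shape {"result": [{"selector": ..., "results": [{"text": ...}, ...]}, ...]}; check that shape against an actual response before relying on it. The function name scrape_rows is hypothetical.

```shell
# Hypothetical helper: read a scrape response on stdin and print one
# tab-separated row per match index (first title with first price, and so on).
# Assumes the response shape described in the lead-in; requires jq.
# If one selector has fewer matches than another, the missing cells come out empty.
scrape_rows() {
  jq -r '[.result[].results | map(.text)] | transpose[] | @tsv'
}
```

Piping the scrape call into it, as in cfbr.sh scrape '<json_body>' | scrape_rows, yields one line per listing with columns in the same order as the elements array.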

Option B: AI extraction (when structure is complex or unknown)

cfbr.sh json '{
  "url": "https://target-site.com/listings",
  "gotoOptions": {"waitUntil": "networkidle0"},
  "prompt": "Extract all rental listings with title, price, address, bedrooms, and link",
  "response_format": {
    "type": "json_schema",
    "schema": {
      "type": "object",
      "properties": {
        "listings": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "price": {"type": "string"},
              "address": {"type": "string"},
              "bedrooms": {"type": "string"},
              "url": {"type": "string"}
            },
            "required": ["title", "price"]
          }
        }
      }
    }
  }
}'

Prefer scrape when selectors are clear: it’s deterministic and avoids the Workers AI charges that json incurs. Use json when the page structure is messy or you need semantic interpretation.

4. Paginate: get all results

Use links to find pagination URLs:

cfbr.sh links '{"url":"https://target-site.com/listings"}'

Look for ?page=2, next, or load-more patterns. Repeat extraction for each page.

Infinite-scroll pages are a limitation: the API is stateless (one request = one browser session), so there’s no way to scroll, wait for new content to load, and then extract in a single call. For these pages, look for an underlying API or URL parameters (e.g. ?page=2, ?offset=20) that serve paginated data directly.
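
For sites that do expose a page parameter, the per-page loop can be sketched as follows. The ?page=N pattern, the one-second pause, and the function names page_urls and crawl_pages are assumptions for illustration; adjust them to what the links output actually shows.

```shell
# Print one URL per page for pages 1..N, assuming a ?page=N query parameter.
# $1 = base listing URL (without a query string), $2 = last page number.
page_urls() {
  page=1
  while [ "$page" -le "$2" ]; do
    printf '%s?page=%s\n' "$1" "$page"
    page=$((page + 1))
  done
}

# Fetch each page in turn with the markdown endpoint, one request at a time.
# The pause between requests is an assumed value, not a documented limit.
crawl_pages() {
  page_urls "$1" "$2" | while IFS= read -r url; do
    cfbr.sh markdown "{\"url\":\"$url\",\"gotoOptions\":{\"waitUntil\":\"networkidle0\"}}"
    sleep 1
  done
}
```

For example, crawl_pages https://target-site.com/listings 3 would request pages 1 through 3 in order. Swapping markdown for scrape or json in the loop body repeats the extraction step instead.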

5. Handle obstacles

SPA / empty results: Add "gotoOptions": {"waitUntil": "networkidle0"} or "waitForSelector": "<selector>".

Slow pages: Increase the navigation timeout with "gotoOptions": {"timeout": 60000}.

Heavy pages: Strip unnecessary resources:

{"rejectResourceTypes": ["image", "stylesheet", "font", "media"]}

Auth-gated pages: Pass session cookies:

{"cookies": [{"name": "session", "value": "abc123", "domain": "target-site.com", "path": "/"}]}

Bot detection: Cloudflare Browser Rendering is always identified as a bot. The userAgent field changes the user-agent string the site sees but will not bypass bot protection. If a site blocks the request, there is no workaround via this API.
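
The SPA, slow-page, and heavy-page options above can be combined in a single request body. A minimal sketch, where the function name robust_body is hypothetical and the URL is assumed to contain no double quotes or backslashes (it is inserted into the JSON without escaping):

```shell
# Hypothetical helper: build a request body that waits for network idle,
# allows up to 60 seconds for navigation, and skips resource types that
# text extraction does not need.
# $1 = target URL (must not contain double quotes or backslashes).
robust_body() {
  printf '{"url":"%s","gotoOptions":{"waitUntil":"networkidle0","timeout":60000},"rejectResourceTypes":["image","stylesheet","font","media"]}\n' "$1"
}
```

Usage would look like cfbr.sh markdown "$(robust_body https://target-site.com/listings)". Leave rejectResourceTypes out for screenshot calls, where images and stylesheets affect the result.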

Tips

markdown is the best default for content extraction: it’s clean, compact, and LLM-ready.
Always use networkidle0 or waitForSelector on any modern site. Without one of them you may get incomplete content.
rejectResourceTypes speeds up text-only operations by skipping downloads the page does not need. Strip images, fonts, and stylesheets when you only need text.
scrape results are ordered by DOM position, so correlate across selectors by array index.
For large scraping jobs, process pages sequentially to stay within rate limits.
