steel-browsing-skill

Repository: steel-dev/steel-browsing-skill (updated 5 days ago)
Install command:
npx skills add https://github.com/steel-dev/steel-browsing-skill --skill steel-browsing-skill

Steel Browsing Skill (CDP-first)

This Skill enables reliable web browsing + automation via Steel.

Core operating principle (what we learned)

Default to CDP (Playwright/Puppeteer) for anything interactive. Use Computer API actions only as a tactical fallback for “visual” recovery (focus traps, stubborn overlays, browser-native-ish prompts, CAPTCHAs, etc.).

Default run profile (apply unless user overrides)

  • Assume anti-bot defenses may exist.
  • Prefer a Steel session + CDP for most web tasks, even when a one-shot scrape looks sufficient.
  • Start sessions with anti-captcha/stealth settings enabled.
  • Keep one authoritative SESSION_ID and reuse it through retries.
  • Stop and report failure only after the bounded retry ladder (defined below) is exhausted.

Session contract (before any task)

  • Use exactly one source of truth for state: SESSION_ID, SESSION_WS, SESSION_VIEWER_URL.
  • SESSION_ID is the id returned by POST /v1/sessions.
  • SESSION_WS is the websocketUrl returned by that same create-session call, with the apiKey query parameter appended.
  • SESSION_VIEWER_URL is for human verification only.
  • Reuse the same SESSION_ID and SESSION_WS for all retries.
  • Replace all session variables only when the previous session is proven gone (Session not found or confirmed hard expiration).
  • Replace a session at most once per task unless its expiration is explicitly confirmed.
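A minimal Python sketch of this contract, assuming you track state in-process instead of only in shell variables. SessionTracker and its replace method are hypothetical names; the replacement budget of one mirrors the rule above.

```python
from dataclasses import dataclass


@dataclass
class SessionTracker:
    """Hypothetical single source of truth for SESSION_ID / SESSION_WS / SESSION_VIEWER_URL."""

    session_id: str
    session_ws: str
    viewer_url: str = ""
    replacements: int = 0
    max_replacements: int = 1  # at most one replacement per task

    def replace(self, new_id: str, new_ws: str, new_viewer: str, proven_gone: bool) -> bool:
        """Swap in a new session only if the old one is proven gone and budget remains."""
        if not proven_gone:
            return False  # keep retrying on the same SESSION_ID / SESSION_WS
        if self.replacements >= self.max_replacements:
            return False  # budget exhausted: stop and report instead of creating more sessions
        # Replace all three variables together so old and new IDs never mix.
        self.session_id, self.session_ws, self.viewer_url = new_id, new_ws, new_viewer
        self.replacements += 1
        return True
```

Pass proven_gone=True only after seeing Session not found or a confirmed hard expiration.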

Golden template (default hard mode)

Use this as the default flow for any new site:

POST /v1/sessions
{
  "url": "https://target.site",
  "timeout": 900000,
  "solveCaptcha": true,
  "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
  "deviceConfig": { "device": "desktop" },
  "region": "iad",
  "useProxy": false
}
  • If using Playwright/Puppeteer, connect CDP with: websocketUrl + "&apiKey=" + encodeURIComponent(STEEL_API_KEY)
  • Run the interaction with selector-based waits and DOM verification.
  • If blocked/hung, do bounded fallback via POST /v1/sessions/{id}/computer (Esc / close overlay / small scroll), then retry once.
  • Always release the session in a finally block, even on failure.
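The golden template can also be generated in code so the defaults apply unless the user overrides them. This Python sketch copies the field names from the JSON above; session_payload and the one-level merge of nested objects (such as stealthConfig) are illustrative choices, not part of Steel's API.

```python
import copy

# Field names and values copied from the golden template above.
GOLDEN_SESSION = {
    "timeout": 900_000,
    "solveCaptcha": True,
    "stealthConfig": {"humanizeInteractions": True, "autoCaptchaSolving": True},
    "deviceConfig": {"device": "desktop"},
    "region": "iad",
    "useProxy": False,
}


def session_payload(url: str, **overrides) -> dict:
    """Build a POST /v1/sessions body from the golden template; user overrides win."""
    payload = copy.deepcopy(GOLDEN_SESSION)  # never mutate the shared defaults
    payload["url"] = url
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(payload.get(key), dict):
            payload[key].update(value)  # merge nested configs instead of dropping defaults
        else:
            payload[key] = value
    return payload
```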

Golden runbook (single-task template)

Use this exact sequence for each interactive task:

  1. Preflight
: "${STEEL_API_KEY:?missing STEEL_API_KEY}"
command -v curl >/dev/null || exit 1
command -v jq >/dev/null || exit 1
  2. Create one session and export state
RESPONSE=$(curl -sS -X POST https://api.steel.dev/v1/sessions \
  -H "Content-Type: application/json" \
  -H "steel-api-key: $STEEL_API_KEY" \
  --data-raw '{"url":"https://target.site","timeout":900000,"solveCaptcha":true,"stealthConfig":{"humanizeInteractions":true,"autoCaptchaSolving":true},"deviceConfig":{"device":"desktop"},"region":"iad","useProxy":false}')

# URL-encode the key with jq's @uri, matching encodeURIComponent in the JS examples.
SESSION_ID=$(echo "$RESPONSE" | jq -r '.id // empty')
SESSION_WS=$(echo "$RESPONSE" | jq -r --arg key "$STEEL_API_KEY" 'if .websocketUrl then .websocketUrl + "&apiKey=" + ($key | @uri) else empty end')
SESSION_VIEWER_URL=$(echo "$RESPONSE" | jq -r '.sessionViewerUrl // empty')
# A missing id or websocketUrl is a hard failure before any CDP code runs.
[ -n "$SESSION_ID" ] && [ -n "$SESSION_WS" ] || { echo "session creation failed" >&2; exit 1; }
export SESSION_ID SESSION_WS SESSION_VIEWER_URL
  3. Run CDP automation (single runtime path)
  • Use one runtime only (Playwright JS or Python Playwright).
  • Pass SESSION_WS and TARGET_URL as env vars.
  • On any recoverable exception, run one longer-timeout retry before fallback.
  4. Verify post-condition (at least one of):
  • URL changed to target destination
  • expected success selector visible
  • expected state/text changed
  5. Release
curl -sS -X POST https://api.steel.dev/v1/sessions/"$SESSION_ID"/release \
  -H "steel-api-key: $STEEL_API_KEY" || true
  6. Bounded fallback
  • If blocked: one Computer recovery pass (take_screenshot, press_key ["Escape"], click outside, scroll), then one final CDP retry.
  • If still blocked: stop and report the blocker reason.

Optional helper scripts for repetitive steps

Use these local helpers when you want fast, low-risk execution:

  • scripts/create_steel_session.sh – create session and export SESSION_ID, SESSION_WS, SESSION_VIEWER_URL, TARGET_URL.
  • scripts/release_steel_session.sh – idempotent release helper.
  • scripts/cdp_template.js – compact Playwright-CDP interaction scaffold.

Examples:

  • examples/runbook.md for one-shot copy/paste flow using the helper scripts.

Why CDP-first:

  • CDP gives deterministic navigation, selectors, and robust waits and verifications.
  • Computer actions are slower and more fragile (they depend on screen coordinates), but excellent as an escape hatch.

Security & Setup

API key handling (mandatory policy)

  • Do not ask the user to paste API keys into chat.
  • Expect STEEL_API_KEY in the environment.

Example header (bash/curl):

-H "steel-api-key: $STEEL_API_KEY"

Base URL:

https://api.steel.dev

Runtime preflight (before first request)

  • If jq is missing (command -v jq fails), install it or build JSON safely without it.
  • If node is missing, switch to the Python-only CDP path.
  • If python is missing, use the Node-only path.
  • If Playwright is not installed for the chosen runtime, install it before any interaction or switch to the Python Playwright package.
  • At session creation, validate that the payload sets timeout and, for interactive targets, includes the anti-bot flags (solveCaptcha, stealthConfig).
  • Set STEEL_API_KEY in the environment and never print request headers that contain it.
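A minimal Python sketch of two of these checks, assuming Python 3.8+. pick_runtime and payload_problems are hypothetical helpers; preferring Node when both runtimes exist is an arbitrary choice, and checking that the Playwright package itself is importable is left out.

```python
import shutil
from typing import Callable, List, Optional

ANTI_BOT_FLAGS = ("solveCaptcha", "stealthConfig")


def pick_runtime(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Choose a single CDP runtime from the interpreters found on PATH."""
    if which("node"):
        return "node"
    if which("python3") or which("python"):
        return "python"
    raise RuntimeError("neither node nor python is available for CDP automation")


def payload_problems(payload: dict, interactive: bool = True) -> List[str]:
    """List preflight problems in a POST /v1/sessions payload (empty list means OK)."""
    problems = []
    if "timeout" not in payload:
        problems.append("missing timeout")
    if interactive:
        for flag in ANTI_BOT_FLAGS:
            if not payload.get(flag):
                problems.append(f"missing anti-bot flag: {flag}")
    return problems
```

The which parameter defaults to shutil.which and exists only so the choice can be tested without touching PATH.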

Standard session variable setup

  • Set once and reuse: export SESSION_ID=<id>.
  • Set once and reuse: export SESSION_WS="<websocketUrl>&apiKey=${STEEL_API_KEY}" (URL-encode the key if it contains reserved characters).
  • Set once and reuse: export SESSION_VIEWER_URL=<sessionViewerUrl>.
  • Treat missing SESSION_WS as hard failure before CDP code execution.

Quick Decision Tree

Use Stateless endpoints when:

  • You only need page content, a screenshot, or a PDF
  • No login/multi-step flow required

✅ Use:

  • POST /v1/scrape
  • POST /v1/screenshot
  • POST /v1/pdf

Use Sessions when:

  • Login required
  • Multi-step interaction
  • Form submissions
  • JS-heavy apps
  • You need cookies/localStorage persistence

✅ Use:

  • POST /v1/sessions (create; always set timeout)
  • CDP (preferred) using websocketUrl from session response
  • POST /v1/sessions/{id}/computer (fallback / recovery)
  • GET /v1/sessions/{id}/context (cookies/storage)
  • POST /v1/sessions/{id}/release (always)

Mode 1: Stateless (One-shot)

1) Scrape

Use for clean text extraction and planning selectors.

Endpoint: POST /v1/scrape

Example:

{
  "url": "https://example.com",
  "format": ["markdown"],
  "screenshot": false,
  "pdf": false
}

Formats:

  • markdown (best for summarization)
  • cleaned_html (best for parsing + finding forms/selectors)
  • html (raw)

Tip: For form automation, scrape first and record:

  • input selectors (name=email, input[type=email], etc.)
  • submit button selector
  • success message text/element to verify completion
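Recording selectors from cleaned_html can be partly automated. This is a rough Python sketch using the standard-library html.parser; FormSelectorFinder is a hypothetical helper, you pass in the HTML string yourself (the response field that holds it is not covered here), and the selectors it emits are candidates to confirm, not guaranteed-unique matches.

```python
from html.parser import HTMLParser
from typing import List


class FormSelectorFinder(HTMLParser):
    """Collect candidate CSS selectors for form inputs and submit controls."""

    def __init__(self) -> None:
        super().__init__()
        self.inputs: List[str] = []
        self.submits: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        kind = (attr.get("type") or "").lower()
        if tag == "input" and kind not in ("submit", "hidden"):
            if attr.get("name"):
                self.inputs.append(f'input[name="{attr["name"]}"]')
            elif kind:
                self.inputs.append(f'input[type="{kind}"]')
        elif tag == "input" and kind == "submit":
            self.submits.append('input[type="submit"]')
        elif tag == "button" and kind in ("", "submit"):
            # <button> defaults to type=submit, but [type="submit"] only matches an explicit attribute.
            self.submits.append('button[type="submit"]' if kind else "form button")
```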

2) Screenshot

Endpoint: POST /v1/screenshot

Example:

{
  "url": "https://example.com",
  "fullPage": true
}

3) PDF

Endpoint: POST /v1/pdf

Example:

{
  "url": "https://example.com"
}
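All three stateless endpoints take a JSON body and the steel-api-key header. This Python sketch only builds the request with the standard library (stateless_request is a hypothetical name); sending it and interpreting the response are left out because the response formats are not described here.

```python
import json
import urllib.request

BASE_URL = "https://api.steel.dev"


def stateless_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/scrape, /v1/screenshot or /v1/pdf."""
    if endpoint not in ("scrape", "screenshot", "pdf"):
        raise ValueError(f"unknown stateless endpoint: {endpoint}")
    return urllib.request.Request(
        f"{BASE_URL}/v1/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "steel-api-key": api_key},
        method="POST",
    )
```

Sending is then urllib.request.urlopen(req, timeout=120).read(); the key stays in the header and should never be logged.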

Mode 2: Sessions (Stateful)

Session lifecycle (critical)

Sessions expire if you don’t set a long enough timeout. Common failure symptom: Session ... not found.

Rule:

  • Always set timeout for anything non-trivial.
  • Track the active SESSION_ID in one place and don’t mix IDs.
  • Reuse the same session for retries; don’t create a new session for each selector tweak.
  • Bound session creation attempts (for example: max 2 per task) to avoid session sprawl.
  • Always release when done.
  • If release returns Session not found after successful work, treat it as already-ended/idempotent cleanup.

Create session

Endpoint: POST /v1/sessions

Minimal:

{
  "timeout": 600000
}

Common options:

{
  "url": "https://example.com",
  "timeout": 900000,
  "solveCaptcha": true,
  "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
  "deviceConfig": { "device": "desktop" },
  "region": "iad",
  "useProxy": false
}

For most sites, the minimum anti-bot-safe session is:

{
  "timeout": 900000,
  "solveCaptcha": true,
  "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
  "region": "iad"
}

The response typically includes:

  • id
  • websocketUrl (use for CDP)
  • sessionViewerUrl / debugUrl (use for human verification)
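A Python sketch that maps a create-session response to the three session variables. It assumes websocketUrl already carries a query string, which is what the "&apiKey=" pattern used throughout this document implies; session_exports is a hypothetical helper, and the key is URL-encoded much like encodeURIComponent.

```python
from urllib.parse import quote


def session_exports(response: dict, api_key: str) -> dict:
    """Derive SESSION_ID / SESSION_WS / SESSION_VIEWER_URL from a create-session response."""
    session_id = response.get("id")
    ws = response.get("websocketUrl")
    if not session_id or not ws:
        # Missing SESSION_WS is a hard failure before any CDP code runs.
        raise ValueError("create-session response lacks id or websocketUrl")
    return {
        "SESSION_ID": session_id,
        "SESSION_WS": f"{ws}&apiKey={quote(api_key, safe='')}",
        # Either field name may appear; the viewer URL is for human verification only.
        "SESSION_VIEWER_URL": response.get("sessionViewerUrl") or response.get("debugUrl") or "",
    }
```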

Step 2A (Preferred): Control the session via CDP (Playwright/Puppeteer)

When to use CDP

Use CDP for:

  • navigation (goto)
  • selector-based clicks and fills
  • robust waits and assertions
  • reliable verification (URL/text/DOM)

How to connect

Use the websocketUrl returned by POST /v1/sessions. (Do not guess the URL pattern; Steel returns the correct one for your session.)

Important auth note from field use:

  • For some environments, connectOverCDP requires appending apiKey in the WS query string.
  • Safe default:
const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
const browser = await chromium.connectOverCDP(wsUrl);

Stable CDP script pattern (copy-safe)

Use one runtime and export the required variables (SESSION_WS, optionally TARGET_URL). SESSION_WS already contains the apiKey query parameter, so do not append it again.

JavaScript (Playwright, ES module):

import { chromium } from "playwright";

const wsUrl = process.env.SESSION_WS; // websocketUrl + "&apiKey=" + URL-encoded key
if (!wsUrl) throw new Error("SESSION_WS is not set");
const target = process.env.TARGET_URL || "https://example.com";

(async () => {
  const browser = await chromium.connectOverCDP(wsUrl);
  try {
    const context = browser.contexts()[0];
    const page = context.pages()[0] || (await context.newPage());

    await page.goto(target, { waitUntil: "domcontentloaded", timeout: 60000 });
    await page.waitForSelector("body", { timeout: 30000 });
    // run deterministic interactions here
  } finally {
    await browser.close();
  }
})().catch((err) => {
  console.error(err);
  process.exit(1);
});

Python (Playwright async API):

import asyncio
import os

from playwright.async_api import async_playwright


async def run():
    ws_url = os.environ["SESSION_WS"]  # already includes "&apiKey=..."
    target = os.environ.get("TARGET_URL", "https://example.com")
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)
        try:
            # In Python, contexts and pages are properties, not methods.
            context = browser.contexts[0]
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto(target, wait_until="domcontentloaded", timeout=60000)
            await page.wait_for_selector("body", timeout=30000)
            # run deterministic interactions here
        finally:
            await browser.close()


asyncio.run(run())

Recommended CDP workflow

  1. Create one session and keep its SESSION_ID as the single source of truth
  2. CDP handshake preflight (connectOverCDP) before deeper task logic
  3. page.goto(url) (or rely on session url at creation)
  4. Wait for stable UI (waitForLoadState, waitForSelector)
  5. Interact using selectors (fill, click)
  6. Verify success via DOM (preferred), or via scrape + known success text
  7. Release session

Failure handling inside CDP flow

  • If a CDP operation throws, wait and retry once with longer timeouts.
  • If the same selector fails twice, use one backup selector and retry once.
  • Do not recreate the session after a single transient timeout.
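These rules can be wrapped in a small helper. A Python sketch, assuming op is any callable taking (selector, timeout_ms); with_retry, the 30 s / 60 s timeouts and the 1 s pause are illustrative values, not Steel or Playwright defaults.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def with_retry(op: Callable[[str, int], T], selectors: Sequence[str],
               timeouts: Sequence[int] = (30_000, 60_000), pause_s: float = 1.0) -> T:
    """Primary selector with growing timeouts, then one attempt with a backup selector."""
    attempts = [(selectors[0], t) for t in timeouts]
    if len(selectors) > 1:
        attempts.append((selectors[1], timeouts[-1]))  # backup selector, tried once
    last_error = None
    for i, (selector, timeout_ms) in enumerate(attempts):
        try:
            return op(selector, timeout_ms)
        except Exception as exc:  # with Playwright, catch its TimeoutError/Error instead
            last_error = exc
            if i < len(attempts) - 1:
                time.sleep(pause_s)  # brief wait before the next attempt
    raise last_error
```

With the sync API, op could be lambda sel, t: page.click(sel, timeout=t); the session itself is never recreated here.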

Example (Playwright JS, ES module with top-level await)

import { chromium } from "playwright";

// SESSION_WS is the session's websocketUrl with "&apiKey=<encoded key>" already appended.
const browser = await chromium.connectOverCDP(process.env.SESSION_WS);
try {
  const context = browser.contexts()[0];
  const page = context.pages()[0] ?? (await context.newPage());

  await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
  await page.fill('input[name="email"]', "test@test.com");
  await page.click('button[type="submit"]');

  // verify success in the DOM, not just that the click happened
  await page.waitForSelector("text=Thanks for subscribing", { timeout: 10000 });
} finally {
  await browser.close();
}

Prefer CDP-native solutions before falling back to Computer actions:

  • JS dialogs: handle via dialog listeners
  • File uploads: setInputFiles (avoid OS file picker)
  • Permissions: grant at browser context level when possible

Step 2B (Fallback): Computer API (mouse/keyboard actions)

Use Computer actions when:

  • CDP selectors fail repeatedly and you need a visual “nudge”
  • You’re blocked by a stubborn overlay/focus trap
  • A browser-native-ish prompt is blocking progress
  • You need quick recovery (Esc, click outside, scroll, etc.)

Endpoint: POST /v1/sessions/{id}/computer

Hard-learned schema rules (avoid validation errors)

  • There is no navigate action.
  • press_key requires keys as an array (NOT key)
  • scroll uses delta_y / delta_x (NOT direction/amount)

Action reference (safe subset)

take_screenshot:
  { "action": "take_screenshot" }

click_mouse:
  { "action": "click_mouse", "button": "left", "coordinates": [x,y], "screenshot": true }

type_text:
  { "action": "type_text", "text": "...", "screenshot": true }

press_key:
  { "action": "press_key", "keys": ["Enter"], "screenshot": true }

scroll:
  { "action": "scroll", "delta_y": 800, "coordinates": [x,y], "screenshot": true }

wait:
  { "action": "wait", "duration": 2000, "screenshot": true }
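The schema rules above can be checked locally before calling the endpoint. A Python sketch; check_action is hypothetical, and ALLOWED_ACTIONS is only the safe subset listed here, not necessarily every action Steel supports.

```python
ALLOWED_ACTIONS = {"take_screenshot", "click_mouse", "type_text", "press_key", "scroll", "wait"}


def check_action(payload: dict) -> list:
    """Flag the known schema mistakes before POSTing to /v1/sessions/{id}/computer."""
    errors = []
    action = payload.get("action")
    if action not in ALLOWED_ACTIONS:
        errors.append(f"unsupported action: {action!r}")  # e.g. "navigate"
    if action == "press_key" and not isinstance(payload.get("keys"), list):
        errors.append("press_key needs keys as an array")
    if action == "scroll":
        if "direction" in payload or "amount" in payload:
            errors.append("scroll uses delta_y/delta_x, not direction/amount")
        if "delta_y" not in payload and "delta_x" not in payload:
            errors.append("scroll needs delta_y or delta_x")
    if action == "click_mouse":
        coords = payload.get("coordinates")
        if not isinstance(coords, list) or len(coords) != 2:
            errors.append("click_mouse needs coordinates [x, y]")
    return errors
```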

Computer-first recovery playbook (fast unstick)

  1. take_screenshot
  2. press_key → ["Escape"]
  3. click outside modal area
  4. scroll a bit (delta_y)
  5. screenshot again
  6. retry CDP approach once the blocker is gone
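One recovery pass expressed as request bodies, in Python. recovery_pass and computer_request are hypothetical helpers; outside_xy has no default because it should be picked from the first screenshot, while scroll_xy=(960, 540) assumes a 1920×1080 viewport and delta_y=400 is an arbitrary "small scroll".

```python
import json
import urllib.request
from typing import List, Tuple


def recovery_pass(outside_xy: Tuple[int, int], scroll_xy: Tuple[int, int] = (960, 540),
                  delta_y: int = 400) -> List[dict]:
    """One Computer recovery pass as ordered /computer request bodies."""
    return [
        {"action": "take_screenshot"},
        {"action": "press_key", "keys": ["Escape"], "screenshot": True},
        {"action": "click_mouse", "button": "left", "coordinates": list(outside_xy), "screenshot": True},
        {"action": "scroll", "delta_y": delta_y, "coordinates": list(scroll_xy), "screenshot": True},
        {"action": "take_screenshot"},
    ]


def computer_request(session_id: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (not send) POST /v1/sessions/{id}/computer for a single action."""
    return urllib.request.Request(
        f"https://api.steel.dev/v1/sessions/{session_id}/computer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "steel-api-key": api_key},
        method="POST",
    )
```

Send the bodies one at a time and stop early if a screenshot shows the blocker is gone.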

Anti-bot / blocker detection and response

  • Cloudflare or anti-bot challenge wording appears (Just a moment, Checking your browser, etc.): wait, capture screenshot, then one Computer recovery pass.
  • Repeated click interception or overlay coverage persists: screenshot, press_key ["Escape"], click outside modal, scroll, screenshot.
  • Repeated wait-for-selector on same element: inspect blocker state first before changing selectors.
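A rough Python classifier for these three cases. classify_blocker is hypothetical, the phrase list covers only the wording quoted above (real challenge pages vary), and treating two occurrences as "repeated" is an assumption.

```python
from typing import Optional

# Only the phrases quoted in this section; extend for other challenge pages.
CHALLENGE_PHRASES = ("just a moment", "checking your browser")


def classify_blocker(page_text: str, intercepted_clicks: int = 0,
                     same_selector_waits: int = 0, repeat_threshold: int = 2) -> Optional[str]:
    """Map observed symptoms to a blocker category, or None if nothing matches."""
    text = page_text.lower()
    if any(phrase in text for phrase in CHALLENGE_PHRASES):
        return "anti_bot_challenge"  # wait, screenshot, then one Computer recovery pass
    if intercepted_clicks >= repeat_threshold:
        return "overlay"  # screenshot, Escape, click outside, scroll, screenshot
    if same_selector_waits >= repeat_threshold:
        return "stuck_selector"  # inspect blocker state before changing selectors
    return None
```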

Navigating without CDP (fallback)

Since there is no navigate action, emulate it:

  1. Click address bar area (top center)
  2. type_text URL
  3. press_key ["Enter"]
  4. wait + screenshot

Step 3: CAPTCHA handling

Best default:

  • set solveCaptcha: true when creating a session

If stuck:

  • use viewer URL for human-in-the-loop
  • try computer recovery steps (scroll/hover/click checkbox) only if needed

Step 4: Extract session context (cookies/storage)

Endpoint: GET /v1/sessions/{id}/context

Use to:

  • persist login state
  • debug whether session stored cookies/localStorage
  • export state for follow-up tasks

Note: if cookies/storage are empty, it may mean:

  • you never actually logged in
  • the page is blocked
  • you’re in a different origin than expected
  • the session expired, or you queried the wrong ID

Step 5: Release session (always)

Endpoint: POST /v1/sessions/{id}/release

Rule:

  • Release as soon as you’ve verified success or determined you can’t proceed.
  • If release returns Session not found after success verification, treat as completed.
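An idempotent release in Python, standard library only. release and release_outcome are hypothetical helpers; because the exact HTTP status for "not found" is not stated here, the sketch matches on the message text instead of a status code.

```python
import urllib.error
import urllib.request


def release_outcome(status: int, body: str, success_verified: bool) -> str:
    """Classify a release response; "Session ... not found" is benign after verified success."""
    if 200 <= status < 300:
        return "released"
    text = body.lower()
    if "session" in text and "not found" in text:
        return "already_ended" if success_verified else "expired_or_stale"
    return "release_failed"


def release(session_id: str, api_key: str, success_verified: bool) -> str:
    """POST /v1/sessions/{id}/release; HTTP error statuses are classified, network errors still raise."""
    req = urllib.request.Request(
        f"https://api.steel.dev/v1/sessions/{session_id}/release",
        headers={"steel-api-key": api_key},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return release_outcome(resp.status, "", success_verified)
    except urllib.error.HTTPError as err:
        return release_outcome(err.code, err.read().decode("utf-8", "replace"), success_verified)
```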

Recipes

Recipe: Newsletter signup (CDP-first)

  1. POST /v1/scrape to find:

    • email input selector
    • submit selector
    • success message text (for verification)
  2. Create session with long enough timeout:

{ "url": "https://site.com", "timeout": 600000 }
  3. Use CDP:
  • goto
  • fill
  • click submit
  • verify success text/element
  4. Release session.

Recipe: Login flow (CDP-first)

  1. Create session with timeout + optionally solveCaptcha
  2. CDP:
  • goto(login)
  • fill(username/password)
  • click(sign in)
  • wait for logged-in selector
  3. Verify via DOM (profile avatar / logout button / dashboard URL)
  4. Optionally GET /v1/sessions/{id}/context to confirm cookies exist
  5. Release

Recipe: Stuck on an overlay (hybrid)

  1. CDP attempts fail due to overlay/click interception
  2. Use Computer API:
  • screenshot
  • press Esc
  • click close “X”
  • scroll slightly
  • screenshot
  3. Return to CDP and continue with selectors
  4. Verify + release

Troubleshooting (Error → Fix)

invalid_union / “No matching discriminator”

Cause: unsupported action or wrong payload shape. Fix:

  • Use only the documented Computer actions
  • Remove any navigate action usage

Invalid input: expected array … path: keys

Cause: you used key instead of keys. Fix:

{ "action": "press_key", "keys": ["Enter"] }

Scroll does nothing / “Scrolled up by 0 at (0,0)”

Cause: using direction/amount or missing delta_y. Fix:

{ "action": "scroll", "delta_y": 800, "coordinates": [960, 540] }

Session ... not found

Cause: session expired/released OR you used an old ID. Fix:

  • Create a new session with a longer timeout
  • Update the stored SESSION_ID everywhere
  • Don’t mix multiple sessions unless necessary
  • If this happens on release after successful verification, treat cleanup as already complete

connectOverCDP ... 502 Bad Gateway (to wss://connect.steel.dev/)

Cause: the WebSocket URL lacks the apiKey query parameter that this environment requires. Fix:

const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
await chromium.connectOverCDP(wsUrl);

Curl errors like “blank argument where content is expected”

Cause: broken shell quoting / multiline JSON issues. Fix:

  • Use one-line JSON with --data-raw
  • Or build payload with jq -n and pass it safely
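The same idea as jq -n, sketched in Python: serialize the payload with json.dumps and shell-quote it with shlex.quote so quotes inside values cannot break the command. curl_create_session is a hypothetical helper; the API key stays as $STEEL_API_KEY so it is expanded by the shell and never embedded in the string.

```python
import json
import shlex


def curl_create_session(payload: dict) -> str:
    """Render a one-line curl command whose JSON body survives shell quoting."""
    body = json.dumps(payload, separators=(",", ":"))  # compact one-line JSON
    return " ".join([
        "curl -sS -X POST https://api.steel.dev/v1/sessions",
        '-H "Content-Type: application/json"',
        '-H "steel-api-key: $STEEL_API_KEY"',  # expanded by the shell at run time
        "--data-raw " + shlex.quote(body),
    ])
```

If you run curl from Python anyway, passing an argument list to subprocess.run avoids the shell and its quoting entirely.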

SyntaxError / malformed page.evaluate script

Cause: mixed quoting or invalid JS embedded in shell/JSON. Fix:

  • Keep JS scripts short and pass as raw heredocs or files.
  • Validate escaping before embedding script text in one-liners.
  • Fall back to one clean script per run instead of incremental inline patches.

Cannot find module 'playwright' or runtime import failures

Cause: missing playwright package in the execution environment. Fix:

  • Use one runtime per task and confirm module availability first.
  • Install dependency before running or switch to a Python Playwright path consistently.

write_stdin failed: stdin is closed

Cause: writing to a terminated subprocess. Fix:

  • Keep state in the Steel session (SESSION_ID and SESSION_WS), not in a long-lived interactive subprocess.
  • Treat closed stdin as terminal for that branch; proceed with command-based rerun.

Best Practices (to prevent the exact failures from the logs)

CDP-first by default

  • Use CDP for navigation + selectors + verification
  • Only use Computer actions as an escape hatch

Always verify

For “submit” tasks:

  • Prefer DOM verification (CDP wait for success)
  • Or re-scrape and look for success text / state change
  • Don’t claim success based on “click happened”

Verification contract (require at least one post-condition before declaring completion):

  • expected URL change
  • visible success element
  • expected text or state change

If no post-condition is met, continue the retry ladder or return a blocker reason.
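The contract reduces to an "at least one of" check. A Python sketch in which postcondition_met is hypothetical; with Playwright's sync API the inputs could come from page.url, page.is_visible(selector) and page.inner_text("body").

```python
from typing import Optional


def postcondition_met(current_url: str, expected_url_part: Optional[str] = None,
                      success_visible: bool = False, page_text: str = "",
                      expected_text: Optional[str] = None) -> bool:
    """True if at least one post-condition from the verification contract holds."""
    url_ok = expected_url_part is not None and expected_url_part in current_url
    text_ok = expected_text is not None and expected_text in page_text
    return url_ok or success_visible or text_ok
```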

Bound your retries (avoid spirals)

Suggested retry ladder:

  1. CDP attempt (selectors + waits)
  2. CDP attempt (adjust selectors, wait longer)
  3. Computer recovery (Esc/click outside/scroll)
  4. One final CDP attempt

If still blocked: stop and report what’s blocking progress.

Standardized stop conditions:

  • No more than 4 total retry loops per task.
  • Session replacement only if expiration is confirmed (Session not found).
  • At most one Computer recovery pass unless a new blocker category is observed.
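A Python sketch of the ladder with the four-loop cap. run_ladder is hypothetical; each step is a (name, callable) pair whose callable returns True once a post-condition is verified, and exceptions are logged so the ladder moves to the next rung instead of spiralling.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_ladder(steps: List[Step], max_loops: int = 4) -> Tuple[bool, List[str]]:
    """Run rungs in order; stop on verified success or after max_loops attempts."""
    log = []
    for name, step in steps[:max_loops]:
        try:
            ok = step()
        except Exception as exc:  # record the failure and move to the next rung
            log.append(f"{name}: error {exc}")
            continue
        log.append(f"{name}: {'ok' if ok else 'not verified'}")
        if ok:
            return True, log
    return False, log  # caller reports the blocker using this log
```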

Session hygiene

  • Set timeout
  • Reuse a single session per task whenever possible
  • Release sessions
  • Keep a single authoritative SESSION_ID
  • Treat release -> Session not found as non-fatal if success was already verified

Secret hygiene

  • Never request/paste keys
  • Never echo keys in logs
  • Prefer env vars

Summary

  • Stateless endpoints for quick extraction/screenshots/PDFs.
  • Sessions + CDP for reliable multi-step automation.
  • Computer actions as a fallback to break through blockers or recover from stuck UI.
  • Always verify outcomes and manage session lifecycles correctly.