steel-browsing-skill

📁 steel-dev/steel-browsing-skill 📅 5 days ago

Total installs

Weekly installs

#55257

Site-wide rank

Install command

npx skills add https://github.com/steel-dev/steel-browsing-skill --skill steel-browsing-skill

Agent install distribution

gemini-cli 3

claude-code 3

github-copilot 3

codex 3

kimi-cli 3

cursor 3

Skill documentation

Steel Browsing Skill (CDP-first)

This Skill enables reliable web browsing + automation via Steel.

Core operating principle (what we learned)

Default to CDP (Playwright/Puppeteer) for anything interactive. Use Computer API actions only as a tactical fallback for "visual" recovery (focus traps, stubborn overlays, browser-native-ish prompts, CAPTCHAs, etc.).

Default run profile (apply unless user overrides)

Assume anti-bot defenses may exist.
Prefer Steel session + CDP for most web tasks, even when scrape might appear possible.
Start sessions with anti-captcha/stealth settings enabled.
Keep one authoritative SESSION_ID and reuse it through retries.
Stop and report failure only after the bounded retry ladder (defined later) is exhausted.

Session contract (before any task)

Use exactly one source of truth for state: SESSION_ID, SESSION_WS, SESSION_VIEWER_URL.
SESSION_ID comes from /v1/sessions.
SESSION_WS is the websocketUrl returned by create-session, with the apiKey query parameter appended.
SESSION_VIEWER_URL is for human verification only.
Reuse the same SESSION_ID and SESSION_WS for all retries.
Replace all session variables only when the previous session is proven gone (Session not found or confirmed hard expiration).
Maximum 1 replacement of a session per task unless explicit expiration is confirmed.
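The contract above can be sketched as one immutable record built once from the create-session response. This is a minimal sketch, not Steel's SDK: SessionState and state_from_response are hypothetical names, and the "&" joiner assumes websocketUrl already carries a query string.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class SessionState:
    """Single source of truth for one Steel session (hypothetical helper)."""
    session_id: str
    session_ws: str          # CDP WebSocket URL with apiKey already appended
    session_viewer_url: str  # for human verification only


def state_from_response(response: dict, api_key: str) -> SessionState:
    """Build the state once from the create-session response body."""
    session_id = response.get("id")
    ws = response.get("websocketUrl")
    if not session_id or not ws:
        raise ValueError("create-session response is missing id or websocketUrl")
    # Assumption: websocketUrl already has a query string, so "&" is the joiner
    return SessionState(
        session_id=session_id,
        session_ws=f"{ws}&apiKey={quote(api_key, safe='')}",
        session_viewer_url=response.get("sessionViewerUrl", ""),
    )
```

Because the dataclass is frozen, a retry cannot silently overwrite one field; replacing a session means building a whole new record.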

Golden template (default hard mode)

Use this as the default flow for any new site:

POST /v1/sessions
{
  "url": "https://target.site",
  "timeout": 900000,
  "solveCaptcha": true,
  "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
  "deviceConfig": { "device": "desktop" },
  "region": "iad",
  "useProxy": false
}

If using Playwright/Puppeteer, connect CDP with: websocketUrl + "&apiKey=" + encodeURIComponent(STEEL_API_KEY)
Run the interaction with selector-based waits and DOM verification.
If blocked/hung, do bounded fallback via POST /v1/sessions/{id}/computer (Esc / close overlay / small scroll), then retry once.
Always release in finally, even on failure.
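The "always release in finally" rule can be sketched as a small wrapper. The three callables are hypothetical stand-ins for the create-session request, the CDP interaction, and the release request; nothing here talks to the network.

```python
from typing import Callable


def run_with_release(
    create: Callable[[], str],
    interact: Callable[[str], bool],
    release: Callable[[str], None],
) -> bool:
    """Create one session, run the task, and release it no matter what."""
    session_id = create()
    try:
        return interact(session_id)
    finally:
        # Runs on success, on a False result, and when interact() raises
        release(session_id)
```

The exception from interact() still propagates to the caller after the release, so the blocker reason is not lost.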

Golden runbook (single-task template)

Use this exact sequence before each interactive task:

Preflight

: "${STEEL_API_KEY:?missing STEEL_API_KEY}"
command -v curl >/dev/null || exit 1
command -v jq >/dev/null || exit 1

Create one session and export state

RESPONSE=$(curl -sS -X POST https://api.steel.dev/v1/sessions \
  -H "Content-Type: application/json" \
  -H "steel-api-key: $STEEL_API_KEY" \
  --data-raw '{"url":"https://target.site","timeout":900000,"solveCaptcha":true,"stealthConfig":{"humanizeInteractions":true,"autoCaptchaSolving":true},"deviceConfig":{"device":"desktop"},"region":"iad","useProxy":false}')

SESSION_ID=$(echo "$RESPONSE" | jq -r .id)
SESSION_WS=$(echo "$RESPONSE" | jq -r --arg key "$STEEL_API_KEY" '.websocketUrl + "&apiKey=" + ($key | @uri)')
SESSION_VIEWER_URL=$(echo "$RESPONSE" | jq -r .sessionViewerUrl)

Run CDP automation (single runtime path)

Use one runtime only (Playwright JS or Python Playwright).
Pass SESSION_WS and TARGET_URL as env vars.
On any recoverable exception, run one longer-timeout retry before fallback.

Verify post-condition

URL changed to target destination OR
expected success selector visible OR
expected state/text changed.

Release

curl -sS -X POST https://api.steel.dev/v1/sessions/"$SESSION_ID"/release \
  -H "steel-api-key: $STEEL_API_KEY" || true

Bounded fallback

If blocked: one Computer recovery pass (take_screenshot, press_key ["Escape"], click outside, scroll) then one final CDP retry.
If still blocked: stop and report blocker reason.

Optional scripts for repetitive steps (non-mandatory)

Use these local helpers when you want fast, low-risk execution:

scripts/create_steel_session.sh: create session and export SESSION_ID, SESSION_WS, SESSION_VIEWER_URL, TARGET_URL.
scripts/release_steel_session.sh: idempotent release helper.
scripts/cdp_template.js: compact Playwright-CDP interaction scaffold.

Examples:

examples/runbook.md for one-shot copy/paste flow using the helper scripts.

Why:

CDP gives deterministic navigation + selectors + robust waits and verifications.
Computer actions are slower and more fragile (they depend on screen coordinates), but excellent as an escape hatch.

Security & Setup

API key handling (mandatory policy)

Do not ask the user to paste API keys into chat.
Expect STEEL_API_KEY in the environment.

Example header (bash/curl):

-H "steel-api-key: $STEEL_API_KEY"

Base URL:

https://api.steel.dev

Runtime preflight (before first request)

If command -v jq fails, install jq or fall back to safe shell JSON handling.
If command -v node fails, switch to the Python-only CDP path.
If command -v python fails, use the Node-only path.
If Playwright is not installed for the chosen runtime, install it before interacting or switch to the Python Playwright package.
Validate at session creation time that timeout is present and includes anti-bot flags for interactive targets.
Set STEEL_API_KEY and never print request headers containing it.

Standard session variable setup

Set and reuse export SESSION_ID=<id>.
Set and reuse export SESSION_WS="<websocketUrl>&apiKey=${STEEL_API_KEY}".
Set and reuse export SESSION_VIEWER_URL=<sessionViewerUrl>.
Treat missing SESSION_WS as hard failure before CDP code execution.

Quick Decision Tree

Use Stateless endpoints when:

You only need page content, a screenshot, or a PDF
No login/multi-step flow required

→ Use:

POST /v1/scrape
POST /v1/screenshot
POST /v1/pdf

Use Sessions when:

Login required
Multi-step interaction
Form submissions
JS-heavy apps
You need cookies/localStorage persistence

→ Use:

POST /v1/sessions (create; always set timeout)
CDP (preferred) using websocketUrl from session response
POST /v1/sessions/{id}/computer (fallback / recovery)
GET /v1/sessions/{id}/context (cookies/storage)
POST /v1/sessions/{id}/release (always)
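The decision tree above can be sketched as one function. pick_endpoint and its flag names are hypothetical; the endpoint paths are the ones listed above.

```python
STATELESS_ENDPOINTS = {
    "content": "/v1/scrape",
    "screenshot": "/v1/screenshot",
    "pdf": "/v1/pdf",
}


def pick_endpoint(
    output: str,
    *,
    needs_login: bool = False,
    multi_step: bool = False,
    submits_form: bool = False,
    js_heavy: bool = False,
    needs_persistent_state: bool = False,
) -> str:
    """Return the first endpoint to call for a task."""
    stateful = (needs_login, multi_step, submits_form, js_heavy, needs_persistent_state)
    if any(stateful):
        # Any stateful criterion means: create a session, then drive it via CDP
        return "/v1/sessions"
    # Raises KeyError for an output kind the stateless endpoints do not cover
    return STATELESS_ENDPOINTS[output]
```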

Mode 1: Stateless (One-shot)

1) Scrape

Use for clean text extraction and planning selectors.

Endpoint: POST /v1/scrape

Example:

{
  "url": "https://example.com",
  "format": ["markdown"],
  "screenshot": false,
  "pdf": false
}

Formats:

markdown (best for summarization)
cleaned_html (best for parsing + finding forms/selectors)
html (raw)

Tip: For form automation, scrape first and record:

input selectors (name=email, input[type=email], etc.)
submit button selector
success message text/element to verify completion
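As a sketch of the "scrape first and record selectors" tip, the cleaned_html returned by a scrape could be scanned with the standard-library HTML parser. FormSelectorFinder is a hypothetical helper, and the selectors it emits are candidates to verify in the live page, not guaranteed matches.

```python
from html.parser import HTMLParser


class FormSelectorFinder(HTMLParser):
    """Collect candidate CSS selectors for inputs and submit buttons."""

    def __init__(self) -> None:
        super().__init__()
        self.inputs: list[str] = []
        self.submits: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            if a.get("type") == "submit":
                self.submits.append('input[type="submit"]')
            elif a.get("name"):
                self.inputs.append(f'input[name="{a["name"]}"]')
            elif a.get("type"):
                self.inputs.append(f'input[type="{a["type"]}"]')
        elif tag == "button":
            if a.get("type") == "submit":
                self.submits.append('button[type="submit"]')
            elif "type" not in a:
                # A <button> without a type attribute submits its form by default
                self.submits.append("button:not([type])")
```

Usage: create the finder, call feed() with the scraped HTML string, then read inputs and submits.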

2) Screenshot

Endpoint: POST /v1/screenshot

Example:

{
  "url": "https://example.com",
  "fullPage": true
}

3) PDF

Endpoint: POST /v1/pdf

Example:

{
  "url": "https://example.com"
}

Mode 2: Sessions (Stateful)

Session lifecycle (critical)

Sessions expire if you don't set a long enough timeout. Common failure symptom: Session ... not found.

Rule:

Always set timeout for anything non-trivial.
Track the active SESSION_ID in one place and don't mix IDs.
Reuse the same session for retries; don't create a new session for each selector tweak.
Bound session creation attempts (for example: max 2 per task) to avoid session sprawl.
Always release when done.
If release returns Session not found after successful work, treat it as already-ended/idempotent cleanup.
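The idempotent-release rule can be sketched as a small classifier. This assumes the release response exposes an HTTP status code and a body containing the "not found" wording; the exact status code and body shape are not specified here, so treat both as assumptions.

```python
def release_outcome(status_code: int, body: str, success_verified: bool) -> str:
    """Classify a release response as "released", "already-ended", or "failed"."""
    if 200 <= status_code < 300:
        return "released"
    if "not found" in body.lower() and success_verified:
        # The work was already verified, so a missing session is just finished cleanup
        return "already-ended"
    return "failed"
```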

Create session

Endpoint: POST /v1/sessions

Minimal:

{
  "timeout": 600000
}

Common options:

{
  "url": "https://example.com",
  "timeout": 900000,
  "solveCaptcha": true,
  "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
  "deviceConfig": { "device": "desktop" },
  "region": "iad",
  "useProxy": false
}

For most sites, the minimum anti-bot-safe session is:

{
  "timeout": 900000,
  "solveCaptcha": true,
  "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
  "region": "iad"
}

The response typically includes:

id
websocketUrl (use for CDP)
sessionViewerUrl / debugUrl (use for human verification)

Step 2A (Preferred): Control the session via CDP (Playwright/Puppeteer)

When to use CDP

Use CDP for:

navigation (goto)
selector-based clicks and fills
robust waits and assertions
reliable verification (URL/text/DOM)

How to connect

Use the websocketUrl returned by POST /v1/sessions. (Do not guess the URL pattern; Steel returns the correct one for your session.)

Important auth note from field use:

For some environments, connectOverCDP requires appending apiKey in the WS query string.
Safe default:

const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
const browser = await chromium.connectOverCDP(wsUrl);

Stable CDP script pattern (copy-safe)

Use one runtime and export required variables.

JavaScript (Playwright):

import { chromium } from "playwright";

// SESSION_WS already includes the apiKey query parameter (see "Standard session
// variable setup"), so use it as-is; appending apiKey again would duplicate it.
const wsUrl = process.env.SESSION_WS;
const target = process.env.TARGET_URL || "https://example.com";

(async () => {
  if (!wsUrl) throw new Error("SESSION_WS is not set");
  const browser = await chromium.connectOverCDP(wsUrl);
  try {
    const context = browser.contexts()[0];
    const page = context.pages()[0] || (await context.newPage());

    await page.goto(target, { waitUntil: "domcontentloaded", timeout: 60000 });
    await page.waitForSelector("body", { timeout: 30000 });
    // run deterministic interactions here
  } finally {
    // disconnect even if an interaction throws
    await browser.close();
  }
})();

Python (Playwright):

import asyncio
import os

from playwright.async_api import async_playwright


async def run():
    # SESSION_WS already includes the apiKey query parameter; use it as-is.
    ws_url = os.environ["SESSION_WS"]
    target = os.environ.get("TARGET_URL", "https://example.com")
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)
        try:
            context = browser.contexts[0]
            # contexts and pages are properties in Python Playwright, not methods
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto(target, wait_until="domcontentloaded", timeout=60000)
            await page.wait_for_selector("body", timeout=30000)
            # run deterministic interactions here
        finally:
            # disconnect even if an interaction throws
            await browser.close()


asyncio.run(run())

Recommended CDP workflow

Create one session and keep its SESSION_ID as the single source of truth
CDP handshake preflight (connectOverCDP) before deeper task logic
page.goto(url) (or rely on session url at creation)
Wait for stable UI (waitForLoadState, waitForSelector)
Interact using selectors (fill, click)
Verify success via DOM (preferred), or via scrape + known success text
Release session

Failure handling inside CDP flow

If a CDP operation throws, wait and retry once with longer timeouts.
If the same selector fails twice, use one backup selector and retry once.
Do not recreate the session after a single transient timeout.
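These three rules can be sketched as a bounded attempt list. act_with_retries is a hypothetical helper; action stands in for any CDP call that takes a selector and a timeout in milliseconds (for example a thin wrapper around a Playwright click), and the default timeouts are illustrative.

```python
import time
from typing import Callable, Sequence


def act_with_retries(
    action: Callable[[str, int], None],
    selectors: Sequence[str],
    timeout_ms: int = 10_000,
    longer_timeout_ms: int = 30_000,
    pause_s: float = 1.0,
) -> str:
    """Try the primary selector twice (second time with a longer timeout),
    then the first backup selector once. Returns the selector that worked."""
    primary, *backups = selectors
    attempts = [(primary, timeout_ms), (primary, longer_timeout_ms)]
    if backups:
        attempts.append((backups[0], longer_timeout_ms))
    last_error = None
    for index, (selector, timeout) in enumerate(attempts):
        if index:
            time.sleep(pause_s)  # wait before every retry, never before the first try
        try:
            action(selector, timeout)
            return selector
        except Exception as exc:  # with Playwright this is typically a timeout error
            last_error = exc
    raise RuntimeError(f"all selectors failed: {list(selectors)}") from last_error
```

The session is never touched here: every attempt reuses the same connection, which matches the rule against recreating a session after a transient timeout.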

Example (Playwright-style pseudo)

// connect to session websocketUrl
// const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`
// const browser = await chromium.connectOverCDP(wsUrl)
// const page = (await browser.contexts()[0].pages())[0] ?? await browser.newPage()

await page.goto("https://example.com");
await page.waitForLoadState("domcontentloaded");
await page.fill('input[name="email"]', "test@test.com");
await page.click('button[type="submit"]');

// verify success
await page.waitForSelector("text=Thanks for subscribing", { timeout: 10000 });

Prefer CDP-native solutions before falling back to Computer actions:

JS dialogs: handle via dialog listeners

File uploads: setInputFiles (avoid OS file picker)

Permissions: grant at browser context level when possible

Step 2B (Fallback): Computer API (mouse/keyboard actions)

Use Computer actions when:

CDP selectors fail repeatedly and you need a visual "nudge"
You're blocked by a stubborn overlay/focus trap
A browser-native-ish prompt is blocking progress
You need quick recovery (Esc, click outside, scroll, etc.)

Endpoint: POST /v1/sessions/{id}/computer

Hard-learned schema rules (avoid validation errors)

There is no navigate action.
press_key requires keys as an array (NOT key)
scroll uses delta_y / delta_x (NOT direction/amount)

Action reference (safe subset)

take_screenshot:
  { "action": "take_screenshot" }

click_mouse:
  { "action": "click_mouse", "button": "left", "coordinates": [x,y], "screenshot": true }

type_text:
  { "action": "type_text", "text": "...", "screenshot": true }

press_key:
  { "action": "press_key", "keys": ["Enter"], "screenshot": true }

scroll:
  { "action": "scroll", "delta_y": 800, "coordinates": [x,y], "screenshot": true }

wait:
  { "action": "wait", "duration": 2000, "screenshot": true }
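To avoid the validation errors described above, the payloads can be built by small helpers that reject the wrong shapes before any request is sent. The function names are hypothetical; the field names are the ones from the action reference.

```python
def press_key(keys: list, screenshot: bool = True) -> dict:
    """Build a press_key payload; keys must be a list, never a bare string."""
    if isinstance(keys, str) or not keys:
        raise TypeError('keys must be a non-empty list, e.g. ["Enter"]')
    return {"action": "press_key", "keys": list(keys), "screenshot": screenshot}


def scroll(delta_y: int, x: int, y: int, delta_x: int = 0, screenshot: bool = True) -> dict:
    """Build a scroll payload using delta_y/delta_x, not direction/amount."""
    payload = {
        "action": "scroll",
        "delta_y": delta_y,
        "coordinates": [x, y],
        "screenshot": screenshot,
    }
    if delta_x:
        payload["delta_x"] = delta_x
    return payload


def click_mouse(x: int, y: int, button: str = "left", screenshot: bool = True) -> dict:
    """Build a click_mouse payload at screen coordinates (x, y)."""
    return {
        "action": "click_mouse",
        "button": button,
        "coordinates": [x, y],
        "screenshot": screenshot,
    }
```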

Computer-first recovery playbook (fast unstick)

take_screenshot
press_key → ["Escape"]
click outside modal area
scroll a bit (delta_y)
screenshot again
retry CDP approach once the blocker is gone

Anti-bot / blocker detection and response

Cloudflare or anti-bot challenge wording appears (Just a moment, Checking your browser, etc.): wait, capture screenshot, then one Computer recovery pass.
Repeated click interception or overlay coverage persists: screenshot, press_key ["Escape"], click outside modal, scroll, screenshot.
Repeated wait-for-selector on same element: inspect blocker state first before changing selectors.
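The detection rules above can be sketched as a classifier over observed symptoms. classify_blocker and its category names are hypothetical, the phrase list contains only the two examples quoted above, and treating "repeated" as two or more occurrences is an assumption.

```python
from typing import Optional

# Phrases quoted in the text above; real challenge pages may use other wording
CHALLENGE_PHRASES = ("just a moment", "checking your browser")


def classify_blocker(
    page_text: str,
    click_intercepts: int = 0,
    same_selector_timeouts: int = 0,
) -> Optional[str]:
    """Map observed symptoms to a blocker category, or None if nothing matches."""
    text = page_text.lower()
    if any(phrase in text for phrase in CHALLENGE_PHRASES):
        return "challenge"        # wait, screenshot, one Computer recovery pass
    if click_intercepts >= 2:     # assumption: "repeated" means two or more
        return "overlay"          # screenshot, Escape, click outside, scroll, screenshot
    if same_selector_timeouts >= 2:
        return "inspect-blocker"  # look at page state before changing selectors
    return None
```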

Navigating without CDP (fallback)

Since there is no navigate action, emulate it:

Click address bar area (top center)
type_text URL
press_key ["Enter"]
wait + screenshot
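The emulated navigation can be sketched as an ordered list of Computer API payloads. navigate_actions is a hypothetical helper; the address-bar coordinates are deliberately a required argument, since they should be read from a fresh screenshot rather than guessed.

```python
def navigate_actions(url: str, address_bar_xy: tuple, settle_ms: int = 2000) -> list:
    """Return the payloads that emulate navigation, in the order to send them."""
    x, y = address_bar_xy
    return [
        {"action": "click_mouse", "button": "left", "coordinates": [x, y]},
        {"action": "type_text", "text": url},
        {"action": "press_key", "keys": ["Enter"]},
        # Ask for a screenshot on the last step to confirm the page actually loaded
        {"action": "wait", "duration": settle_ms, "screenshot": True},
    ]
```

Each payload would be sent as its own POST to /v1/sessions/{id}/computer.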

Step 3: CAPTCHA handling

Best default:

set solveCaptcha: true when creating a session

If stuck:

use viewer URL for human-in-the-loop
try computer recovery steps (scroll/hover/click checkbox) only if needed

Step 4: Extract session context (cookies/storage)

Endpoint: GET /v1/sessions/{id}/context

Use to:

persist login state
debug whether session stored cookies/localStorage
export state for follow-up tasks

Note: if cookies/storage are empty, it may mean:

you never actually logged in
the page is blocked
you're in a different origin than expected
session expired and you queried the wrong ID

Step 5: Release session (always)

Endpoint: POST /v1/sessions/{id}/release

Rule:

Release as soon as you've verified success or determined you can't proceed.
If release returns Session not found after success verification, treat as completed.

Recipes

Recipe: Newsletter signup (CDP-first)

POST /v1/scrape to find:
- email input selector
- submit selector
- success message text (for verification)
Create session with long enough timeout:

{ "url": "https://site.com", "timeout": 600000 }

Use CDP:

goto
fill
click submit
verify success text/element

Release session.

Recipe: Login flow (CDP-first)

Create session with timeout + optionally solveCaptcha
CDP:

goto(login)
fill(username/password)
click(sign in)
wait for logged-in selector

Verify via DOM (profile avatar / logout button / dashboard URL)
Optionally GET /context to confirm cookies exist
Release

Recipe: Stuck on an overlay (hybrid)

CDP attempts fail due to overlay/click interception
Use Computer API:

screenshot
press Esc
click close "X"
scroll slightly
screenshot

Return to CDP and continue with selectors
Verify + release

Troubleshooting (Error → Fix)

`invalid_union` / "No matching discriminator"

Cause: unsupported action or wrong payload shape. Fix:

Use only the documented Computer actions
Remove any navigate action usage

`Invalid input: expected array … path: keys`

Cause: you used key instead of keys. Fix:

{ "action": "press_key", "keys": ["Enter"] }

Scroll does nothing / "Scrolled up by 0 at (0,0)"

Cause: using direction/amount or missing delta_y. Fix:

{ "action": "scroll", "delta_y": 800, "coordinates": [960, 540] }

`Session ... not found`

Cause: session expired/released OR you used an old ID. Fix:

Create a new session with a longer timeout
Update the stored SESSION_ID everywhere
Don't mix multiple sessions unless necessary
If this happens on release after successful verification, treat cleanup as already complete

`connectOverCDP ... 502 Bad Gateway` (to `wss://connect.steel.dev/`)

Cause: WS connection missing required auth in query string in this environment. Fix:

const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
await chromium.connectOverCDP(wsUrl);

Curl errors like "blank argument where content is expected"

Cause: broken shell quoting / multiline JSON issues. Fix:

Use one-line JSON with --data-raw
Or build payload with jq -n and pass it safely
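The same idea can be sketched without jq: serialize the payload in code and shell-quote it, so apostrophes or newlines in a value cannot break the command. curl_command is a hypothetical helper; the API key header is left as a shell variable reference so the key itself never enters the generated string.

```python
import json
import shlex


def curl_command(url: str, payload: dict) -> str:
    """Return a one-line curl command with a safely quoted JSON body."""
    body = json.dumps(payload, separators=(",", ":"))  # compact, single-line JSON
    return " ".join([
        "curl", "-sS", "-X", "POST", shlex.quote(url),
        "-H", shlex.quote("Content-Type: application/json"),
        # Double quotes on purpose: the shell expands $STEEL_API_KEY at run time
        "-H", '"steel-api-key: $STEEL_API_KEY"',
        "--data-raw", shlex.quote(body),
    ])
```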

`SyntaxError` / malformed `page.evaluate` script

Cause: mixed quoting or invalid JS embedded in shell/JSON. Fix:

Keep JS scripts short and pass as raw heredocs or files.
Validate escaping before embedding script text in one-liners.
Fall back to one clean script per run instead of incremental inline patches.

`Cannot find module 'playwright'` or runtime import failures

Cause: missing playwright package in the execution environment. Fix:

Use one runtime per task and confirm module availability first.
Install dependency before running or switch to a Python Playwright path consistently.

`write_stdin failed: stdin is closed`

Cause: writing to a terminated subprocess. Fix:

Use session lifecycle to avoid interactive drift.
Treat closed stdin as terminal for that branch; proceed with command-based rerun.

Best Practices (to prevent the exact failures from the logs)

CDP-first by default

Use CDP for navigation + selectors + verification
Only use Computer actions as an escape hatch

Always verify

For "submit" tasks:

Prefer DOM verification (CDP wait for success)
Or re-scrape and look for success text / state change
Don't claim success based on "click happened"

Verification contract:

Require one of the following before completion:
expected URL change
visible success element
expected text or state change
If no post-condition is met, continue the retry ladder or return a blocker reason.
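The verification contract can be sketched as a check that names which post-condition was met. Observation and satisfied_postcondition are hypothetical; in practice the observation would be filled in from the CDP page (current URL, visible elements, page text).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """What the automation saw after the action (hypothetical container)."""
    url: str
    visible_selectors: set = field(default_factory=set)
    page_text: str = ""


def satisfied_postcondition(
    obs: Observation,
    *,
    expected_url_prefix: Optional[str] = None,
    success_selector: Optional[str] = None,
    expected_text: Optional[str] = None,
) -> Optional[str]:
    """Return the name of the first satisfied post-condition, or None."""
    if expected_url_prefix and obs.url.startswith(expected_url_prefix):
        return "url"
    if success_selector and success_selector in obs.visible_selectors:
        return "selector"
    if expected_text and expected_text in obs.page_text:
        return "text"
    return None
```

A None result means the task is not complete: continue the retry ladder or report the blocker.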

Bound your retries (avoid spirals)

Suggested retry ladder:

CDP attempt (selectors + waits)
CDP attempt (adjust selectors, wait longer)
Computer recovery (Esc/click outside/scroll)
One final CDP attempt

If still blocked: stop and report what's blocking progress.

Standardized stop conditions:

No more than 4 total retry loops per task.
Session replacement only if expiration is confirmed (Session not found).
At most one Computer recovery pass unless a new blocker category is observed.
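The ladder and its stop conditions can be sketched as a fixed sequence with no loop to spiral in. run_retry_ladder is a hypothetical helper; cdp_attempt receives the attempt number and returns True only when a post-condition was verified.

```python
from typing import Callable


def run_retry_ladder(
    cdp_attempt: Callable[[int], bool],
    computer_recovery: Callable[[], None],
) -> tuple:
    """Walk the four-step ladder; return (success, steps_taken)."""
    steps = []
    for attempt in (1, 2):  # plain attempt, then adjusted selectors / longer waits
        steps.append(f"cdp-{attempt}")
        if cdp_attempt(attempt):
            return True, steps
    steps.append("computer-recovery")  # exactly one recovery pass
    computer_recovery()
    steps.append("cdp-final")
    # The caller reports the blocker reason when this final attempt fails
    return cdp_attempt(3), steps
```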

Session hygiene

Set timeout
Reuse a single session per task whenever possible
Release sessions
Keep a single authoritative SESSION_ID
Treat release -> Session not found as non-fatal if success was already verified

Secret hygiene

Never request/paste keys
Never echo keys in logs
Prefer env vars
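"Never echo keys in logs" can be sketched as two redaction helpers applied before anything is printed. Both function names are hypothetical; masking the authorization header as well is an extra precaution, not something this Skill requires.

```python
import re

SENSITIVE_HEADERS = {"steel-api-key", "authorization"}


def redact_headers(headers: dict) -> dict:
    """Return a copy of headers that is safe to log."""
    return {
        name: "***" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def redact_ws_url(url: str) -> str:
    """Mask the apiKey query parameter of a WebSocket URL before logging it."""
    return re.sub(r"(apiKey=)[^&#]+", r"\1***", url)
```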

Summary

Stateless endpoints for quick extraction/screenshots/PDFs.
Sessions + CDP for reliable multi-step automation.
Computer actions as a fallback to break through blockers or recover from stuck UI.
Always verify outcomes and manage session lifecycles correctly.
