steel-browsing-skill
npx skills add https://github.com/steel-dev/steel-browsing-skill --skill steel-browsing-skill
Skill documentation
Steel Browsing Skill (CDP-first)
This Skill enables reliable web browsing + automation via Steel.
Core operating principle (what we learned)
Default to CDP (Playwright/Puppeteer) for anything interactive. Use Computer API actions only as a tactical fallback for "visual" recovery (focus traps, stubborn overlays, browser-native-ish prompts, CAPTCHAs, etc.).
Default run profile (apply unless user overrides)
- Assume anti-bot defenses may exist.
- Prefer Steel session + CDP for most web tasks, even when scrape might appear possible.
- Start sessions with anti-captcha/stealth settings enabled.
- Keep one authoritative SESSION_ID and reuse it through retries.
- Fail fast only after the bounded retry ladder (defined later) is exhausted.
Session contract (before any task)
- Use exactly one source of truth for state: SESSION_ID, SESSION_WS, SESSION_VIEWER_URL.
- SESSION_ID comes from /v1/sessions.
- SESSION_WS is the websocketUrl returned by create-session.
- SESSION_VIEWER_URL is for human verification only.
- Reuse the same SESSION_ID and SESSION_WS for all retries.
- Replace all session variables only when the previous session is proven gone (Session not found or confirmed hard expiration).
- Replace a session at most once per task unless explicit expiration is confirmed.
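As an illustration of the contract above, the three variables can be held in one immutable object so they are always replaced together. This is a minimal sketch, not part of the Skill: the class name SessionState and its method are hypothetical, and the field names id, websocketUrl, and sessionViewerUrl are taken from the create-session response described in this document.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class SessionState:
    """Single source of truth for one Steel session (hypothetical helper)."""

    session_id: str
    session_ws: str          # websocketUrl plus the apiKey query parameter
    session_viewer_url: str  # for human verification only

    @classmethod
    def from_response(cls, response: dict, api_key: str) -> "SessionState":
        ws = response.get("websocketUrl")
        if not response.get("id") or not ws:
            # A missing websocket URL is a hard failure before any CDP code runs.
            raise ValueError("create-session response is missing id or websocketUrl")
        return cls(
            session_id=response["id"],
            # URL-encode the key, mirroring encodeURIComponent in the JS examples.
            session_ws=f"{ws}&apiKey={quote(api_key, safe='')}",
            session_viewer_url=response.get("sessionViewerUrl", ""),
        )
```

Because the dataclass is frozen, a retry cannot silently overwrite one field; replacing a session means building a whole new SessionState.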
Golden template (default hard mode)
Use this as the default flow for any new site:
POST /v1/sessions
{
"url": "https://target.site",
"timeout": 900000,
"solveCaptcha": true,
"stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
"deviceConfig": { "device": "desktop" },
"region": "iad",
"useProxy": false
}
- If using Playwright/Puppeteer, connect CDP with: websocketUrl + "&apiKey=" + encodeURIComponent(STEEL_API_KEY)
- Run the interaction with selector-based waits and DOM verification.
- If blocked/hung, do a bounded fallback via POST /v1/sessions/{id}/computer (Esc / close overlay / small scroll), then retry once.
- Always release in finally, even on failure.
Golden runbook (single-task template)
Use this exact sequence before each interactive task:
- Preflight
: "${STEEL_API_KEY:?missing STEEL_API_KEY}"
command -v curl >/dev/null || exit 1
command -v jq >/dev/null || exit 1
- Create one session and export state
RESPONSE=$(curl -sS -X POST https://api.steel.dev/v1/sessions \
-H "Content-Type: application/json" \
-H "steel-api-key: $STEEL_API_KEY" \
--data-raw '{"url":"https://target.site","timeout":900000,"solveCaptcha":true,"stealthConfig":{"humanizeInteractions":true,"autoCaptchaSolving":true},"deviceConfig":{"device":"desktop"},"region":"iad","useProxy":false}')
SESSION_ID=$(echo "$RESPONSE" | jq -r .id)
SESSION_WS=$(echo "$RESPONSE" | jq -r --arg key "$STEEL_API_KEY" '.websocketUrl + "&apiKey=" + ($key | @uri)')
SESSION_VIEWER_URL=$(echo "$RESPONSE" | jq -r .sessionViewerUrl)
- Run CDP automation (single runtime path)
- Use one runtime only (Playwright JS or Python Playwright).
- Pass SESSION_WS and TARGET_URL as env vars.
- On any recoverable exception, run one longer-timeout retry before fallback.
- Verify post-condition
- URL changed to target destination OR
- expected success selector visible OR
- expected state/text changed.
- Release
curl -sS -X POST https://api.steel.dev/v1/sessions/"$SESSION_ID"/release \
-H "steel-api-key: $STEEL_API_KEY" || true
- Bounded fallback
- If blocked: one Computer recovery pass (take_screenshot, press_key ["Escape"], click outside, scroll), then one final CDP retry.
- If still blocked: stop and report blocker reason.
Optional scripts for repetitive steps (non-mandatory)
Use these local helpers when you want fast, low-risk execution:
- scripts/create_steel_session.sh: create session and export SESSION_ID, SESSION_WS, SESSION_VIEWER_URL, TARGET_URL.
- scripts/release_steel_session.sh: idempotent release helper.
- scripts/cdp_template.js: compact Playwright-CDP interaction scaffold.
Examples:
- examples/runbook.md for a one-shot copy/paste flow using the helper scripts.
Why:
- CDP gives deterministic navigation + selectors + robust waits and verifications.
- Computer actions are slower and fragile (coordinates), but excellent as an escape hatch.
Security & Setup
API key handling (mandatory policy)
- Do not ask the user to paste API keys into chat.
- Expect STEEL_API_KEY in the environment.
Example header (bash/curl):
-H "steel-api-key: $STEEL_API_KEY"
Base URL:
https://api.steel.dev
Runtime preflight (before first request)
- if ! command -v jq >/dev/null; then install it, or fall back to safe shell JSON handling.
- if ! command -v node >/dev/null; then switch to the Python-only CDP path.
- if ! command -v python >/dev/null; then use the Node-only path.
- if ! command -v playwright >/dev/null for the chosen runtime, install it before interaction or switch to the Python Playwright package.
- Validate at session creation time that timeout is present and that anti-bot flags are included for interactive targets.
- Set STEEL_API_KEY and never print request headers containing it.
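The runtime checks above could also be run from Python before any request is made. The sketch below is illustrative only: the function name preflight is hypothetical, and preferring Node when both runtimes exist is an assumption, since the document does not state a preference.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return (runtime, problems); runtime is 'node', 'python', or None."""
    env = os.environ if env is None else env
    problems = []
    # Only check that the key is present; never print or log its value.
    if not env.get("STEEL_API_KEY"):
        problems.append("STEEL_API_KEY is not set")
    has_node = which("node") is not None
    has_python = any(which(name) is not None for name in ("python", "python3"))
    if has_node:
        runtime = "node"      # assumption: prefer Node when both are available
    elif has_python:
        runtime = "python"    # Python-only CDP path
    else:
        runtime = None
        problems.append("neither node nor python was found on PATH")
    return runtime, problems
```

An empty problems list means the task can proceed with the returned runtime; anything else should stop the run before a session is created.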
Standard session variable setup
- Set and reuse export SESSION_ID=<id>.
- Set and reuse export SESSION_WS="<websocketUrl>&apiKey=${STEEL_API_KEY}".
- Set and reuse export SESSION_VIEWER_URL=<sessionViewerUrl>.
- Treat a missing SESSION_WS as a hard failure before CDP code execution.
Quick Decision Tree
Use Stateless endpoints when:
- You only need page content, a screenshot, or a PDF
- No login/multi-step flow required
→ Use:
- POST /v1/scrape
- POST /v1/screenshot
- POST /v1/pdf
Use Sessions when:
- Login required
- Multi-step interaction
- Form submissions
- JS-heavy apps
- You need cookies/localStorage persistence
→ Use:
- POST /v1/sessions (create; always set timeout)
- CDP (preferred) using websocketUrl from session response
- POST /v1/sessions/{id}/computer (fallback / recovery)
- GET /v1/sessions/{id}/context (cookies/storage)
- POST /v1/sessions/{id}/release (always)
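The decision tree above can be written down as a small function. This is a sketch with hypothetical names (choose_mode and its parameters); only the endpoint paths come from this document.

```python
def choose_mode(needs_login=False, multi_step=False, submits_form=False,
                js_heavy=False, needs_persistence=False, output="content"):
    """Return the first endpoint to call for a task (illustrative only)."""
    if any((needs_login, multi_step, submits_form, js_heavy, needs_persistence)):
        # Any stateful requirement means a session, driven via CDP.
        return "POST /v1/sessions"
    stateless = {
        "content": "POST /v1/scrape",
        "screenshot": "POST /v1/screenshot",
        "pdf": "POST /v1/pdf",
    }
    return stateless[output]
```

Note that the default run profile earlier in this document still prefers a session plus CDP when anti-bot defenses are likely, even if a stateless call looks sufficient.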
Mode 1: Stateless (One-shot)
1) Scrape
Use for clean text extraction and planning selectors.
Endpoint: POST /v1/scrape
Example:
{
"url": "https://example.com",
"format": ["markdown"],
"screenshot": false,
"pdf": false
}
Formats:
- markdown (best for summarization)
- cleaned_html (best for parsing + finding forms/selectors)
- html (raw)
Tip: For form automation, scrape first and record:
- input selectors (name=email, input[type=email], etc.)
- submit button selector
- success message text/element to verify completion
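One way to record those selectors is to run the scraped cleaned_html through Python's standard HTML parser. The sketch below is an assumption about how one might do it, not something the Skill ships: FormSelectorFinder and find_form_selectors are hypothetical names, and buttons without an explicit type="submit" are deliberately skipped.

```python
from html.parser import HTMLParser


class FormSelectorFinder(HTMLParser):
    """Collect candidate CSS selectors for inputs and submit controls."""

    def __init__(self):
        super().__init__()
        self.inputs = []
        self.submits = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "submit":
            self.submits.append('input[type="submit"]')
        elif tag == "input" and a.get("type") != "hidden":
            # Prefer the name attribute; fall back to the input type.
            if a.get("name"):
                self.inputs.append(f'input[name="{a["name"]}"]')
            elif a.get("type"):
                self.inputs.append(f'input[type="{a["type"]}"]')
        elif tag == "button" and a.get("type") == "submit":
            self.submits.append('button[type="submit"]')


def find_form_selectors(html: str) -> dict:
    finder = FormSelectorFinder()
    finder.feed(html)
    return {"inputs": finder.inputs, "submits": finder.submits}
```

The output is only a starting point for the CDP step; each selector should still be confirmed with a wait on the live page.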
2) Screenshot
Endpoint: POST /v1/screenshot
Example:
{
"url": "https://example.com",
"fullPage": true
}
3) PDF
Endpoint: POST /v1/pdf
Example:
{
"url": "https://example.com"
}
Mode 2: Sessions (Stateful)
Session lifecycle (critical)
Sessions expire if you don't set a long enough timeout.
Common failure symptom: Session ... not found.
Rule:
- Always set timeout for anything non-trivial.
- Track the active SESSION_ID in one place and don't mix IDs.
- Reuse the same session for retries; don't create a new session for each selector tweak.
- Bound session creation attempts (for example: max 2 per task) to avoid session sprawl.
- Always release when done.
- If release returns Session not found after successful work, treat it as already-ended/idempotent cleanup.
Create session
Endpoint: POST /v1/sessions
Minimal:
{
"timeout": 600000
}
Common options:
{
"url": "https://example.com",
"timeout": 900000,
"solveCaptcha": true,
"stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
"deviceConfig": { "device": "desktop" },
"region": "iad",
"useProxy": false
}
For most sites, the minimum anti-bot-safe session is:
{
"timeout": 900000,
"solveCaptcha": true,
"stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true },
"region": "iad"
}
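To avoid hand-quoted JSON, the payloads above can be built programmatically. This is a minimal sketch: build_session_payload is a hypothetical helper, the option names are copied from the examples in this document, and treating timeout as milliseconds is an assumption inferred from the values used here.

```python
import json


def build_session_payload(url=None, timeout_ms=900_000, anti_bot=True, region="iad"):
    """Return a one-line JSON body for POST /v1/sessions (illustrative only)."""
    if not isinstance(timeout_ms, int) or timeout_ms <= 0:
        # The document's rule: always set a timeout for anything non-trivial.
        raise ValueError("timeout must be a positive integer (assumed milliseconds)")
    payload = {"timeout": timeout_ms}
    if url:
        payload["url"] = url
    if anti_bot:
        payload["solveCaptcha"] = True
        payload["stealthConfig"] = {
            "humanizeInteractions": True,
            "autoCaptchaSolving": True,
        }
    if region:
        payload["region"] = region
    # Compact separators keep the body on one line, which suits --data-raw.
    return json.dumps(payload, separators=(",", ":"))
```

Calling build_session_payload() with no arguments yields the minimum anti-bot-safe session shown above.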
The response typically includes:
- id
- websocketUrl (use for CDP)
- sessionViewerUrl / debugUrl (use for human verification)
Step 2A (Preferred): Control the session via CDP (Playwright/Puppeteer)
When to use CDP
Use CDP for:
- navigation (goto)
- selector-based clicks and fills
- robust waits and assertions
- reliable verification (URL/text/DOM)
How to connect
Use the websocketUrl returned by POST /v1/sessions.
(Do not guess the URL pattern; Steel returns the correct one for your session.)
Important auth note from field use:
- For some environments, connectOverCDP requires appending apiKey in the WS query string.
- Safe default:
const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
const browser = await chromium.connectOverCDP(wsUrl);
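For the Python path, the same URL can be built with the standard library. The sketch below is an illustration rather than Steel's own method: with_api_key is a hypothetical helper, and it assumes the key belongs in an apiKey query parameter as the note above describes. Unlike plain string concatenation, it also copes with a URL that has no query string yet and never appends the key twice.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(websocket_url: str, api_key: str) -> str:
    """Return websocket_url with exactly one URL-encoded apiKey parameter."""
    parts = urlsplit(websocket_url)
    # Drop any apiKey that is already present so repeated calls stay idempotent.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "apiKey"
    ]
    query.append(("apiKey", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The result can be passed straight to connect_over_cdp in Python Playwright.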
Stable CDP script pattern (copy-safe)
Use one runtime and export the required variables. SESSION_WS, as exported during session setup, already carries the apiKey query parameter, so the scripts use it as-is instead of appending the key a second time.
Node.js (Playwright):
import { chromium } from "playwright";

// SESSION_WS already includes "&apiKey=..." from session setup.
const wsUrl = process.env.SESSION_WS;
const target = process.env.TARGET_URL || "https://example.com";

(async () => {
  if (!wsUrl) throw new Error("SESSION_WS is not set");
  const browser = await chromium.connectOverCDP(wsUrl);
  try {
    // Reuse the session's existing context and page when present.
    const context = browser.contexts()[0];
    const page = context.pages()[0] || (await context.newPage());
    await page.goto(target, { waitUntil: "domcontentloaded", timeout: 60000 });
    await page.waitForSelector("body", { timeout: 30000 });
    // run deterministic interactions here
  } finally {
    // Close the CDP connection even when an interaction throws.
    await browser.close();
  }
})();
Python (Playwright):
import asyncio
import os

from playwright.async_api import async_playwright


async def run():
    # SESSION_WS already includes "&apiKey=..." from session setup.
    ws_url = os.environ["SESSION_WS"]
    target = os.environ.get("TARGET_URL", "https://example.com")
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)
        try:
            context = browser.contexts[0]
            # In Python Playwright, pages is a property, not a method.
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto(target, wait_until="domcontentloaded", timeout=60000)
            await page.wait_for_selector("body", timeout=30000)
            # run deterministic interactions here
        finally:
            # Close the CDP connection even when an interaction raises.
            await browser.close()


asyncio.run(run())
Recommended CDP workflow
- Create one session and keep its SESSION_ID as the single source of truth
- CDP handshake preflight (connectOverCDP) before deeper task logic
- page.goto(url) (or rely on session url at creation)
- Wait for stable UI (waitForLoadState, waitForSelector)
- Interact using selectors (fill, click)
- Verify success via DOM (preferred), or via scrape + known success text
- Release session
Failure handling inside CDP flow
- If a CDP operation throws, wait and retry once with longer timeouts.
- If the same selector fails twice, use one backup selector and retry once.
- Do not recreate the session after a single transient timeout.
Example (Playwright-style pseudo)
// connect to session websocketUrl
// const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`
// const browser = await chromium.connectOverCDP(wsUrl)
// const page = (await browser.contexts()[0].pages())[0] ?? await browser.newPage()
await page.goto("https://example.com");
await page.waitForLoadState("domcontentloaded");
await page.fill('input[name="email"]', "test@test.com");
await page.click('button[type="submit"]');
// verify success
await page.waitForSelector("text=Thanks for subscribing", { timeout: 10000 });
Prefer CDP-native solutions before falling back to Computer actions:
- JS dialogs: handle via dialog listeners
- File uploads: setInputFiles (avoid OS file picker)
- Permissions: grant at browser context level when possible
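The three CDP-native options above might look like this in Python. This is a sketch written against Playwright's sync Python API (page.on, page.set_input_files, context.grant_permissions); the helper names are hypothetical, and auto-accepting every dialog is an assumption that will not suit flows where a confirm prompt should be dismissed.

```python
def prepare_page(context, page, permissions=(), origin=None):
    """Wire up CDP-native handling so prompts never need Computer actions."""
    # JS dialogs (alert/confirm/prompt): accept them as soon as they appear.
    page.on("dialog", lambda dialog: dialog.accept())
    if permissions:
        # Grant permissions on the context instead of clicking a browser prompt.
        context.grant_permissions(list(permissions), origin=origin)


def upload_file(page, input_selector, file_path):
    """Set the file directly on the input; no OS file picker is opened."""
    page.set_input_files(input_selector, file_path)
```

Both helpers take the context and page objects obtained after connecting over CDP, so they fit into the script pattern shown earlier without extra setup.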
Step 2B (Fallback): Computer API (mouse/keyboard actions)
Use Computer actions when:
- CDP selectors fail repeatedly and you need a visual "nudge"
- You're blocked by a stubborn overlay/focus trap
- A browser-native-ish prompt is blocking progress
- You need quick recovery (Esc, click outside, scroll, etc.)
Endpoint: POST /v1/sessions/{id}/computer
Hard-learned schema rules (avoid validation errors)
- There is no navigate action.
- press_key requires keys as an array (NOT key)
- scroll uses delta_y / delta_x (NOT direction/amount)
Action reference (safe subset)
take_screenshot:
{ "action": "take_screenshot" }
click_mouse:
{ "action": "click_mouse", "button": "left", "coordinates": [x,y], "screenshot": true }
type_text:
{ "action": "type_text", "text": "...", "screenshot": true }
press_key:
{ "action": "press_key", "keys": ["Enter"], "screenshot": true }
scroll:
{ "action": "scroll", "delta_y": 800, "coordinates": [x,y], "screenshot": true }
wait:
{ "action": "wait", "duration": 2000, "screenshot": true }
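Building these payloads through small functions makes the schema rules hard to violate. The sketch below uses hypothetical helper names; the action names and fields are the ones listed in the reference above, and nothing beyond that subset is assumed.

```python
def take_screenshot():
    return {"action": "take_screenshot"}


def click_mouse(x, y, button="left", screenshot=True):
    return {"action": "click_mouse", "button": button,
            "coordinates": [x, y], "screenshot": screenshot}


def type_text(text, screenshot=True):
    return {"action": "type_text", "text": text, "screenshot": screenshot}


def press_key(keys, screenshot=True):
    # The schema expects an array under "keys"; a bare string is the classic mistake.
    if isinstance(keys, str):
        raise TypeError("keys must be a list such as ['Enter'], not a string")
    return {"action": "press_key", "keys": list(keys), "screenshot": screenshot}


def scroll(delta_y, x, y, delta_x=0, screenshot=True):
    # Scrolling is expressed as pixel deltas, not direction/amount.
    payload = {"action": "scroll", "delta_y": delta_y,
               "coordinates": [x, y], "screenshot": screenshot}
    if delta_x:
        payload["delta_x"] = delta_x
    return payload


def wait(duration, screenshot=True):
    return {"action": "wait", "duration": duration, "screenshot": screenshot}
```

Each function returns a dict that can be serialized with json.dumps and posted to POST /v1/sessions/{id}/computer.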
Computer-first recovery playbook (fast unstick)
- take_screenshot
- press_key ["Escape"]
- click outside modal area
- scroll a bit (delta_y)
- screenshot again
- retry CDP approach once the blocker is gone
Anti-bot / blocker detection and response
- Cloudflare or anti-bot challenge wording appears (Just a moment, Checking your browser, etc.): wait, capture screenshot, then one Computer recovery pass.
- Repeated click interception or overlay coverage persists: screenshot, press_key ["Escape"], click outside modal, scroll, screenshot.
- Repeated wait-for-selector on same element: inspect blocker state first before changing selectors.
Navigating without CDP (fallback)
Since there is no navigate action, emulate it:
- Click address bar area (top center)
- type_text URL
- press_key ["Enter"]
- wait + screenshot
Step 3: CAPTCHA handling
Best default:
- set solveCaptcha: true when creating a session
If stuck:
- use viewer URL for human-in-the-loop
- try computer recovery steps (scroll/hover/click checkbox) only if needed
Step 4: Extract session context (cookies/storage)
Endpoint: GET /v1/sessions/{id}/context
Use to:
- persist login state
- debug whether session stored cookies/localStorage
- export state for follow-up tasks
Note: if cookies/storage are empty, it may mean:
- you never actually logged in
- the page is blocked
- you're in a different origin than expected
- session expired and you queried the wrong ID
Step 5: Release session (always)
Endpoint: POST /v1/sessions/{id}/release
Rule:
- Release as soon as you've verified success or determined you can't proceed.
- If release returns Session not found after success verification, treat as completed.
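The release rule can be captured in a small idempotent helper. This is a sketch under stated assumptions: release_session is a hypothetical name, and it assumes a missing session surfaces as an HTTP error whose body contains the words "not found", which this document implies but does not specify. The opener parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.steel.dev"


def release_session(session_id, api_key, success_verified,
                    opener=urllib.request.urlopen):
    """Release a session; tolerate 'not found' once success was verified."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/sessions/{session_id}/release",
        method="POST",
        headers={"steel-api-key": api_key},
    )
    try:
        with opener(request, timeout=30):
            return "released"
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", "replace")
        # Assumption: the error body carries the "Session ... not found" message.
        if success_verified and "not found" in body.lower():
            return "already-ended"
        raise
```

Calling it from a finally block keeps the "always release" rule intact even when the task itself fails.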
Recipes
Recipe: Newsletter signup (CDP-first)
- POST /v1/scrape to find:
  - email input selector
  - submit selector
  - success message text (for verification)
- Create session with long enough timeout:
{ "url": "https://site.com", "timeout": 600000 }
- Use CDP:
  - goto
  - fill
  - click submit
  - verify success text/element
- Release session.
Recipe: Login flow (CDP-first)
- Create session with timeout + optionally solveCaptcha
- CDP:
  - goto (login)
  - fill (username/password)
  - click (sign in)
  - wait for logged-in selector
- Verify via DOM (profile avatar / logout button / dashboard URL)
- Optionally GET /v1/sessions/{id}/context to confirm cookies exist
- Release
Recipe: Stuck on an overlay (hybrid)
- CDP attempts fail due to overlay/click interception
- Use Computer API:
  - screenshot
  - press Esc
  - click close "X"
  - scroll slightly
  - screenshot
- Return to CDP and continue with selectors
- Verify + release
Troubleshooting (Error → Fix)
invalid_union / "No matching discriminator"
Cause: unsupported action or wrong payload shape.
Fix:
- Use only the documented Computer actions
- Remove any navigate action usage
Invalid input: expected array … path: keys
Cause: you used key instead of keys.
Fix:
{ "action": "press_key", "keys": ["Enter"] }
Scroll does nothing / "Scrolled up by 0 at (0,0)"
Cause: using direction/amount or missing delta_y.
Fix:
{ "action": "scroll", "delta_y": 800, "coordinates": [960, 540] }
Session ... not found
Cause: session expired/released OR you used an old ID.
Fix:
- Create a new session with a longer timeout
- Update the stored SESSION_ID everywhere
- Don't mix multiple sessions unless necessary
- If this happens on release after successful verification, treat cleanup as already complete
connectOverCDP ... 502 Bad Gateway (to wss://connect.steel.dev/)
Cause: WS connection missing required auth in query string in this environment.
Fix:
const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
await chromium.connectOverCDP(wsUrl);
Curl errors like "blank argument where content is expected"
Cause: broken shell quoting / multiline JSON issues.
Fix:
- Use one-line JSON with --data-raw
- Or build payload with jq -n and pass it safely
SyntaxError / malformed page.evaluate script
Cause: mixed quoting or invalid JS embedded in shell/JSON.
Fix:
- Keep JS scripts short and pass as raw heredocs or files.
- Validate escaping before embedding script text in one-liners.
- Fall back to one clean script per run instead of incremental inline patches.
Cannot find module 'playwright' or runtime import failures
Cause: missing playwright package in the execution environment.
Fix:
- Use one runtime per task and confirm module availability first.
- Install dependency before running or switch to a Python Playwright path consistently.
write_stdin failed: stdin is closed
Cause: writing to a terminated subprocess.
Fix:
- Use session lifecycle to avoid interactive drift.
- Treat closed stdin as terminal for that branch; proceed with command-based rerun.
Best Practices (to prevent the exact failures from the logs)
CDP-first by default
- Use CDP for navigation + selectors + verification
- Only use Computer actions as an escape hatch
Always verify
For "submit" tasks:
- Prefer DOM verification (CDP wait for success)
- Or re-scrape and look for success text / state change
- Don't claim success based on "click happened"
Verification contract:
- Require at least one of the following before completion:
  - expected URL change
  - visible success element
  - expected text or state change
- If no post-condition is met, continue the retry ladder or return a blocker reason.
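The verification contract amounts to an any-of check over observed page state. The sketch below is illustrative: postconditions_met and its parameters are hypothetical, and the observations (current URL, visible selectors, page text) are assumed to have been collected by the CDP script beforehand.

```python
def postconditions_met(current_url=None, expected_url_prefix=None,
                       visible_selectors=(), success_selector=None,
                       page_text="", expected_text=None):
    """Return the names of the post-conditions that hold; empty means not done."""
    checks = {
        "url": bool(expected_url_prefix and current_url
                    and current_url.startswith(expected_url_prefix)),
        "selector": bool(success_selector
                         and success_selector in visible_selectors),
        "text": bool(expected_text and expected_text in page_text),
    }
    return [name for name, ok in checks.items() if ok]
```

A task may report success only when the returned list is non-empty; an empty list means continuing the retry ladder or returning a blocker reason.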
Bound your retries (avoid spirals)
Suggested retry ladder:
- CDP attempt (selectors + waits)
- CDP attempt (adjust selectors, wait longer)
- Computer recovery (Esc/click outside/scroll)
- One final CDP attempt
If still blocked: stop and report what's blocking progress.
Standardized stop conditions:
- No more than 4 total retry loops per task.
- Session replacement only if expiration is confirmed (Session not found).
- At most one Computer recovery pass unless a new blocker category is observed.
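The ladder and its stop conditions can be expressed as a fixed list of steps, so a run can never exceed four attempts. This is a minimal sketch with hypothetical names: the caller supplies cdp_attempt(adjusted), computer_recovery(), and verify(), none of which are defined by the Skill.

```python
def run_retry_ladder(cdp_attempt, computer_recovery, verify):
    """Walk the four-step ladder once; stop at the first verified success."""
    ladder = [
        ("cdp", lambda: cdp_attempt(adjusted=False)),
        ("cdp-adjusted", lambda: cdp_attempt(adjusted=True)),
        ("computer-recovery", computer_recovery),
        ("cdp-final", lambda: cdp_attempt(adjusted=True)),
    ]
    last_error = None
    for name, step in ladder:
        try:
            step()
        except Exception as err:
            # Remember the most recent failure so it can be reported as the blocker.
            last_error = f"{name}: {err}"
            continue
        # Recovery only clears the blocker; success is judged after a CDP step.
        if name != "computer-recovery" and verify():
            return {"ok": True, "step": name}
    return {"ok": False, "blocker": last_error or "post-condition never met"}
```

Because the ladder is a literal list, the single Computer recovery pass and the final CDP attempt are enforced by construction rather than by counters.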
Session hygiene
- Set timeout
- Reuse a single session per task whenever possible
- Release sessions
- Keep a single authoritative SESSION_ID
- Treat release -> Session not found as non-fatal if success was already verified
Secret hygiene
- Never request/paste keys
- Never echo keys in logs
- Prefer env vars
Summary
- Stateless endpoints for quick extraction/screenshots/PDFs.
- Sessions + CDP for reliable multi-step automation.
- Computer actions as a fallback to break through blockers or recover from stuck UI.
- Always verify outcomes and manage session lifecycles correctly.