skyvern

📁 skyvern-ai/skyvern 📅 2 days ago
1
总安装量
1
周安装量
#76267
全站排名
安装命令
npx skills add https://github.com/skyvern-ai/skyvern --skill skyvern

Agent 安装分布

amp 1
openclaw 1
opencode 1
cursor 1
kimi-cli 1
codex 1

Skill 文档

Skyvern Browser Automation — CLI Reference

Skyvern uses AI to navigate and interact with websites. This skill teaches the CLI commands. Every example is a runnable skyvern <command> invocation.

Setup

pip install skyvern
export SKYVERN_API_KEY="YOUR_KEY"   # get one at https://app.skyvern.com
skyvern init                        # optional -- configures local env

MCP upgrade — for richer AI-coding-tool integration (auto-tool-calling, prompts, etc.), run skyvern setup claude-code --project to register the Skyvern MCP server. MCP has its own instructions; this file covers CLI only.


Command Map

CLI Command Purpose
skyvern browser session create Start a cloud browser session
skyvern browser session list List active sessions
skyvern browser session get Get session details
skyvern browser session connect Attach to existing session
skyvern browser session close Close a session
skyvern browser navigate Navigate to a URL
skyvern browser screenshot Capture a screenshot
skyvern browser act AI-driven multi-step action
skyvern browser extract AI-powered data extraction
skyvern browser validate Assert a condition on the page
skyvern browser evaluate Run JavaScript on the page
skyvern browser click Click an element
skyvern browser type Type into an input
skyvern browser hover Hover over an element
skyvern browser scroll Scroll the page
skyvern browser select Select a dropdown option
skyvern browser press-key Press a keyboard key
skyvern browser wait Wait for condition/time
skyvern browser run-task One-off autonomous task
skyvern browser login Log in with stored credentials
skyvern workflow list List workflows
skyvern workflow get Get workflow definition
skyvern workflow create Create a workflow
skyvern workflow update Update a workflow
skyvern workflow delete Delete a workflow
skyvern workflow run Execute a workflow
skyvern workflow status Check run status
skyvern workflow cancel Cancel a running workflow
skyvern credential list List credentials (metadata)
skyvern credential get Get credential metadata
skyvern credential delete Delete a credential
skyvern credentials add Create a credential (interactive)
skyvern block schema Get block type schema
skyvern block validate Validate a block definition

All commands accept --json for machine-readable output (e.g. skyvern browser session create --json).


Pattern 1: Session Lifecycle

Every browser automation follows: create -> navigate -> work -> close.

# 1. Create a cloud session (timeout in minutes, default 60)
skyvern browser session create --timeout 30

# 2. Navigate (uses the active session automatically)
skyvern browser navigate --url "https://example.com"

# 3. Do work (act, extract, click, etc.)
skyvern browser act --prompt "Click the Sign In button"

# 4. Verify with screenshot
skyvern browser screenshot

# 5. Close when done
skyvern browser session close

Session state persists between commands. After session create, subsequent commands auto-attach to the active session. Override with --session pbs_....

Session management

# List all sessions
skyvern browser session list

# Get details for a specific session
skyvern browser session get --session pbs_123

# Connect to an existing session (cloud or CDP)
skyvern browser session connect --session pbs_123
skyvern browser session connect --cdp "ws://localhost:9222"

# Close a specific session
skyvern browser session close --session pbs_123

Pattern 2: One-Off Task

Run an autonomous agent that navigates, acts, and extracts in a single call. Requires an active session (create one first).

# 1. Create a session
skyvern browser session create

# 2. Run the task (uses active session automatically)
skyvern browser run-task \
  --prompt "Go to the pricing page and extract all plan names and prices" \
  --url "https://example.com" \
  --schema '{"type":"object","properties":{"plans":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}}}}}}'

# 3. Close session when done
skyvern browser session close

Key flags:

  • --prompt (required): natural language task description
  • --url: starting URL (navigates before running the agent)
  • --schema (alias --data-extraction-schema): JSON schema for structured output
  • --max-steps: limit agent steps (default unlimited)
  • --timeout: seconds (default 180, max 1800)

Use run-task for quick tests. Use workflows for anything reusable.


Pattern 3: Data Extraction

# Navigate to the source page
skyvern browser navigate --url "https://example.com/products"

# Extract structured data with a JSON schema
skyvern browser extract \
  --prompt "Extract all product names and prices from the listing" \
  --schema '{"type":"object","properties":{"items":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}},"required":["name"]}}},"required":["items"]}'

Without --schema, extraction returns freeform data based on the prompt.

Schema design tips

  • Start with the smallest useful schema
  • Use "type":"string" for prices/dates unless format is guaranteed
  • Keep required to truly essential fields
  • Add provenance fields where needed (source_url, timestamp)

Pagination loop

# Page 1
skyvern browser extract --prompt "Extract all product rows"
# Check for next page
skyvern browser validate --prompt "Is there a Next page button that is not disabled?"
# If true, advance
skyvern browser act --prompt "Click the Next page button"
# Repeat extraction

Stop when: no next button, duplicate first row, or max page limit.


Pattern 4: Form Filling with Act

act performs AI-driven multi-step actions described in natural language:

skyvern browser act \
  --prompt "Fill the contact form: first name John, last name Doe, email john@example.com, then click Submit"

For precision control, use individual commands:

# Type into a field (by intent)
skyvern browser type --text "John" --intent "the first name input"

# Type into a field (by selector)
skyvern browser type --text "john@example.com" --selector "#email"

# Click a button (by intent)
skyvern browser click --intent "the Submit button"

# Select a dropdown option
skyvern browser select --value "US" --intent "the country dropdown"
skyvern browser select --value "California" --selector "#state" --by-label

# Press a key
skyvern browser press-key --key "Enter"

# Hover to reveal a menu
skyvern browser hover --intent "the Account menu"

Targeting modes

Precision commands (click, type, hover, select, scroll, press-key, wait) support three targeting modes:

  1. Intent mode: --intent "the Submit button" (AI finds element)
  2. Selector mode: --selector "#submit-btn" (CSS/XPath)
  3. Hybrid mode: both --selector and --intent (selector narrows, AI confirms)

When unsure, use intent. For deterministic control, use selector.


Pattern 5: Auth with Login + Credentials

Credentials are created interactively (secrets never flow through CLI args):

# Create a credential (prompts for password securely via stdin)
skyvern credentials add --name "prod-salesforce" --type password --username "user@co.com"

Then use it in a browser session:

# List credentials to find the ID
skyvern credential list

# Create session and navigate to login page
skyvern browser session create
skyvern browser navigate --url "https://login.salesforce.com"

# Log in with stored credentials (AI handles the full login flow)
skyvern browser login --url "https://login.salesforce.com" --credential-id cred_123

# Verify login succeeded
skyvern browser validate --prompt "Is the user logged in? Look for a dashboard or user avatar."
skyvern browser screenshot

Credential types

# Password credential
skyvern credentials add --name "my-login" --type password --username "user"

# Credit card credential
skyvern credentials add --name "my-card" --type credit_card

# Secret credential (API key, token, etc.)
skyvern credentials add --name "my-secret" --type secret

Other credential providers: --credential-type bitwarden --bitwarden-item-id "...", --credential-type 1password --onepassword-vault-id "..." --onepassword-item-id "...", --credential-type azure_vault --azure-vault-name "..." --azure-vault-username-key "...".

Security rules

  • NEVER type passwords through skyvern browser type. Always use skyvern browser login.
  • Use skyvern credentials add to create credentials (interactive stdin input).
  • Reuse authenticated sessions for multi-step jobs on the same site.

Pattern 6: Workflows

Workflows are reusable, parameterized multi-step automations.

Create from file

# Create from a YAML or JSON file
skyvern workflow create --definition @workflow.yaml

# Create from inline JSON
skyvern workflow create --definition '{"title":"My Workflow","workflow_definition":{"parameters":[],"blocks":[{"block_type":"navigation","label":"step1","url":"https://example.com","navigation_goal":"Click the pricing link"}]}}'

# Specify format explicitly
skyvern workflow create --definition @workflow.json --format json

Run a workflow

# Basic run
skyvern workflow run --id wpid_123

# With parameters (inline JSON or @file)
skyvern workflow run --id wpid_123 --params '{"email":"user@co.com","name":"John"}'
skyvern workflow run --id wpid_123 --params @params.json

# Wait for completion
skyvern workflow run --id wpid_123 --wait --timeout 600

# With proxy and webhook
skyvern workflow run --id wpid_123 --proxy RESIDENTIAL --webhook "https://hooks.example.com/done"

# Reuse an existing browser session
skyvern workflow run --id wpid_123 --session pbs_456

Monitor and manage

# Check run status
skyvern workflow status --run-id wr_789

# Cancel a run
skyvern workflow cancel --run-id wr_789

# List workflows (with search and pagination)
skyvern workflow list --search "invoice" --page 1 --page-size 20
skyvern workflow list --only-workflows  # exclude saved tasks

# Get workflow definition
skyvern workflow get --id wpid_123 --version 2

# Update a workflow
skyvern workflow update --id wpid_123 --definition @updated.yaml

# Delete a workflow
skyvern workflow delete --id wpid_123 --force

Run status lifecycle

created -> queued -> running -> completed | failed | canceled | terminated | timed_out

Block types

Use skyvern block schema to discover available types:

# List all block types
skyvern block schema

# Get schema for a specific type
skyvern block schema --type navigation

# Validate a block definition
skyvern block validate --block-json '{"block_type":"navigation","label":"step1","url":"https://example.com","navigation_goal":"Click pricing"}'
skyvern block validate --block-json @block.json

Core block types:

  • navigation — fill forms, click buttons, navigate flows (most common)
  • extraction — extract structured data from the current page
  • login — log into a site using stored credentials
  • for_loop — iterate over a list of items
  • conditional — branch based on conditions
  • code — run Python for data transformation
  • text_prompt — LLM generation (no browser)
  • action — single focused action
  • wait — pause for condition/time
  • goto_url — navigate directly to a URL
  • validation — assert page condition
  • http_request — call an external API
  • send_email — send notification
  • file_download / file_upload — file operations

Workflow design principles

  • One intent per block. Split multi-step goals into separate blocks.
  • Use {{parameter_key}} to reference workflow parameters.
  • Prefer navigation blocks for actions, extraction for data pulling.
  • All blocks in a workflow share the same browser session automatically.
  • Test feasibility interactively first (session + act + screenshot), then codify into a workflow.

Engine selection

Context Engine Notes
Known path — all fields and actions specified in prompt skyvern-1.0 (default) Omit engine field
Dynamic planning — discover what to do at runtime skyvern-2.0 Set "engine": "skyvern-2.0"

Long prompts with many fields are still 1.0. “Complexity” means dynamic planning, not field count. When in doubt, split into multiple 1.0 blocks.


Pattern 7: Debugging

Screenshot + validate loop

# Capture current state
skyvern browser screenshot
skyvern browser screenshot --full-page
skyvern browser screenshot --selector "#main-content" --output debug.png

# Check a condition
skyvern browser validate --prompt "Is the login form visible?"
skyvern browser validate --prompt "Does the page show an error message?"

# Run JavaScript to inspect state
skyvern browser evaluate --expression "document.title"
skyvern browser evaluate --expression "document.querySelectorAll('table tr').length"

Wait for conditions

# Wait for time
skyvern browser wait --time 3000

# Wait for a selector
skyvern browser wait --selector "#results-table" --state visible --timeout 10000

# Wait for an AI condition (polls until true)
skyvern browser wait --intent "The loading spinner has disappeared" --timeout 15000

# Scroll to find content
skyvern browser scroll --direction down --amount 500
skyvern browser scroll --direction down --intent "the pricing section"  # AI scroll-into-view

Common failure patterns

Action clicked wrong element: Fix: add stronger context in prompt. Use hybrid mode (selector + intent).

Extraction returns empty: Fix: wait for content-ready condition. Relax required fields. Validate visible row count before extracting.

Login passes but next step fails as logged out: Fix: ensure same session across steps. Add post-login validate check.

Stabilization moves

  • Replace brittle selectors with intent-based actions
  • Add explicit wait conditions before next action
  • Narrow extraction schema to required fields first
  • Split overloaded prompts into smaller goals

Writing Good Prompts

State the business outcome first, then constraints. Include explicit success criteria and keep one objective per invocation. Good: “Extract plan name and monthly price for each tier on the pricing page.” Bad: “Click around and get data.” Prefer natural language intents over brittle selectors.

See references/prompt-writing.md for templates and anti-patterns.


AI vs Precision: Decision Rules

Use AI actions (act, extract, validate) when:

  • Page labels are human-readable and stable
  • The goal is navigational or exploratory
  • You want resilience to minor layout changes

Use precision commands (click, type, select) when:

  • Element identity is deterministic and stable
  • AI action picked the wrong element
  • You need guaranteed exact input

Use hybrid mode (selector + intent together) when:

  • Pages are noisy or crowded
  • Selector narrows to a region, intent picks the exact element

Deep-Dive References

Reference Content
references/prompt-writing.md Prompt templates and anti-patterns
references/engines.md When to use tasks vs workflows
references/schemas.md JSON schema patterns for extraction
references/pagination.md Pagination strategy and guardrails
references/block-types.md Workflow block type details with examples
references/parameters.md Parameter design and variable usage
references/ai-actions.md AI action patterns and examples
references/precision-actions.md Intent-only, selector-only, hybrid modes
references/credentials.md Credential naming, lifecycle, safety
references/sessions.md Session reuse and freshness decisions
references/common-failures.md Failure pattern catalog with fixes
references/screenshots.md Screenshot-led debugging workflow
references/status-lifecycle.md Run status states and guidance
references/rerun-playbook.md Rerun procedures and comparison
references/complex-inputs.md Date pickers, uploads, dropdowns
references/tool-map.md Complete tool inventory by outcome
references/cli-parity.md CLI command to MCP tool mapping