skyvern
npx skills add https://github.com/skyvern-ai/skyvern --skill skyvern
Agent 安装分布
Skill 文档
Skyvern Browser Automation — CLI Reference
Skyvern uses AI to navigate and interact with websites. This skill teaches the
CLI commands. Every example is a runnable skyvern <command> invocation.
Setup
pip install skyvern
export SKYVERN_API_KEY="YOUR_KEY" # get one at https://app.skyvern.com
skyvern init # optional -- configures local env
MCP upgrade — for richer AI-coding-tool integration (auto-tool-calling,
prompts, etc.), run skyvern setup claude-code --project to register the
Skyvern MCP server. MCP has its own instructions; this file covers CLI only.
Command Map
| CLI Command | Purpose |
|---|---|
skyvern browser session create |
Start a cloud browser session |
skyvern browser session list |
List active sessions |
skyvern browser session get |
Get session details |
skyvern browser session connect |
Attach to existing session |
skyvern browser session close |
Close a session |
skyvern browser navigate |
Navigate to a URL |
skyvern browser screenshot |
Capture a screenshot |
skyvern browser act |
AI-driven multi-step action |
skyvern browser extract |
AI-powered data extraction |
skyvern browser validate |
Assert a condition on the page |
skyvern browser evaluate |
Run JavaScript on the page |
skyvern browser click |
Click an element |
skyvern browser type |
Type into an input |
skyvern browser hover |
Hover over an element |
skyvern browser scroll |
Scroll the page |
skyvern browser select |
Select a dropdown option |
skyvern browser press-key |
Press a keyboard key |
skyvern browser wait |
Wait for condition/time |
skyvern browser run-task |
One-off autonomous task |
skyvern browser login |
Log in with stored credentials |
skyvern workflow list |
List workflows |
skyvern workflow get |
Get workflow definition |
skyvern workflow create |
Create a workflow |
skyvern workflow update |
Update a workflow |
skyvern workflow delete |
Delete a workflow |
skyvern workflow run |
Execute a workflow |
skyvern workflow status |
Check run status |
skyvern workflow cancel |
Cancel a running workflow |
skyvern credential list |
List credentials (metadata) |
skyvern credential get |
Get credential metadata |
skyvern credential delete |
Delete a credential |
skyvern credentials add |
Create a credential (interactive) |
skyvern block schema |
Get block type schema |
skyvern block validate |
Validate a block definition |
All commands accept --json for machine-readable output (e.g. skyvern browser session create --json).
Pattern 1: Session Lifecycle
Every browser automation follows: create -> navigate -> work -> close.
# 1. Create a cloud session (timeout in minutes, default 60)
skyvern browser session create --timeout 30
# 2. Navigate (uses the active session automatically)
skyvern browser navigate --url "https://example.com"
# 3. Do work (act, extract, click, etc.)
skyvern browser act --prompt "Click the Sign In button"
# 4. Verify with screenshot
skyvern browser screenshot
# 5. Close when done
skyvern browser session close
Session state persists between commands. After session create, subsequent
commands auto-attach to the active session. Override with --session pbs_....
Session management
# List all sessions
skyvern browser session list
# Get details for a specific session
skyvern browser session get --session pbs_123
# Connect to an existing session (cloud or CDP)
skyvern browser session connect --session pbs_123
skyvern browser session connect --cdp "ws://localhost:9222"
# Close a specific session
skyvern browser session close --session pbs_123
Pattern 2: One-Off Task
Run an autonomous agent that navigates, acts, and extracts in a single call. Requires an active session (create one first).
# 1. Create a session
skyvern browser session create
# 2. Run the task (uses active session automatically)
skyvern browser run-task \
--prompt "Go to the pricing page and extract all plan names and prices" \
--url "https://example.com" \
--schema '{"type":"object","properties":{"plans":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}}}}}}'
# 3. Close session when done
skyvern browser session close
Key flags:
--prompt(required): natural language task description--url: starting URL (navigates before running the agent)--schema(alias--data-extraction-schema): JSON schema for structured output--max-steps: limit agent steps (default unlimited)--timeout: seconds (default 180, max 1800)
Use run-task for quick tests. Use workflows for anything reusable.
Pattern 3: Data Extraction
# Navigate to the source page
skyvern browser navigate --url "https://example.com/products"
# Extract structured data with a JSON schema
skyvern browser extract \
--prompt "Extract all product names and prices from the listing" \
--schema '{"type":"object","properties":{"items":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}},"required":["name"]}}},"required":["items"]}'
Without --schema, extraction returns freeform data based on the prompt.
Schema design tips
- Start with the smallest useful schema
- Use
"type":"string"for prices/dates unless format is guaranteed - Keep
requiredto truly essential fields - Add provenance fields where needed (
source_url, timestamp)
Pagination loop
# Page 1
skyvern browser extract --prompt "Extract all product rows"
# Check for next page
skyvern browser validate --prompt "Is there a Next page button that is not disabled?"
# If true, advance
skyvern browser act --prompt "Click the Next page button"
# Repeat extraction
Stop when: no next button, duplicate first row, or max page limit.
Pattern 4: Form Filling with Act
act performs AI-driven multi-step actions described in natural language:
skyvern browser act \
--prompt "Fill the contact form: first name John, last name Doe, email john@example.com, then click Submit"
For precision control, use individual commands:
# Type into a field (by intent)
skyvern browser type --text "John" --intent "the first name input"
# Type into a field (by selector)
skyvern browser type --text "john@example.com" --selector "#email"
# Click a button (by intent)
skyvern browser click --intent "the Submit button"
# Select a dropdown option
skyvern browser select --value "US" --intent "the country dropdown"
skyvern browser select --value "California" --selector "#state" --by-label
# Press a key
skyvern browser press-key --key "Enter"
# Hover to reveal a menu
skyvern browser hover --intent "the Account menu"
Targeting modes
Precision commands (click, type, hover, select, scroll, press-key,
wait) support three targeting modes:
- Intent mode:
--intent "the Submit button"(AI finds element) - Selector mode:
--selector "#submit-btn"(CSS/XPath) - Hybrid mode: both
--selectorand--intent(selector narrows, AI confirms)
When unsure, use intent. For deterministic control, use selector.
Pattern 5: Auth with Login + Credentials
Credentials are created interactively (secrets never flow through CLI args):
# Create a credential (prompts for password securely via stdin)
skyvern credentials add --name "prod-salesforce" --type password --username "user@co.com"
Then use it in a browser session:
# List credentials to find the ID
skyvern credential list
# Create session and navigate to login page
skyvern browser session create
skyvern browser navigate --url "https://login.salesforce.com"
# Log in with stored credentials (AI handles the full login flow)
skyvern browser login --url "https://login.salesforce.com" --credential-id cred_123
# Verify login succeeded
skyvern browser validate --prompt "Is the user logged in? Look for a dashboard or user avatar."
skyvern browser screenshot
Credential types
# Password credential
skyvern credentials add --name "my-login" --type password --username "user"
# Credit card credential
skyvern credentials add --name "my-card" --type credit_card
# Secret credential (API key, token, etc.)
skyvern credentials add --name "my-secret" --type secret
Other credential providers: --credential-type bitwarden --bitwarden-item-id "...",
--credential-type 1password --onepassword-vault-id "..." --onepassword-item-id "...",
--credential-type azure_vault --azure-vault-name "..." --azure-vault-username-key "...".
Security rules
- NEVER type passwords through
skyvern browser type. Always useskyvern browser login. - Use
skyvern credentials addto create credentials (interactive stdin input). - Reuse authenticated sessions for multi-step jobs on the same site.
Pattern 6: Workflows
Workflows are reusable, parameterized multi-step automations.
Create from file
# Create from a YAML or JSON file
skyvern workflow create --definition @workflow.yaml
# Create from inline JSON
skyvern workflow create --definition '{"title":"My Workflow","workflow_definition":{"parameters":[],"blocks":[{"block_type":"navigation","label":"step1","url":"https://example.com","navigation_goal":"Click the pricing link"}]}}'
# Specify format explicitly
skyvern workflow create --definition @workflow.json --format json
Run a workflow
# Basic run
skyvern workflow run --id wpid_123
# With parameters (inline JSON or @file)
skyvern workflow run --id wpid_123 --params '{"email":"user@co.com","name":"John"}'
skyvern workflow run --id wpid_123 --params @params.json
# Wait for completion
skyvern workflow run --id wpid_123 --wait --timeout 600
# With proxy and webhook
skyvern workflow run --id wpid_123 --proxy RESIDENTIAL --webhook "https://hooks.example.com/done"
# Reuse an existing browser session
skyvern workflow run --id wpid_123 --session pbs_456
Monitor and manage
# Check run status
skyvern workflow status --run-id wr_789
# Cancel a run
skyvern workflow cancel --run-id wr_789
# List workflows (with search and pagination)
skyvern workflow list --search "invoice" --page 1 --page-size 20
skyvern workflow list --only-workflows # exclude saved tasks
# Get workflow definition
skyvern workflow get --id wpid_123 --version 2
# Update a workflow
skyvern workflow update --id wpid_123 --definition @updated.yaml
# Delete a workflow
skyvern workflow delete --id wpid_123 --force
Run status lifecycle
created -> queued -> running -> completed | failed | canceled | terminated | timed_out
Block types
Use skyvern block schema to discover available types:
# List all block types
skyvern block schema
# Get schema for a specific type
skyvern block schema --type navigation
# Validate a block definition
skyvern block validate --block-json '{"block_type":"navigation","label":"step1","url":"https://example.com","navigation_goal":"Click pricing"}'
skyvern block validate --block-json @block.json
Core block types:
- navigation — fill forms, click buttons, navigate flows (most common)
- extraction — extract structured data from the current page
- login — log into a site using stored credentials
- for_loop — iterate over a list of items
- conditional — branch based on conditions
- code — run Python for data transformation
- text_prompt — LLM generation (no browser)
- action — single focused action
- wait — pause for condition/time
- goto_url — navigate directly to a URL
- validation — assert page condition
- http_request — call an external API
- send_email — send notification
- file_download / file_upload — file operations
Workflow design principles
- One intent per block. Split multi-step goals into separate blocks.
- Use
{{parameter_key}}to reference workflow parameters. - Prefer
navigationblocks for actions,extractionfor data pulling. - All blocks in a workflow share the same browser session automatically.
- Test feasibility interactively first (session + act + screenshot), then codify into a workflow.
Engine selection
| Context | Engine | Notes |
|---|---|---|
| Known path — all fields and actions specified in prompt | skyvern-1.0 (default) |
Omit engine field |
| Dynamic planning — discover what to do at runtime | skyvern-2.0 |
Set "engine": "skyvern-2.0" |
Long prompts with many fields are still 1.0. “Complexity” means dynamic planning, not field count. When in doubt, split into multiple 1.0 blocks.
Pattern 7: Debugging
Screenshot + validate loop
# Capture current state
skyvern browser screenshot
skyvern browser screenshot --full-page
skyvern browser screenshot --selector "#main-content" --output debug.png
# Check a condition
skyvern browser validate --prompt "Is the login form visible?"
skyvern browser validate --prompt "Does the page show an error message?"
# Run JavaScript to inspect state
skyvern browser evaluate --expression "document.title"
skyvern browser evaluate --expression "document.querySelectorAll('table tr').length"
Wait for conditions
# Wait for time
skyvern browser wait --time 3000
# Wait for a selector
skyvern browser wait --selector "#results-table" --state visible --timeout 10000
# Wait for an AI condition (polls until true)
skyvern browser wait --intent "The loading spinner has disappeared" --timeout 15000
# Scroll to find content
skyvern browser scroll --direction down --amount 500
skyvern browser scroll --direction down --intent "the pricing section" # AI scroll-into-view
Common failure patterns
Action clicked wrong element: Fix: add stronger context in prompt. Use hybrid mode (selector + intent).
Extraction returns empty: Fix: wait for content-ready condition. Relax required fields. Validate visible row count before extracting.
Login passes but next step fails as logged out:
Fix: ensure same session across steps. Add post-login validate check.
Stabilization moves
- Replace brittle selectors with intent-based actions
- Add explicit wait conditions before next action
- Narrow extraction schema to required fields first
- Split overloaded prompts into smaller goals
Writing Good Prompts
State the business outcome first, then constraints. Include explicit success criteria and keep one objective per invocation. Good: “Extract plan name and monthly price for each tier on the pricing page.” Bad: “Click around and get data.” Prefer natural language intents over brittle selectors.
See references/prompt-writing.md for templates and anti-patterns.
AI vs Precision: Decision Rules
Use AI actions (act, extract, validate) when:
- Page labels are human-readable and stable
- The goal is navigational or exploratory
- You want resilience to minor layout changes
Use precision commands (click, type, select) when:
- Element identity is deterministic and stable
- AI action picked the wrong element
- You need guaranteed exact input
Use hybrid mode (selector + intent together) when:
- Pages are noisy or crowded
- Selector narrows to a region, intent picks the exact element
Deep-Dive References
| Reference | Content |
|---|---|
references/prompt-writing.md |
Prompt templates and anti-patterns |
references/engines.md |
When to use tasks vs workflows |
references/schemas.md |
JSON schema patterns for extraction |
references/pagination.md |
Pagination strategy and guardrails |
references/block-types.md |
Workflow block type details with examples |
references/parameters.md |
Parameter design and variable usage |
references/ai-actions.md |
AI action patterns and examples |
references/precision-actions.md |
Intent-only, selector-only, hybrid modes |
references/credentials.md |
Credential naming, lifecycle, safety |
references/sessions.md |
Session reuse and freshness decisions |
references/common-failures.md |
Failure pattern catalog with fixes |
references/screenshots.md |
Screenshot-led debugging workflow |
references/status-lifecycle.md |
Run status states and guidance |
references/rerun-playbook.md |
Rerun procedures and comparison |
references/complex-inputs.md |
Date pickers, uploads, dropdowns |
references/tool-map.md |
Complete tool inventory by outcome |
references/cli-parity.md |
CLI command to MCP tool mapping |