skyvern

📁 skyvern-ai/skyvern 📅 2 days ago

Install command

npx skills add https://github.com/skyvern-ai/skyvern --skill skyvern

Skill documentation

Skyvern Browser Automation — CLI Reference

Skyvern uses AI to navigate and interact with websites. This skill teaches the CLI commands. Every example is a runnable skyvern <command> invocation.

Setup

pip install skyvern
export SKYVERN_API_KEY="YOUR_KEY"   # get one at https://app.skyvern.com
skyvern init                        # optional -- configures local env

MCP upgrade — for richer AI-coding-tool integration (auto-tool-calling, prompts, etc.), run skyvern setup claude-code --project to register the Skyvern MCP server. MCP has its own instructions; this file covers CLI only.

Command Map

CLI Command	Purpose
`skyvern browser session create`	Start a cloud browser session
`skyvern browser session list`	List active sessions
`skyvern browser session get`	Get session details
`skyvern browser session connect`	Attach to existing session
`skyvern browser session close`	Close a session
`skyvern browser navigate`	Navigate to a URL
`skyvern browser screenshot`	Capture a screenshot
`skyvern browser act`	AI-driven multi-step action
`skyvern browser extract`	AI-powered data extraction
`skyvern browser validate`	Assert a condition on the page
`skyvern browser evaluate`	Run JavaScript on the page
`skyvern browser click`	Click an element
`skyvern browser type`	Type into an input
`skyvern browser hover`	Hover over an element
`skyvern browser scroll`	Scroll the page
`skyvern browser select`	Select a dropdown option
`skyvern browser press-key`	Press a keyboard key
`skyvern browser wait`	Wait for condition/time
`skyvern browser run-task`	One-off autonomous task
`skyvern browser login`	Log in with stored credentials
`skyvern workflow list`	List workflows
`skyvern workflow get`	Get workflow definition
`skyvern workflow create`	Create a workflow
`skyvern workflow update`	Update a workflow
`skyvern workflow delete`	Delete a workflow
`skyvern workflow run`	Execute a workflow
`skyvern workflow status`	Check run status
`skyvern workflow cancel`	Cancel a running workflow
`skyvern credential list`	List credentials (metadata)
`skyvern credential get`	Get credential metadata
`skyvern credential delete`	Delete a credential
`skyvern credentials add`	Create a credential (interactive)
`skyvern block schema`	Get block type schema
`skyvern block validate`	Validate a block definition

All commands accept --json for machine-readable output (e.g. skyvern browser session create --json).
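Because every command accepts --json, a script can call the CLI and parse its stdout. The following is a minimal Python sketch under stated assumptions: the helper name run_json and the injectable runner are hypothetical, and the shape of the JSON that the CLI returns is not documented in this file, so the function simply returns whatever json.loads produces.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def _subprocess_runner(argv: Sequence[str]) -> str:
    """Run the command and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def run_json(
    args: Sequence[str],
    runner: Callable[[Sequence[str]], str] = _subprocess_runner,
) -> Any:
    """Run `skyvern <args> --json` and decode stdout as JSON.

    `runner` is a hypothetical injection point so the helper can be
    exercised without the CLI installed.
    """
    argv = ["skyvern", *args]
    if "--json" not in argv:
        argv.append("--json")
    return json.loads(runner(argv))
```

For example, run_json(["browser", "session", "list"]) would execute skyvern browser session list --json and return the decoded payload.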

Pattern 1: Session Lifecycle

Every browser automation follows: create -> navigate -> work -> close.

# 1. Create a cloud session (timeout in minutes, default 60)
skyvern browser session create --timeout 30

# 2. Navigate (uses the active session automatically)
skyvern browser navigate --url "https://example.com"

# 3. Do work (act, extract, click, etc.)
skyvern browser act --prompt "Click the Sign In button"

# 4. Verify with screenshot
skyvern browser screenshot

# 5. Close when done
skyvern browser session close

Session state persists between commands. After session create, subsequent commands auto-attach to the active session. Override with --session pbs_....
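When the lifecycle is driven from a script, it helps to guarantee that the close step runs even if a work step fails. The sketch below is one way to do that in Python; the function name with_session and the runner parameter are hypothetical, and only the commands and flags shown above are used.

```python
import subprocess
from typing import Callable, Iterable, Sequence


def _run_checked(argv: Sequence[str]) -> None:
    """Run one CLI command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def with_session(
    steps: Iterable[Sequence[str]],
    timeout_minutes: int = 30,
    runner: Callable[[Sequence[str]], None] = _run_checked,
) -> None:
    """create -> run each step -> close, closing even when a step raises.

    Each step is the argument list that follows `skyvern browser`,
    e.g. ["navigate", "--url", "https://example.com"].
    """
    runner(["skyvern", "browser", "session", "create", "--timeout", str(timeout_minutes)])
    try:
        for step in steps:
            runner(["skyvern", "browser", *step])
    finally:
        # Always release the cloud session, success or failure.
        runner(["skyvern", "browser", "session", "close"])
```

The steps rely on the auto-attach behavior described above, so no session ID is passed between commands.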

Session management

# List all sessions
skyvern browser session list

# Get details for a specific session
skyvern browser session get --session pbs_123

# Connect to an existing session (cloud or CDP)
skyvern browser session connect --session pbs_123
skyvern browser session connect --cdp "ws://localhost:9222"

# Close a specific session
skyvern browser session close --session pbs_123

Pattern 2: One-Off Task

Run an autonomous agent that navigates, acts, and extracts in a single call. Requires an active session (create one first).

# 1. Create a session
skyvern browser session create

# 2. Run the task (uses active session automatically)
skyvern browser run-task \
  --prompt "Go to the pricing page and extract all plan names and prices" \
  --url "https://example.com" \
  --schema '{"type":"object","properties":{"plans":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}}}}}}'

# 3. Close session when done
skyvern browser session close

Key flags:

--prompt (required): natural language task description
--url: starting URL (navigates before running the agent)
--schema (alias --data-extraction-schema): JSON schema for structured output
--max-steps: limit agent steps (default unlimited)
--timeout: task timeout in seconds (default 180, max 1800); note that session create --timeout is in minutes

Use run-task for quick tests. Use workflows for anything reusable.
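Inline JSON schemas are easy to misquote in a shell. As a hedged alternative, the sketch below builds the run-task argument list in Python from a dict, using only the flags listed above. The function name run_task_argv is hypothetical, and the range check simply mirrors the documented 1800-second maximum.

```python
import json
from typing import Any, Dict, List, Optional

MAX_TIMEOUT_SECONDS = 1800  # documented maximum for --timeout


def run_task_argv(
    prompt: str,
    url: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
    max_steps: Optional[int] = None,
    timeout: Optional[int] = None,
) -> List[str]:
    """Build the argv for `skyvern browser run-task` without shell quoting."""
    if not prompt:
        raise ValueError("--prompt is required")
    argv = ["skyvern", "browser", "run-task", "--prompt", prompt]
    if url is not None:
        argv += ["--url", url]
    if schema is not None:
        # Compact JSON; passed as a single argv element, so no escaping is needed.
        argv += ["--schema", json.dumps(schema, separators=(",", ":"))]
    if max_steps is not None:
        argv += ["--max-steps", str(max_steps)]
    if timeout is not None:
        if not 0 < timeout <= MAX_TIMEOUT_SECONDS:
            raise ValueError(f"--timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds")
        argv += ["--timeout", str(timeout)]
    return argv
```

The resulting list can be handed to subprocess.run(argv, check=True) once a session is active.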

Pattern 3: Data Extraction

# Navigate to the source page
skyvern browser navigate --url "https://example.com/products"

# Extract structured data with a JSON schema
skyvern browser extract \
  --prompt "Extract all product names and prices from the listing" \
  --schema '{"type":"object","properties":{"items":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}},"required":["name"]}}},"required":["items"]}'

Without --schema, extraction returns freeform data based on the prompt.

Schema design tips

Start with the smallest useful schema
Use "type":"string" for prices/dates unless format is guaranteed
Keep required to truly essential fields
Add provenance fields where needed (source_url, timestamp)
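The tips above can be sketched as a small schema builder. This is an illustrative Python helper, not part of Skyvern: the name listing_schema is hypothetical, every field is typed as a string, required defaults to just name, and the optional provenance fields use the source_url and timestamp names mentioned in the list.

```python
from typing import Any, Dict, Iterable, Sequence


def listing_schema(
    fields: Iterable[str],
    required: Sequence[str] = ("name",),
    provenance: bool = False,
) -> Dict[str, Any]:
    """Build a JSON schema for a list of items with string-typed fields."""
    props: Dict[str, Any] = {name: {"type": "string"} for name in fields}
    if provenance:
        props["source_url"] = {"type": "string"}
        props["timestamp"] = {"type": "string"}
    unknown = [key for key in required if key not in props]
    if unknown:
        raise ValueError(f"required fields not in schema: {unknown}")
    item = {"type": "object", "properties": props, "required": list(required)}
    return {
        "type": "object",
        "properties": {"items": {"type": "array", "items": item}},
        "required": ["items"],
    }
```

Calling listing_schema(["name", "price"]) yields the same schema as the extract example above; json.dumps of the result can be passed to --schema.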

Pagination loop

# Page 1
skyvern browser extract --prompt "Extract all product rows"
# Check for next page
skyvern browser validate --prompt "Is there a Next page button that is not disabled?"
# If true, advance
skyvern browser act --prompt "Click the Next page button"
# Repeat extraction

Stop when there is no enabled Next button, when the first row repeats one already seen (the page did not advance), or when a maximum page limit is reached.
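The loop and its three stop conditions can be sketched in Python as follows. The three callables are hypothetical wrappers that a script would supply around skyvern browser extract, validate, and act; their return shapes are assumptions, since the CLI's JSON output is not documented in this file.

```python
from typing import Any, Callable, List, Sequence


def paginate(
    extract_page: Callable[[], Sequence[Any]],  # e.g. wraps `skyvern browser extract`
    has_next: Callable[[], bool],               # e.g. wraps `skyvern browser validate`
    go_next: Callable[[], None],                # e.g. wraps `skyvern browser act`
    max_pages: int = 20,
) -> List[Any]:
    """Collect rows page by page until one of the stop conditions is met."""
    rows: List[Any] = []
    seen_first_rows = set()
    for page_number in range(1, max_pages + 1):
        page = list(extract_page())
        if page:
            first = repr(page[0])
            if first in seen_first_rows:
                break  # duplicate first row: the page did not advance
            seen_first_rows.add(first)
        rows.extend(page)
        if page_number == max_pages or not has_next():
            break  # page limit reached, or no usable Next button
        go_next()
    return rows
```

Comparing only the first row keeps the duplicate check cheap; a stricter variant could compare whole pages.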

Pattern 4: Form Filling with Act

act performs AI-driven multi-step actions described in natural language:

skyvern browser act \
  --prompt "Fill the contact form: first name John, last name Doe, email john@example.com, then click Submit"

For precision control, use individual commands:

# Type into a field (by intent)
skyvern browser type --text "John" --intent "the first name input"

# Type into a field (by selector)
skyvern browser type --text "john@example.com" --selector "#email"

# Click a button (by intent)
skyvern browser click --intent "the Submit button"

# Select a dropdown option
skyvern browser select --value "US" --intent "the country dropdown"
skyvern browser select --value "California" --selector "#state" --by-label

# Press a key
skyvern browser press-key --key "Enter"

# Hover to reveal a menu
skyvern browser hover --intent "the Account menu"

Targeting modes

Precision commands (click, type, hover, select, scroll, press-key, wait) support three targeting modes:

Intent mode: --intent "the Submit button" (AI finds element)
Selector mode: --selector "#submit-btn" (CSS/XPath)
Hybrid mode: both --selector and --intent (selector narrows, AI confirms)

When unsure, use intent. For deterministic control, use selector.
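As a rough illustration of the three modes, the Python sketch below classifies a target and builds a click invocation. The function names are hypothetical, and treating a click with neither flag as an error is an assumption of this sketch rather than documented CLI behavior.

```python
from typing import List, Optional


def targeting_mode(intent: Optional[str], selector: Optional[str]) -> str:
    """Name the targeting mode implied by the flags that are set."""
    if intent and selector:
        return "hybrid"    # selector narrows, AI confirms
    if selector:
        return "selector"  # deterministic CSS/XPath
    if intent:
        return "intent"    # AI finds the element
    raise ValueError("provide --intent, --selector, or both")


def click_argv(intent: Optional[str] = None, selector: Optional[str] = None) -> List[str]:
    """Build the argv for `skyvern browser click` in any of the three modes."""
    targeting_mode(intent, selector)  # raises when no target is given
    argv = ["skyvern", "browser", "click"]
    if selector:
        argv += ["--selector", selector]
    if intent:
        argv += ["--intent", intent]
    return argv
```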

Pattern 5: Auth with Login + Credentials

Credentials are created interactively (secrets never flow through CLI args):

# Create a credential (prompts for password securely via stdin)
skyvern credentials add --name "prod-salesforce" --type password --username "user@co.com"

Then use it in a browser session:

# List credentials to find the ID
skyvern credential list

# Create session and navigate to login page
skyvern browser session create
skyvern browser navigate --url "https://login.salesforce.com"

# Log in with stored credentials (AI handles the full login flow)
skyvern browser login --url "https://login.salesforce.com" --credential-id cred_123

# Verify login succeeded
skyvern browser validate --prompt "Is the user logged in? Look for a dashboard or user avatar."
skyvern browser screenshot

Credential types

# Password credential
skyvern credentials add --name "my-login" --type password --username "user"

# Credit card credential
skyvern credentials add --name "my-card" --type credit_card

# Secret credential (API key, token, etc.)
skyvern credentials add --name "my-secret" --type secret

Other credential providers: --credential-type bitwarden --bitwarden-item-id "...", --credential-type 1password --onepassword-vault-id "..." --onepassword-item-id "...", --credential-type azure_vault --azure-vault-name "..." --azure-vault-username-key "...".

Security rules

NEVER type passwords through skyvern browser type. Always use skyvern browser login.
Use skyvern credentials add to create credentials (interactive stdin input).
Reuse authenticated sessions for multi-step jobs on the same site.

Pattern 6: Workflows

Workflows are reusable, parameterized multi-step automations.

Create a workflow

# Create from a YAML or JSON file
skyvern workflow create --definition @workflow.yaml

# Create from inline JSON
skyvern workflow create --definition '{"title":"My Workflow","workflow_definition":{"parameters":[],"blocks":[{"block_type":"navigation","label":"step1","url":"https://example.com","navigation_goal":"Click the pricing link"}]}}'

# Specify format explicitly
skyvern workflow create --definition @workflow.json --format json

Run a workflow

# Basic run
skyvern workflow run --id wpid_123

# With parameters (inline JSON or @file)
skyvern workflow run --id wpid_123 --params '{"email":"user@co.com","name":"John"}'
skyvern workflow run --id wpid_123 --params @params.json

# Wait for completion
skyvern workflow run --id wpid_123 --wait --timeout 600

# With proxy and webhook
skyvern workflow run --id wpid_123 --proxy RESIDENTIAL --webhook "https://hooks.example.com/done"

# Reuse an existing browser session
skyvern workflow run --id wpid_123 --session pbs_456

Monitor and manage

# Check run status
skyvern workflow status --run-id wr_789

# Cancel a run
skyvern workflow cancel --run-id wr_789

# List workflows (with search and pagination)
skyvern workflow list --search "invoice" --page 1 --page-size 20
skyvern workflow list --only-workflows  # exclude saved tasks

# Get workflow definition
skyvern workflow get --id wpid_123 --version 2

# Update a workflow
skyvern workflow update --id wpid_123 --definition @updated.yaml

# Delete a workflow
skyvern workflow delete --id wpid_123 --force

Run status lifecycle

created -> queued -> running -> completed | failed | canceled | terminated | timed_out
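A script that polls a run needs to know which states are final. The sketch below encodes the lifecycle above in Python; get_status is a hypothetical callable (for instance a wrapper around skyvern workflow status --run-id ... --json), and the JSON field that carries the status is not documented here, so extracting it is left to that callable.

```python
import time
from typing import Callable

IN_PROGRESS = frozenset({"created", "queued", "running"})
TERMINAL = frozenset({"completed", "failed", "canceled", "terminated", "timed_out"})


def is_terminal(status: str) -> bool:
    """True once a run can no longer change state."""
    return status in TERMINAL


def wait_for_terminal(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal status, or raise TimeoutError."""
    for attempt in range(max_polls):
        status = get_status()
        if is_terminal(status):
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # no sleep after the final poll
    raise TimeoutError(f"run still not terminal after {max_polls} polls")
```

For simple cases, skyvern workflow run --wait --timeout already blocks until completion; a loop like this is only useful when the run was started elsewhere.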

Block types

Use skyvern block schema to discover available types:

# List all block types
skyvern block schema

# Get schema for a specific type
skyvern block schema --type navigation

# Validate a block definition
skyvern block validate --block-json '{"block_type":"navigation","label":"step1","url":"https://example.com","navigation_goal":"Click pricing"}'
skyvern block validate --block-json @block.json

Core block types:

navigation — fill forms, click buttons, navigate flows (most common)
extraction — extract structured data from the current page
login — log into a site using stored credentials
for_loop — iterate over a list of items
conditional — branch based on conditions
code — run Python for data transformation
text_prompt — LLM generation (no browser)
action — single focused action
wait — pause for condition/time
goto_url — navigate directly to a URL
validation — assert page condition
http_request — call an external API
send_email — send notification
file_download / file_upload — file operations

Workflow design principles

One intent per block. Split multi-step goals into separate blocks.
Use {{parameter_key}} to reference workflow parameters.
Prefer navigation blocks for actions, extraction for data pulling.
All blocks in a workflow share the same browser session automatically.
Test feasibility interactively first (session + act + screenshot), then codify into a workflow.
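Since blocks reference parameters with {{parameter_key}}, a quick pre-flight check can catch a run whose --params JSON is missing a key. The Python sketch below is illustrative only: the function names are hypothetical, the regular expression assumes simple identifier keys with optional surrounding spaces, and it inspects only top-level string fields of each block.

```python
import re
from typing import Any, Dict, Iterable, Mapping, Set

# Assumed placeholder syntax: {{key}} or {{ key }} with identifier-like keys.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def referenced_parameters(blocks: Iterable[Mapping[str, Any]]) -> Set[str]:
    """Collect every parameter key referenced in the blocks' string fields."""
    keys: Set[str] = set()
    for block in blocks:
        for value in block.values():
            if isinstance(value, str):
                keys.update(PLACEHOLDER.findall(value))
    return keys


def missing_parameters(blocks: Iterable[Mapping[str, Any]], params: Dict[str, Any]) -> Set[str]:
    """Keys the blocks reference that the run parameters do not provide."""
    return referenced_parameters(blocks) - set(params)
```

An empty result from missing_parameters means every placeholder has a value before skyvern workflow run is invoked.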

Engine selection

Context	Engine	Notes
Known path — all fields and actions specified in prompt	`skyvern-1.0` (default)	Omit `engine` field
Dynamic planning — discover what to do at runtime	`skyvern-2.0`	Set `"engine": "skyvern-2.0"`

A long prompt with many fields is still a skyvern-1.0 job. “Complexity” here means dynamic planning at runtime, not field count. When in doubt, split the work into multiple skyvern-1.0 blocks.
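The table's rule can be expressed as a tiny helper. This is a sketch under an assumption: the table says to set or omit an engine field but this file does not show where that field sits in the definition, so the code places it on the block dict. The function name with_engine is hypothetical.

```python
from typing import Any, Dict, Mapping


def with_engine(block: Mapping[str, Any], dynamic_planning: bool) -> Dict[str, Any]:
    """Return a copy of the block with the engine chosen per the table above."""
    out = dict(block)
    if dynamic_planning:
        out["engine"] = "skyvern-2.0"  # discover what to do at runtime
    else:
        out.pop("engine", None)        # omit the field to get the skyvern-1.0 default
    return out
```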

Pattern 7: Debugging

Screenshot + validate loop

# Capture current state
skyvern browser screenshot
skyvern browser screenshot --full-page
skyvern browser screenshot --selector "#main-content" --output debug.png

# Check a condition
skyvern browser validate --prompt "Is the login form visible?"
skyvern browser validate --prompt "Does the page show an error message?"

# Run JavaScript to inspect state
skyvern browser evaluate --expression "document.title"
skyvern browser evaluate --expression "document.querySelectorAll('table tr').length"

Wait for conditions

# Wait for time
skyvern browser wait --time 3000

# Wait for a selector
skyvern browser wait --selector "#results-table" --state visible --timeout 10000

# Wait for an AI condition (polls until true)
skyvern browser wait --intent "The loading spinner has disappeared" --timeout 15000

# Scroll to find content
skyvern browser scroll --direction down --amount 500
skyvern browser scroll --direction down --intent "the pricing section"  # AI scroll-into-view

Common failure patterns

Action clicked the wrong element. Fix: add stronger context to the prompt, or use hybrid mode (selector + intent).

Extraction returns empty. Fix: wait for a content-ready condition, relax required fields, and validate the visible row count before extracting.

Login passes but the next step fails as logged out. Fix: make sure every step uses the same session, and add a post-login validate check.

Stabilization moves

Replace brittle selectors with intent-based actions
Add explicit wait conditions before next action
Narrow extraction schema to required fields first
Split overloaded prompts into smaller goals

Writing Good Prompts

State the business outcome first, then constraints. Include explicit success criteria and keep one objective per invocation. Good: “Extract plan name and monthly price for each tier on the pricing page.” Bad: “Click around and get data.” Prefer natural language intents over brittle selectors.

See references/prompt-writing.md for templates and anti-patterns.

AI vs Precision: Decision Rules

Use AI actions (act, extract, validate) when:

Page labels are human-readable and stable
The goal is navigational or exploratory
You want resilience to minor layout changes

Use precision commands (click, type, select) when:

Element identity is deterministic and stable
AI action picked the wrong element
You need guaranteed exact input

Use hybrid mode (selector + intent together) when:

Pages are noisy or crowded
Selector narrows to a region, intent picks the exact element

Deep-Dive References

Reference	Content
`references/prompt-writing.md`	Prompt templates and anti-patterns
`references/engines.md`	When to use tasks vs workflows
`references/schemas.md`	JSON schema patterns for extraction
`references/pagination.md`	Pagination strategy and guardrails
`references/block-types.md`	Workflow block type details with examples
`references/parameters.md`	Parameter design and variable usage
`references/ai-actions.md`	AI action patterns and examples
`references/precision-actions.md`	Intent-only, selector-only, hybrid modes
`references/credentials.md`	Credential naming, lifecycle, safety
`references/sessions.md`	Session reuse and freshness decisions
`references/common-failures.md`	Failure pattern catalog with fixes
`references/screenshots.md`	Screenshot-led debugging workflow
`references/status-lifecycle.md`	Run status states and guidance
`references/rerun-playbook.md`	Rerun procedures and comparison
`references/complex-inputs.md`	Date pickers, uploads, dropdowns
`references/tool-map.md`	Complete tool inventory by outcome
`references/cli-parity.md`	CLI command to MCP tool mapping
