browser

📁 gmickel/gmickel-claude-marketplace 📅 Jan 25, 2026
8
总安装量
8
周安装量
#36052
全站排名
安装命令
npx skills add https://github.com/gmickel/gmickel-claude-marketplace --skill browser

Agent 安装分布

opencode 7
antigravity 7
claude-code 7
codex 7
gemini-cli 7
cursor 7

Skill 文档

Browser Automation

Browser automation via Vercel’s agent-browser CLI. Runs headless by default; use --headed for visible window. Uses ref-based selection (@e1, @e2) from accessibility snapshots.

Setup & Version Check

# Check installed + print version
command -v agent-browser >/dev/null 2>&1 && agent-browser --version || echo "MISSING: npm i -g agent-browser && agent-browser install"

Always run the version check at the start of a browser session. agent-browser iterates quickly — check for updates if the version is more than a week old:

npm view agent-browser version  # Latest published

Core Workflow

  1. Open URL
  2. Snapshot to get refs
  3. Interact via refs
  4. Re-snapshot after DOM changes
agent-browser open https://example.com
agent-browser snapshot -i              # Interactive elements with refs
agent-browser click @e1
agent-browser wait --load networkidle  # Wait for SPA to settle
agent-browser snapshot -i              # Re-snapshot after change

Command Chaining

Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

# Chain open + wait + snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

When to chain: Use && when you don’t need intermediate output (e.g., open + wait + screenshot). Run separately when you need to parse output first (e.g., snapshot to discover refs, then interact).

Essential Commands

Navigation

agent-browser open <url>       # Navigate (aliases: goto, navigate)
agent-browser back             # Go back
agent-browser forward          # Go forward
agent-browser reload           # Reload
agent-browser close            # Close browser (aliases: quit, exit)

Snapshots

agent-browser snapshot           # Full accessibility tree
agent-browser snapshot -i        # Interactive only (recommended)
agent-browser snapshot -i -C     # Include cursor-interactive (divs with onclick, cursor:pointer)
agent-browser snapshot -i --json # JSON for parsing
agent-browser snapshot -c        # Compact (remove empty)
agent-browser snapshot -d 3      # Limit depth
agent-browser snapshot -s "#main" # Scope to selector

Interactions

agent-browser click @e1              # Click
agent-browser click @e1 --new-tab    # Click and open in new tab
agent-browser dblclick @e1           # Double-click
agent-browser fill @e1 "text"        # Clear + fill input
agent-browser type @e1 "text"        # Type without clearing
agent-browser press Enter            # Key press
agent-browser press Control+a        # Key combination
agent-browser keydown Shift          # Hold key down
agent-browser keyup Shift            # Release key
agent-browser hover @e1              # Hover
agent-browser check @e1              # Check checkbox
agent-browser uncheck @e1            # Uncheck
agent-browser select @e1 "option"    # Dropdown
agent-browser select @e1 "a" "b"     # Multi-select
agent-browser scroll down 500        # Scroll direction + pixels
agent-browser scrollintoview @e1     # Scroll element visible
agent-browser drag @e1 @e2           # Drag and drop
agent-browser upload @e1 file.pdf    # Upload files

Get Info

agent-browser get text @e1       # Element text
agent-browser get value @e1      # Input value
agent-browser get html @e1       # Element HTML
agent-browser get attr href @e1  # Attribute
agent-browser get title          # Page title
agent-browser get url            # Current URL
agent-browser get count "button" # Count matches
agent-browser get box @e1        # Bounding box (x, y, width, height)
agent-browser get styles @e1     # Computed styles (font, color, bg)

Check State

agent-browser is visible @e1    # Check visibility
agent-browser is enabled @e1    # Check enabled
agent-browser is checked @e1    # Check checkbox state

Wait

agent-browser wait @e1                 # Wait for element visible
agent-browser wait 2000                # Wait milliseconds
agent-browser wait --text "Success"    # Wait for text (-t)
agent-browser wait --url "**/dashboard" # Wait for URL pattern (-u)
agent-browser wait --load networkidle  # Wait for network idle (-l)
agent-browser wait --fn "window.ready" # Wait for JS condition (-f)

Screenshots & Capture

agent-browser screenshot              # Viewport to temp dir
agent-browser screenshot out.png      # Save to file
agent-browser screenshot --full       # Full page
agent-browser screenshot --annotate   # Annotated with numbered element labels
agent-browser pdf out.pdf             # Save as PDF

Diff (Compare Page States)

Compare accessibility tree or visual state before/after changes:

# Snapshot diff: compare current vs last snapshot
agent-browser snapshot -i              # Baseline
agent-browser click @e2                # Action
agent-browser diff snapshot            # See what changed

# Snapshot diff: compare vs saved file
agent-browser diff snapshot --baseline before.txt

# Visual pixel diff
agent-browser diff screenshot --baseline before.png

# Compare two URLs
agent-browser diff url https://staging.example.com https://prod.example.com
agent-browser diff url <url1> <url2> --wait-until networkidle
agent-browser diff url <url1> <url2> --selector "#main"
agent-browser diff url <url1> <url2> --screenshot  # Visual diff

diff snapshot uses +/- like git diff. diff screenshot produces a diff image with changed pixels in red + mismatch percentage.

Semantic Locators

Alternative when you know the element (no snapshot needed):

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact  # Exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" hover

Annotated Screenshots (Vision Mode)

Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. Also caches refs — interact immediately without separate snapshot.

agent-browser screenshot --annotate
# Output includes image path + legend:
#   [1] @e1 button "Submit"
#   [2] @e2 link "Home"
#   [3] @e3 textbox "Email"
agent-browser click @e2  # Use ref from annotated screenshot

Use when: unlabeled icon buttons, visual-only elements, canvas/charts (invisible to text snapshots), or spatial reasoning needed.

JavaScript Evaluation

Use eval to run JS in the browser. Shell quoting can corrupt complex expressions — use --stdin or -b to avoid issues.

# Simple expressions: regular quoting OK
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'

# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
  Array.from(document.querySelectorAll("img"))
    .filter(i => !i.alt)
    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF

# Alternative: base64 encoding (bypasses all shell escaping)
agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"

Rules of thumb:

  • Single-line, no nested quotes → eval 'expression' with single quotes
  • Nested quotes, arrow functions, template literals, multiline → eval --stdin <<'EVALEOF'
  • Programmatic/generated scripts → eval -b with base64

Sessions

Parallel isolated browsers (see auth.md for multi-user auth):

agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list

Session Persistence

Auto-save/restore cookies and localStorage across browser restarts:

agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close  # State auto-saved to ~/.agent-browser/sessions/

# Next time: state auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard

# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com

# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7

Connect to Existing Chrome

# Auto-discover running Chrome with remote debugging
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot

# Or explicit CDP port
agent-browser --cdp 9222 snapshot

Local Files

agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png

iOS Simulator (Mobile Safari)

# List available iOS simulators
agent-browser device list

# Launch Safari on specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same workflow: snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1          # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up         # Mobile gesture
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close

Requires: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest). Real devices: Use --device "<UDID>" (UDID from xcrun xctrace list devices).

Configuration File

Create agent-browser.json in project root for persistent settings:

{
  "headed": true,
  "proxy": "http://localhost:8080",
  "profile": "./browser-data"
}

Priority (lowest→highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or AGENT_BROWSER_CONFIG for custom path. All CLI options map to camelCase keys (--executable-path → "executablePath").

Timeouts and Slow Pages

Default Playwright timeout is 60s. For slow pages, use explicit waits:

agent-browser wait --load networkidle      # Best for slow pages
agent-browser wait "#content"              # Wait for specific element
agent-browser wait @e1                     # Wait for ref
agent-browser wait --url "**/dashboard"    # Wait after redirects
agent-browser wait --fn "document.readyState === 'complete'"
agent-browser wait 5000                    # Fixed duration (last resort)

Use wait --load networkidle after open for consistently slow sites.

JSON Output

Add --json for machine-readable output:

agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser is visible @e1 --json

Recording & Profiling

# Video recording
agent-browser record start demo.webm
# ... actions ...
agent-browser record stop
agent-browser record restart take2.webm  # Stop current + start new

# Chrome DevTools profiling
agent-browser profiler start
# ... actions ...
agent-browser profiler stop trace.json

See debugging.md for details.

Examples

Form Submission

agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Verify result

Auth with Saved State

# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later: reuse saved auth
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard

More auth patterns in auth.md.

Token Auth (Skip Login)

agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
agent-browser snapshot -i --json

Debugging

agent-browser --headed open example.com  # Show browser window
agent-browser console                    # View console messages
agent-browser errors                     # View page errors
agent-browser highlight @e1              # Highlight element
agent-browser --debug open example.com   # Verbose output

See debugging.md for traces, profiling, video, common issues.

Session Cleanup

Always close sessions when done to avoid leaked processes:

agent-browser close                    # Close default session
agent-browser --session name close     # Close specific session

If previous session not closed properly, daemon may still be running. agent-browser close cleans it up.

Troubleshooting

“Browser not launched” error: Daemon stuck. Kill and retry:

pkill -f agent-browser && agent-browser open <url>

--headed not showing window: Daemon reuse bug. If daemon started headless, --headed is ignored. Kill daemon first:

agent-browser close
pkill -f "node.*daemon.js.*AGENT_BROWSER"
pkill -f "Google Chrome for Testing"
sleep 1
agent-browser open <url> --headed

Window exists but not visible (macOS):

osascript -e 'tell application "Google Chrome for Testing" to activate'

Element not found: Re-snapshot after page changes. DOM may have updated.

Ref lifecycle: Refs (@e1, @e2) are invalidated when the page changes. Always re-snapshot after clicks that navigate, form submissions, or dynamic content loading.

References

Topic File
Full command reference commands.md
Snapshot refs, lifecycle, troubleshooting snapshot-refs.md
Auth, OAuth, 2FA, state persistence auth.md
Sessions, parallel browsers, state session-management.md
Debugging, profiling, video recording debugging.md
Proxy, geo-testing, rotating proxies proxy.md
Network mocking, tabs, frames, dialogs, settings advanced.md