agent-browser

📁 watzon/claude-code 📅 Feb 6, 2026
2
总安装量
2
周安装量
#74848
全站排名
安装命令
npx skills add https://github.com/watzon/claude-code --skill agent-browser

Agent 安装分布

claude-code 2
mcpjam 1
kilo 1
junie 1
windsurf 1
zencoder 1

Skill 文档

agent-browser Skill

Description

Use this skill for headless browser automation tasks. agent-browser is a CLI designed specifically for AI agents, providing a clean interface to Playwright with ref-based element selection that eliminates the need for complex CSS selectors.

Use when:

  • Automating web interactions (clicking, typing, form submission)
  • Scraping dynamic content that requires JavaScript execution
  • Testing web applications
  • Logging into websites and performing authenticated actions
  • Navigating multi-step web workflows
  • Taking screenshots or extracting page content

Prerequisites

# Install via npm (runs as daemon, CLI is Rust-based)
npm install -g @anthropic-ai/agent-browser

# Or via npx (no install needed)
npx @anthropic-ai/agent-browser --help

Core Workflow

The fundamental pattern for AI browser automation:

# 1. Open a page
agent-browser open "https://example.com"

# 2. Get interactive elements with refs
agent-browser snapshot -i

# 3. Interact using refs from snapshot
agent-browser click @e5
agent-browser fill @e12 "search query"

# 4. Re-snapshot after page changes
agent-browser snapshot -i

Key Insight: Always use refs (@e1, @e2, etc.) from snapshots rather than CSS selectors. Refs are stable identifiers assigned during snapshot that make element targeting trivial.

Commands Reference

Navigation

# Open URL
agent-browser open "https://example.com"

# Open with custom headers (auth, cookies)
agent-browser open "https://api.example.com" --headers "Authorization: Bearer token123"

# Close browser
agent-browser close

Snapshots (Critical for AI)

# Full accessibility tree
agent-browser snapshot

# Interactive elements only (RECOMMENDED for AI)
agent-browser snapshot -i

# Compact output (less verbose)
agent-browser snapshot -c

# Limit depth
agent-browser snapshot -d 5

# Filter by CSS selector
agent-browser snapshot -s "#main-content"

# Combine flags
agent-browser snapshot -i -c -d 3

# JSON output for parsing
agent-browser snapshot -i --json

Interactions

# Click element by ref
agent-browser click @e5

# Click by CSS selector (avoid if possible)
agent-browser click "button.submit"

# Fill input field (clears existing content)
agent-browser fill @e12 "hello@example.com"

# Type text (appends, supports special keys)
agent-browser type @e12 "additional text"

# Press keyboard keys
agent-browser press Enter
agent-browser press Control+a
agent-browser press Tab

# Scroll
agent-browser scroll down
agent-browser scroll up
agent-browser scroll @e5  # Scroll element into view

Data Extraction

# Get visible text content
agent-browser get text
agent-browser get text @e5  # Specific element

# Get HTML
agent-browser get html
agent-browser get html @e5

# Get input value
agent-browser get value @e12

# Screenshot
agent-browser screenshot output.png
agent-browser screenshot --fullpage output.png

Waiting

# Wait for element to appear
agent-browser wait @e5
agent-browser wait "div.loaded"

# Wait for navigation
agent-browser wait navigation

# Wait with timeout (ms)
agent-browser wait @e5 --timeout 10000

Session Management

# Isolated session (separate cookies, storage)
agent-browser --session myproject open "https://example.com"
agent-browser --session myproject snapshot -i
agent-browser --session myproject click @e5

# Default session is used if not specified

Best Practices

1. Always Use -i Flag for Snapshots

# Good - only interactive elements
agent-browser snapshot -i

# Avoid - too much noise
agent-browser snapshot

2. Prefer Refs Over CSS Selectors

# Good - stable ref from snapshot
agent-browser click @e5

# Avoid - brittle selector
agent-browser click "div.container > ul > li:nth-child(3) > button"

3. Use --json for Parsing

# Parse snapshot programmatically
agent-browser snapshot -i --json | jq '.elements[] | select(.role == "button")'

4. Re-snapshot After Page Changes

After any interaction that might change the page (click, submit, navigation), take a fresh snapshot. Refs are invalidated when the DOM changes.

5. Use Sessions for Parallel Work

# Separate sessions for different tasks
agent-browser --session task1 open "https://site1.com"
agent-browser --session task2 open "https://site2.com"

Common Patterns

Login Flow

agent-browser open "https://app.example.com/login"
agent-browser snapshot -i
# Output shows: @e3 textbox "Email", @e5 textbox "Password", @e7 button "Sign In"

agent-browser fill @e3 "user@example.com"
agent-browser fill @e5 "password123"
agent-browser click @e7
agent-browser wait navigation
agent-browser snapshot -i  # Verify logged in

Form Submission

agent-browser open "https://example.com/form"
agent-browser snapshot -i

# Fill multiple fields
agent-browser fill @e2 "John Doe"
agent-browser fill @e4 "john@example.com"
agent-browser fill @e6 "Hello, this is my message."

# Select dropdown (click to open, then click option)
agent-browser click @e8
agent-browser snapshot -i  # Get dropdown options
agent-browser click @e12   # Select option

# Submit
agent-browser click @e15
agent-browser wait navigation

Data Extraction

agent-browser open "https://example.com/products"
agent-browser snapshot -i --json > products.json

# Or get specific content
agent-browser get text ".product-list"
agent-browser get html "#main-content"

Search and Navigate Results

agent-browser open "https://example.com"
agent-browser snapshot -i
# @e5 searchbox "Search"

agent-browser fill @e5 "query"
agent-browser press Enter
agent-browser wait navigation
agent-browser snapshot -i
# Shows search results with refs

agent-browser click @e10  # Click first result

Handle Modals/Popups

# After action triggers modal
agent-browser click @e5
agent-browser snapshot -i  # Modal elements now visible

# Interact with modal
agent-browser fill @e20 "confirmation"
agent-browser click @e22  # Confirm button

Screenshot for Verification

# After completing workflow
agent-browser screenshot verification.png

# Full page capture
agent-browser screenshot --fullpage full-page.png

Snapshot Output Format

When you run agent-browser snapshot -i, output looks like:

@e1 link "Home"
@e2 link "Products"
@e3 link "About"
@e4 searchbox "Search..."
@e5 button "Search"
@e6 textbox "Email address"
@e7 button "Subscribe"

With --json:

{
  "elements": [
    {"ref": "@e1", "role": "link", "name": "Home"},
    {"ref": "@e2", "role": "link", "name": "Products"},
    {"ref": "@e4", "role": "searchbox", "name": "Search..."},
    {"ref": "@e5", "role": "button", "name": "Search"}
  ]
}

Troubleshooting

Element Not Found

# Re-snapshot - DOM may have changed
agent-browser snapshot -i

# Check if element is in viewport
agent-browser scroll @e5
agent-browser snapshot -i

Timeout Errors

# Increase timeout
agent-browser wait @e5 --timeout 30000

# Or wait for specific condition
agent-browser wait navigation

Session Issues

# Close and restart session
agent-browser close
agent-browser open "https://example.com"

Debugging

# Take screenshot to see current state
agent-browser screenshot debug.png

# Get full HTML for inspection
agent-browser get html > page.html

Architecture Notes

  • Rust CLI handles command parsing and communicates with daemon
  • Node.js daemon manages Playwright browser instances
  • Daemon starts automatically on first command
  • Sessions provide isolation (cookies, localStorage, separate browser contexts)
  • Refs (@e1, @e2) are assigned per-snapshot and tied to the accessibility tree