agent-browser

📁 watzon/claude-code 📅 Feb 6, 2026

Site-wide rank: #74848

Install command

npx skills add https://github.com/watzon/claude-code --skill agent-browser

Installs by agent

claude-code 2

mcpjam 1

kilo 1

junie 1

windsurf 1

zencoder 1

Skill documentation

agent-browser Skill

Description

Use this skill for headless browser automation tasks. agent-browser is a CLI designed specifically for AI agents. It provides a clean interface to Playwright and targets elements by refs assigned in a snapshot (such as @e5), so most tasks do not need hand-written CSS selectors.

Use when:

Automating web interactions (clicking, typing, form submission)
Scraping dynamic content that requires JavaScript execution
Testing web applications
Logging into websites and performing authenticated actions
Navigating multi-step web workflows
Taking screenshots or extracting page content

Prerequisites

# Install via npm (runs as daemon, CLI is Rust-based)
npm install -g agent-browser

# Or via npx (no install needed)
npx agent-browser --help

Core Workflow

The fundamental pattern for AI browser automation:

# 1. Open a page
agent-browser open "https://example.com"

# 2. Get interactive elements with refs
agent-browser snapshot -i

# 3. Interact using refs from snapshot
agent-browser click @e5
agent-browser fill @e12 "search query"

# 4. Re-snapshot after page changes
agent-browser snapshot -i

Key Insight: Always use refs (@e1, @e2, etc.) from snapshots rather than CSS selectors. Refs are assigned at snapshot time and make element targeting trivial, but they are only valid until the page changes, so take a fresh snapshot before reusing them.
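
One way to apply this in a script is to look refs up by role and label instead of hardcoding them. The sketch below assumes the `@eN role "name"` line format shown in this document's snapshot examples; `find_ref` is a hypothetical helper, not part of agent-browser.

```shell
# find_ref ROLE NAME
# Reads `agent-browser snapshot -i` text on stdin and prints the ref of the
# first line of the form: @eN ROLE "NAME". Returns non-zero if nothing matches.
# (Hypothetical helper; assumes the text format shown in this document.)
find_ref() {
  awk -v role="$1" -v name="\"$2\"" '
    $2 == role {
      label = $0
      sub(/^ *[^ ]+ [^ ]+ /, "", label)   # keep only the quoted name
      if (label == name) { print $1; found = 1; exit }
    }
    END { exit !found }
  '
}
```

Possible usage: `ref=$(agent-browser snapshot -i | find_ref button "Search") && agent-browser click "$ref"`. Because the lookup runs against a fresh snapshot each time, the script never reuses a ref from an earlier page state.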

Commands Reference

Navigation

# Open URL
agent-browser open "https://example.com"

# Open with custom headers (auth, cookies)
agent-browser open "https://api.example.com" --headers "Authorization: Bearer token123"

# Close browser
agent-browser close

Snapshots (Critical for AI)

# Full accessibility tree
agent-browser snapshot

# Interactive elements only (RECOMMENDED for AI)
agent-browser snapshot -i

# Compact output (less verbose)
agent-browser snapshot -c

# Limit depth
agent-browser snapshot -d 5

# Filter by CSS selector
agent-browser snapshot -s "#main-content"

# Combine flags
agent-browser snapshot -i -c -d 3

# JSON output for parsing
agent-browser snapshot -i --json

Interactions

# Click element by ref
agent-browser click @e5

# Click by CSS selector (avoid if possible)
agent-browser click "button.submit"

# Fill input field (clears existing content)
agent-browser fill @e12 "hello@example.com"

# Type text (appends, supports special keys)
agent-browser type @e12 "additional text"

# Press keyboard keys
agent-browser press Enter
agent-browser press Control+a
agent-browser press Tab

# Scroll
agent-browser scroll down
agent-browser scroll up
agent-browser scroll @e5  # Scroll element into view

Data Extraction

# Get visible text content
agent-browser get text
agent-browser get text @e5  # Specific element

# Get HTML
agent-browser get html
agent-browser get html @e5

# Get input value
agent-browser get value @e12

# Screenshot
agent-browser screenshot output.png
agent-browser screenshot --fullpage output.png

Waiting

# Wait for element to appear
agent-browser wait @e5
agent-browser wait "div.loaded"

# Wait for navigation
agent-browser wait navigation

# Wait with timeout (ms)
agent-browser wait @e5 --timeout 10000

Session Management

# Isolated session (separate cookies, storage)
agent-browser --session myproject open "https://example.com"
agent-browser --session myproject snapshot -i
agent-browser --session myproject click @e5

# Without --session, commands run in the default session

Best Practices

1. Always Use `-i` Flag for Snapshots

# Good - only interactive elements
agent-browser snapshot -i

# Avoid - too much noise
agent-browser snapshot

2. Prefer Refs Over CSS Selectors

# Good - ref taken from the latest snapshot
agent-browser click @e5

# Avoid - brittle selector
agent-browser click "div.container > ul > li:nth-child(3) > button"

3. Use `--json` for Parsing

# Parse snapshot programmatically
agent-browser snapshot -i --json | jq '.elements[] | select(.role == "button")'

4. Re-snapshot After Page Changes

After any interaction that might change the page (click, submit, navigation), take a fresh snapshot. Refs are invalidated when the DOM changes.
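
Since refs go stale after a page change, it can help to pair every interaction with a fresh snapshot. A minimal sketch, assuming the command exits non-zero on failure; the `ab` wrapper, the `AB_BIN` variable, and `act` are hypothetical names introduced here so the binary can be swapped for a dry run:

```shell
# ab: thin wrapper around the CLI. AB_BIN is a hypothetical override,
# not an agent-browser setting; it defaults to the real binary.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# act COMMAND [ARGS...]
# Runs one interaction, then prints a fresh interactive snapshot so the
# caller always works with current refs. Stops if the interaction fails.
act() {
  ab "$@" || return 1
  ab snapshot -i
}
```

For example, `act click @e5` performs the click and prints the refs that are valid afterwards.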

5. Use Sessions for Parallel Work

# Separate sessions for different tasks
agent-browser --session task1 open "https://site1.com"
agent-browser --session task2 open "https://site2.com"
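
The two sessions above can also run concurrently from one script. The following is a sketch under the assumption that separate sessions do not interfere with each other, as the session description suggests; `ab`, `AB_BIN`, `snapshot_site`, and `snapshot_sites_parallel` are hypothetical names.

```shell
# ab: thin wrapper; AB_BIN is a hypothetical override for the binary name.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# snapshot_site SESSION URL OUTFILE
# Opens URL in its own session and writes the interactive snapshot to OUTFILE.
snapshot_site() {
  ab --session "$1" open "$2" >/dev/null &&
    ab --session "$1" snapshot -i > "$3"
}

# snapshot_sites_parallel OUTDIR NAME=URL...
# Starts one background job per NAME=URL pair and waits for all of them.
# Returns non-zero if any job failed.
snapshot_sites_parallel() {
  local outdir="$1" pair name pid pids="" failed=0
  shift
  for pair in "$@"; do
    name="${pair%%=*}"
    snapshot_site "$name" "${pair#*=}" "$outdir/$name.txt" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

Possible usage: `snapshot_sites_parallel out task1=https://site1.com task2=https://site2.com`, with `out` being an existing directory.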

Common Patterns

Login Flow

agent-browser open "https://app.example.com/login"
agent-browser snapshot -i
# Output shows: @e3 textbox "Email", @e5 textbox "Password", @e7 button "Sign In"

agent-browser fill @e3 "user@example.com"
agent-browser fill @e5 "password123"
agent-browser click @e7
agent-browser wait navigation
agent-browser snapshot -i  # Verify logged in
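
In real scripts the credentials would usually not be written inline. The sketch below reads them from environment variables instead; `APP_EMAIL`, `APP_PASSWORD`, `AB_BIN`, `ab`, and `login_with_env` are all hypothetical names, and the refs are passed in by the caller after inspecting the snapshot.

```shell
# ab: thin wrapper; AB_BIN is a hypothetical override for the binary name.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# login_with_env EMAIL_REF PASSWORD_REF SUBMIT_REF
# Fills the login form from APP_EMAIL / APP_PASSWORD (hypothetical variable
# names) and waits for the navigation that follows the submit click.
login_with_env() {
  # Abort with a message if either variable is unset or empty.
  : "${APP_EMAIL:?APP_EMAIL must be set}" "${APP_PASSWORD:?APP_PASSWORD must be set}"
  ab fill "$1" "$APP_EMAIL" &&
    ab fill "$2" "$APP_PASSWORD" &&
    ab click "$3" &&
    ab wait navigation
}
```

With the snapshot from the example above this would be called as `login_with_env @e3 @e5 @e7`. Note that the password is still passed as a command-line argument, so it may be visible in the local process list while the command runs.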

Form Submission

agent-browser open "https://example.com/form"
agent-browser snapshot -i

# Fill multiple fields
agent-browser fill @e2 "John Doe"
agent-browser fill @e4 "john@example.com"
agent-browser fill @e6 "Hello, this is my message."

# Select dropdown (click to open, then click option)
agent-browser click @e8
agent-browser snapshot -i  # Get dropdown options
agent-browser click @e12   # Select option

# Submit
agent-browser click @e15
agent-browser wait navigation
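
When a form has many fields, the repeated `fill` calls can be collapsed into a loop. A small sketch, assuming `fill` exits non-zero on failure; `ab`, `AB_BIN`, and `fill_fields` are hypothetical names:

```shell
# ab: thin wrapper; AB_BIN is a hypothetical override for the binary name.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# fill_fields REF=VALUE...
# Fills each field in order and stops at the first failure.
# Only the first "=" separates ref from value, so values may contain "=".
fill_fields() {
  local pair
  for pair in "$@"; do
    ab fill "${pair%%=*}" "${pair#*=}" || return 1
  done
}
```

Possible usage: `fill_fields @e2="John Doe" @e4="john@example.com" @e6="Hello, this is my message."`.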

Data Extraction

agent-browser open "https://example.com/products"
agent-browser snapshot -i --json > products.json

# Or get specific content
agent-browser get text ".product-list"
agent-browser get html "#main-content"

Search and Navigate Results

agent-browser open "https://example.com"
agent-browser snapshot -i
# @e5 searchbox "Search"

agent-browser fill @e5 "query"
agent-browser press Enter
agent-browser wait navigation
agent-browser snapshot -i
# Shows search results with refs

agent-browser click @e10  # Click first result

Handle Modals/Popups

# After action triggers modal
agent-browser click @e5
agent-browser snapshot -i  # Modal elements now visible

# Interact with modal
agent-browser fill @e20 "confirmation"
agent-browser click @e22  # Confirm button
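
To see which elements a modal added, two saved snapshots can be compared. Because refs are assigned per snapshot, the same element may carry a different ref in each file, so this sketch compares only the role and name. `new_elements` is a hypothetical helper and assumes the `@eN role "name"` text format.

```shell
# new_elements BEFORE_FILE AFTER_FILE
# Prints lines from AFTER_FILE whose role+name did not appear in BEFORE_FILE.
# The leading ref is ignored for the comparison but kept in the output.
new_elements() {
  awk '
    { key = $0; sub(/^ *[^ ]+ /, "", key) }        # drop the leading ref
    FILENAME == ARGV[1] { seen[key] = 1; next }   # first file: remember labels
    !(key in seen)                                # second file: print unseen lines
  ' "$1" "$2"
}
```

A possible sequence is `agent-browser snapshot -i > before.txt`, then the click that opens the modal, then `agent-browser snapshot -i > after.txt`, and finally `new_elements before.txt after.txt`.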

Screenshot for Verification

# After completing workflow
agent-browser screenshot verification.png

# Full page capture
agent-browser screenshot --fullpage full-page.png

Snapshot Output Format

Running `agent-browser snapshot -i` produces output like:

@e1 link "Home"
@e2 link "Products"
@e3 link "About"
@e4 searchbox "Search..."
@e5 button "Search"
@e6 textbox "Email address"
@e7 button "Subscribe"

With --json:

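{
  "elements": [
    {"ref": "@e1", "role": "link", "name": "Home"},
    {"ref": "@e2", "role": "link", "name": "Products"},
    {"ref": "@e3", "role": "link", "name": "About"},
    {"ref": "@e4", "role": "searchbox", "name": "Search..."},
    {"ref": "@e5", "role": "button", "name": "Search"},
    {"ref": "@e6", "role": "textbox", "name": "Email address"},
    {"ref": "@e7", "role": "button", "name": "Subscribe"}
  ]
}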
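
Where `jq` is not available, the plain-text format is regular enough to reshape with standard tools. The sketch below turns each `@eN role "name"` line into tab-separated columns; `snapshot_to_tsv` is a hypothetical helper and relies on the text format shown above.

```shell
# snapshot_to_tsv
# Converts `@eN role "name"` lines on stdin to: ref<TAB>role<TAB>name.
# Lines that do not start with a ref are skipped.
snapshot_to_tsv() {
  awk '
    /^ *@e[0-9]+ / {
      name = $0
      sub(/^ *[^ ]+ [^ ]+ ?/, "", name)   # keep what follows the role
      gsub(/^"|"$/, "", name)             # strip the surrounding quotes
      printf "%s\t%s\t%s\n", $1, $2, name
    }
  '
}
```

For instance, `agent-browser snapshot -i | snapshot_to_tsv | cut -f2 | sort | uniq -c` would count the elements per role.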

Troubleshooting

Element Not Found

# Re-snapshot - DOM may have changed
agent-browser snapshot -i

# Element may be outside the viewport: scroll it into view, then re-snapshot
agent-browser scroll @e5
agent-browser snapshot -i

Timeout Errors

# Increase timeout
agent-browser wait @e5 --timeout 30000

# Or wait for specific condition
agent-browser wait navigation
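
Instead of choosing one large timeout up front, a script can retry with progressively longer ones. This sketch assumes that `wait ... --timeout` exits non-zero when the timeout elapses, which this document does not confirm; `ab`, `AB_BIN`, and `wait_with_backoff` are hypothetical names.

```shell
# ab: thin wrapper; AB_BIN is a hypothetical override for the binary name.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# wait_with_backoff TARGET [START_MS] [MAX_MS]
# Waits for TARGET, doubling the timeout after each failed attempt
# until it would exceed MAX_MS. Defaults: 5000 ms start, 40000 ms cap.
wait_with_backoff() {
  local target="$1" timeout="${2:-5000}" max="${3:-40000}"
  while [ "$timeout" -le "$max" ]; do
    if ab wait "$target" --timeout "$timeout"; then
      return 0
    fi
    timeout=$((timeout * 2))
  done
  return 1
}
```

Possible usage: `wait_with_backoff @e5 || agent-browser screenshot timeout.png`, which captures the page state when every attempt has failed.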

Session Issues

# Close and restart session
agent-browser close
agent-browser open "https://example.com"
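
Stale sessions are often left behind by scripts that exit before reaching `close`. One way to guard against that is to run the workflow in a subshell with an exit trap. This is a sketch, not a documented agent-browser feature; `ab`, `AB_BIN`, and `with_browser` are hypothetical names.

```shell
# ab: thin wrapper; AB_BIN is a hypothetical override for the binary name.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# with_browser COMMAND [ARGS...]
# Runs COMMAND in a subshell and closes the browser when the subshell exits,
# whether COMMAND succeeded or not. Returns COMMAND's exit status.
with_browser() {
  (
    trap 'ab close >/dev/null 2>&1' EXIT
    "$@"
    exit "$?"
  )
}
```

A workflow function such as a hypothetical `run_checkout_flow` would then be started as `with_browser run_checkout_flow`.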

Debugging

# Take screenshot to see current state
agent-browser screenshot debug.png

# Get full HTML for inspection
agent-browser get html > page.html
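
The debugging commands above can be bundled so that one call captures everything needed for later inspection. A minimal sketch; `ab`, `AB_BIN`, `dump_state`, and the output file names are hypothetical choices.

```shell
# ab: thin wrapper; AB_BIN is a hypothetical override for the binary name.
ab() { "${AB_BIN:-agent-browser}" "$@"; }

# dump_state [DIR]
# Saves a screenshot, the page HTML, and an interactive snapshot into DIR
# (default: a timestamped debug-YYYYmmdd-HHMMSS directory) and prints DIR.
dump_state() {
  local dir="${1:-debug-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$dir" || return 1
  ab screenshot "$dir/page.png" >/dev/null
  ab get html > "$dir/page.html"
  ab snapshot -i > "$dir/snapshot.txt"
  echo "$dir"
}
```

Calling `dump_state` right after a failed step keeps the screenshot, HTML, and refs from the same page state together.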

Architecture Notes

Rust CLI handles command parsing and communicates with daemon
Node.js daemon manages Playwright browser instances
Daemon starts automatically on first command
Sessions provide isolation (cookies, localStorage, separate browser contexts)
Refs (@e1, @e2) are assigned per-snapshot and tied to the accessibility tree
