agent-browser

📁 chipagosfinest/claude-integration-tools 📅 7 days ago
Total installs: 1
Weekly installs: 1
Site rank: #51605
Install command
npx skills add https://github.com/chipagosfinest/claude-integration-tools --skill agent-browser

Installs by agent

replit 1
openclaw 1
opencode 1
codex 1
claude-code 1

Skill documentation

agent-browser

Browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Installation

npm install -g agent-browser
agent-browser install  # Download Chromium

Core Workflow (Optimal for AI)

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json   # Interactive elements only with refs

# 2. Identify target refs from snapshot output
# Snapshot shows: - button "Submit" [ref=e2]

# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"

# 4. Re-snapshot after page changes
agent-browser snapshot -i --json
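
Step 2 of the workflow above (picking refs out of the snapshot) can be automated. The following is a minimal sketch, assuming text snapshot lines follow the `- button "Submit" [ref=e2]` shape shown in the example; `parseSnapshotRefs` is a hypothetical helper and not part of agent-browser, and real output may carry extra attributes that this pattern simply skips over.

```javascript
// Hypothetical helper: extract { role, name, ref } from text snapshot lines.
// Assumes each element line looks like: - button "Submit" [ref=e2]
function parseSnapshotRefs(snapshotText) {
  const linePattern = /^\s*-\s+(\w+)(?:\s+"([^"]*)")?.*\[ref=(\w+)\]/;
  const elements = [];
  for (const line of snapshotText.split('\n')) {
    const match = linePattern.exec(line);
    if (match) {
      // Store the ref with its leading "@" so it can be passed straight to click/fill.
      elements.push({ role: match[1], name: match[2] ?? '', ref: '@' + match[3] });
    }
  }
  return elements;
}

// Sample input written by hand for illustration; lines without a ref are ignored.
const sample = [
  '- heading "Example Domain"',
  '- button "Submit" [ref=e2]',
  '- textbox "Email" [ref=e3]',
].join('\n');

console.log(parseSnapshotRefs(sample));
```

Each returned `ref` can then be used as the target of `agent-browser click` or `agent-browser fill`.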

Essential Commands

Navigation

agent-browser open <url>              # Navigate to URL
agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload page

Snapshot (Best for AI)

agent-browser snapshot                # Full accessibility tree with refs
agent-browser snapshot -i             # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c             # Compact (remove empty structural elements)
agent-browser snapshot -d 3           # Limit depth to 3 levels
agent-browser snapshot -i -c --json   # Combine options, JSON output
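
When the snapshot command is assembled from code, the flags above map naturally onto an options object. This is a sketch only: `buildSnapshotArgs` and its option names are hypothetical, and it emits just the flags listed in this section. Rejecting negative depths is an assumption of the sketch, not documented CLI behavior.

```javascript
// Hypothetical helper: map an options object to `agent-browser snapshot` arguments.
// Only flags listed in this document are emitted: -i, -c, -d <n>, --json.
function buildSnapshotArgs({ interactive = false, compact = false, depth, json = false } = {}) {
  const args = ['snapshot'];
  if (interactive) args.push('-i');
  if (compact) args.push('-c');
  if (depth !== undefined) {
    // Assumption: depth is a non-negative integer.
    if (!Number.isInteger(depth) || depth < 0) {
      throw new RangeError('depth must be a non-negative integer');
    }
    args.push('-d', String(depth));
  }
  if (json) args.push('--json');
  return args;
}

console.log(buildSnapshotArgs({ interactive: true, compact: true, json: true }).join(' '));
// prints: snapshot -i -c --json
```

The returned array can be handed to Node's `child_process.execFile('agent-browser', args, ...)`, which avoids shell quoting issues.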

Interaction

agent-browser click @e2               # Click by ref from snapshot
agent-browser fill @e3 "text"         # Fill input by ref
agent-browser type @e3 "text"         # Type into element (doesn't clear first)
agent-browser press Enter             # Press key
agent-browser hover @e4               # Hover element
agent-browser scroll down 500         # Scroll direction + pixels
agent-browser select @e5 "option"     # Select dropdown option
agent-browser check @e6               # Check checkbox
agent-browser upload @e7 file.pdf     # Upload file

Get Information

agent-browser get text @e1            # Get text content
agent-browser get html @e1            # Get innerHTML
agent-browser get value @e3           # Get input value
agent-browser get title               # Page title
agent-browser get url                 # Current URL
agent-browser get count ".items"      # Count matching elements

Screenshots & Output

agent-browser screenshot              # Screenshot to stdout (base64)
agent-browser screenshot page.png     # Save screenshot
agent-browser screenshot --full       # Full page screenshot
agent-browser pdf output.pdf          # Save as PDF

Wait

agent-browser wait "#element"         # Wait for element visible
agent-browser wait 2000               # Wait milliseconds
agent-browser wait --text "Welcome"   # Wait for text
agent-browser wait --url "**/dash"    # Wait for URL pattern
agent-browser wait --load networkidle # Wait for network idle

Sessions

agent-browser --session agent1 open site.com  # Isolated session
agent-browser session list                     # List active sessions
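
To drive several isolated sessions from one script, each command needs the `--session <name>` prefix shown above. A minimal sketch follows; `withSession` is a hypothetical helper, and the restriction on session names is a conservative assumption of the sketch rather than a documented rule.

```javascript
// Hypothetical helper: prefix a command with --session so it runs in an isolated session.
function withSession(sessionName, commandArgs) {
  // Assumption: keep names to letters, digits, "_" and "-" to stay on the safe side.
  if (!/^[\w-]+$/.test(sessionName)) {
    throw new Error(`unsupported session name: ${sessionName}`);
  }
  return ['--session', sessionName, ...commandArgs];
}

// One argument list per agent; each could be passed to a separate
// agent-browser process so the two browsers do not share state.
const jobs = {
  agent1: withSession('agent1', ['open', 'site.com']),
  agent2: withSession('agent2', ['open', 'example.com']),
};

for (const [name, args] of Object.entries(jobs)) {
  console.log(name, '->', ['agent-browser', ...args].join(' '));
}
```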

Semantic Locators (Alternative to Refs)

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"

Selector Types

Type    Example             Description
Refs    @e2                 From snapshot, deterministic
CSS     "#id", ".class"     Standard CSS selectors
Text    "text=Submit"       Match by text content
XPath   "xpath=//button"    XPath selectors
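
The prefixes in the table above are enough to tell the selector types apart in code. The sketch below is illustrative only: `selectorType` is a hypothetical helper, and the `@e<number>` ref pattern is inferred from the examples in this document rather than from a formal grammar.

```javascript
// Hypothetical helper: classify a selector string using the prefixes in the table above.
function selectorType(selector) {
  if (/^@e\d+$/.test(selector)) return 'ref';       // assumption: refs look like @e2
  if (selector.startsWith('text=')) return 'text';
  if (selector.startsWith('xpath=')) return 'xpath';
  return 'css'; // anything else is treated as a CSS selector
}

for (const s of ['@e2', '#id', '.class', 'text=Submit', 'xpath=//button']) {
  console.log(s, '->', selectorType(s));
}
```

A check like this can flag cases where a script falls back from refs to CSS selectors, which the best practices below advise against.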

Options

Option              Description
--json              JSON output (for parsing)
--headed            Show browser window
--session <name>    Use isolated session
--headers <json>    Set HTTP headers
--cdp <port>        Connect via Chrome DevTools Protocol

Authenticated Sessions

# Headers are scoped to the origin of the opened URL
agent-browser open api.example.com --headers '{"Authorization": "Bearer token"}'
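
Hand-writing the JSON for `--headers` is error-prone once tokens contain quotes or come from the environment. The sketch below builds the argument with `JSON.stringify`; `buildOpenWithHeaders` and the `API_TOKEN` variable name are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: build `open <url> --headers <json>` arguments from an object.
function buildOpenWithHeaders(url, headers) {
  for (const [key, value] of Object.entries(headers)) {
    if (typeof value !== 'string') {
      throw new TypeError(`header ${key} must be a string`);
    }
  }
  // JSON.stringify handles quoting, so header values need no manual escaping.
  return ['open', url, '--headers', JSON.stringify(headers)];
}

// Read the token from the environment rather than hard-coding it
// (API_TOKEN is a placeholder name; 'token' is the fallback from the example above).
const token = process.env.API_TOKEN ?? 'token';
console.log(buildOpenWithHeaders('api.example.com', { Authorization: `Bearer ${token}` }));
```

Passing the resulting array to `child_process.execFile` keeps the JSON as a single argument without any shell quoting.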

CDP Mode (Connect to Existing Browser)

# Start Chrome with: google-chrome --remote-debugging-port=9222
agent-browser connect 9222
agent-browser snapshot

Programmatic API (Node.js)

import { BrowserManager } from 'agent-browser';

const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');

const snapshot = await browser.getSnapshot({ interactive: true });
await browser.click('@e2');
await browser.fill('@e3', 'input text');

await browser.close();
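
In the example above, an exception in any step would skip `browser.close()` and leave the browser running. One way to guard against that is a small wrapper with `try`/`finally`. This is a sketch: `withBrowser` is a hypothetical helper, and it relies only on the `launch`, `navigate`, and `close` methods already used above.

```javascript
// Hypothetical wrapper: run a task against a BrowserManager-like object and always close it.
// `manager` is expected to expose launch, navigate, and close as used in the example above.
async function withBrowser(manager, url, task) {
  await manager.launch({ headless: true });
  try {
    await manager.navigate(url);
    return await task(manager);
  } finally {
    await manager.close(); // runs even if navigate or the task throws
  }
}

// Usage with the real class (assumes agent-browser is installed):
//   import { BrowserManager } from 'agent-browser';
//   await withBrowser(new BrowserManager(), 'https://example.com', async (browser) => {
//     await browser.getSnapshot({ interactive: true });
//     await browser.click('@e2');
//   });
```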

Best Practices

  1. Always use refs from snapshots – More reliable than CSS selectors
  2. Use -i flag – Reduces noise, shows only interactive elements
  3. Re-snapshot after navigation or any page change – Refs are only valid for the current page state
  4. Use --json for parsing – Structured output for programmatic use
  5. Sessions for parallel work – Each session has isolated state
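
Practices 1 to 3 can be combined by resolving the ref from a fresh snapshot immediately before each action. The following is a sketch under stated assumptions: `clickByName` and the injected `run` function are hypothetical, and the line pattern assumes the `- button "Submit" [ref=e2]` snapshot format shown earlier.

```javascript
// Hypothetical helper: resolve a ref from a fresh snapshot right before acting,
// so a stale ref from an earlier page state is never reused.
// `run` is any function that takes an argument array and returns the command's stdout.
function clickByName(run, role, name) {
  const snapshot = run(['snapshot', '-i']);
  const pattern = /^\s*-\s+(\w+)\s+"([^"]*)".*\[ref=(\w+)\]/;
  for (const line of snapshot.split('\n')) {
    const m = pattern.exec(line);
    if (m && m[1] === role && m[2] === name) {
      run(['click', '@' + m[3]]);
      return '@' + m[3];
    }
  }
  throw new Error(`no ${role} named "${name}" in the current snapshot`);
}

// Stand-in runner for demonstration; a real one could wrap
// child_process.execFileSync('agent-browser', args, { encoding: 'utf8' }).
const issued = [];
const fakeRun = (args) => {
  issued.push(args.join(' '));
  return args[0] === 'snapshot' ? '- button "Submit" [ref=e2]\n' : '';
};

console.log(clickByName(fakeRun, 'button', 'Submit'));
console.log(issued);
```

Injecting `run` keeps the lookup logic testable without a browser; swapping in a real runner does not change the helper.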

Troubleshooting

Linux: Missing dependencies

agent-browser install --with-deps

Custom browser path

AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
