agent-browser
Total installs: 1
Weekly installs: 1
Site rank: #51605
Install command
npx skills add https://github.com/chipagosfinest/claude-integration-tools --skill agent-browser
Agent install distribution
replit: 1
openclaw: 1
opencode: 1
codex: 1
claude-code: 1
Skill documentation
agent-browser
Browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
Installation
npm install -g agent-browser
agent-browser install # Download Chromium
Core Workflow (Optimal for AI)
# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json # Interactive elements only with refs
# 2. Identify target refs from snapshot output
# Snapshot shows: - button "Submit" [ref=e2]
# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"
# 4. Re-snapshot after page changes
agent-browser snapshot -i --json
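Step 2 of the workflow (finding the target ref) can be scripted. The following is a minimal sketch, assuming the plain-text snapshot lines look like the example above (`- button "Submit" [ref=e2]`); `ref_for` is a hypothetical helper, not part of agent-browser, and the `--json` output format is not assumed here.

```shell
# ref_for ROLE NAME
# Reads snapshot text on stdin and prints the @ref of the first line
# matching the given role and accessible name.
# Assumes lines of the form: - button "Submit" [ref=e2]
ref_for() {
  role=$1
  name=$2
  # -F: match the role/name literally; --: the pattern starts with a dash
  grep -F -- "- $role \"$name\"" |
    sed -n 's/.*\[ref=\([^]]*\)].*/@\1/p' |
    head -n 1
}
```

A possible use, if the text snapshot matches that format: `agent-browser click "$(agent-browser snapshot -i | ref_for button "Submit")"`. The helper prints nothing when no line matches, so the caller should check for an empty result before clicking.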
Essential Commands
Navigation
agent-browser open <url> # Navigate to URL
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
Snapshot (Best for AI)
agent-browser snapshot # Full accessibility tree with refs
agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -i -c --json # Combine options, JSON output
Interaction
agent-browser click @e2 # Click by ref from snapshot
agent-browser fill @e3 "text" # Fill input by ref
agent-browser type @e3 "text" # Type into element (doesn't clear first)
agent-browser press Enter # Press key
agent-browser hover @e4 # Hover element
agent-browser scroll down 500 # Scroll direction + pixels
agent-browser select @e5 "option" # Select dropdown option
agent-browser check @e6 # Check checkbox
agent-browser upload @e7 file.pdf # Upload file
Get Information
agent-browser get text @e1 # Get text content
agent-browser get html @e1 # Get innerHTML
agent-browser get value @e3 # Get input value
agent-browser get title # Page title
agent-browser get url # Current URL
agent-browser get count ".items" # Count matching elements
Screenshots & Output
agent-browser screenshot # Screenshot to stdout (base64)
agent-browser screenshot page.png # Save screenshot
agent-browser screenshot --full # Full page screenshot
agent-browser pdf output.pdf # Save as PDF
Wait
agent-browser wait "#element" # Wait for element visible
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text
agent-browser wait --url "**/dash" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for network idle
Sessions
agent-browser --session agent1 open site.com # Isolated session
agent-browser session list # List active sessions
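When several agents work in parallel, a small wrapper keeps the session name in one place. This is a sketch only: `in_session` and the `AGENT_BROWSER_BIN` override are hypothetical names introduced here (the override exists so the helper can be dry-run against a stub); the `--session <name>` placement before the subcommand follows the example above.

```shell
# in_session NAME COMMAND...
# Runs one agent-browser command inside the named isolated session.
# AGENT_BROWSER_BIN is a hypothetical override, read at call time.
in_session() {
  session=$1
  shift
  "${AGENT_BROWSER_BIN:-agent-browser}" --session "$session" "$@"
}
```

For example, `in_session agent1 open site.com` followed by `in_session agent1 snapshot -i` keeps both commands in the same session, while `in_session agent2 ...` stays isolated from it.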
Semantic Locators (Alternative to Refs)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"
Selector Types
| Type | Example | Description |
|---|---|---|
| Refs | @e2 | From snapshot, deterministic |
| CSS | "#id", ".class" | Standard CSS selectors |
| Text | "text=Submit" | Match by text content |
| XPath | "xpath=//button" | XPath selectors |
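The prefixes in the table are enough to tell the selector types apart. A minimal sketch of that rule, useful for logging or validating selectors before passing them on; `selector_type` is a hypothetical helper, and treating everything without a known prefix as CSS is an assumption drawn from the table, not documented behavior.

```shell
# selector_type SELECTOR
# Prints ref, xpath, text, or css based on the selector's prefix.
selector_type() {
  case $1 in
    @*)      echo ref ;;    # e.g. @e2, taken from a snapshot
    xpath=*) echo xpath ;;  # e.g. xpath=//button
    text=*)  echo text ;;   # e.g. text=Submit
    *)       echo css ;;    # anything else, e.g. #id or .class
  esac
}
```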
Options
| Option | Description |
|---|---|
| --json | JSON output (for parsing) |
| --headed | Show browser window |
| --session <name> | Use isolated session |
| --headers <json> | Set HTTP headers |
| --cdp <port> | Connect via Chrome DevTools Protocol |
Authenticated Sessions
# Headers scoped to origin
agent-browser open api.example.com --headers '{"Authorization": "Bearer token"}'
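To keep the token out of the script itself, the header JSON can be built from a variable. A minimal sketch: `bearer_headers` and `API_TOKEN` are hypothetical names, and the function assumes the token contains no double quotes or backslashes, since it does no JSON escaping.

```shell
# bearer_headers TOKEN
# Prints a JSON object suitable for the --headers option.
# No JSON escaping is done: TOKEN must not contain " or \.
bearer_headers() {
  printf '{"Authorization": "Bearer %s"}\n' "$1"
}
```

Usage would then look like `agent-browser open api.example.com --headers "$(bearer_headers "$API_TOKEN")"`, with `API_TOKEN` exported in the environment beforehand.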
CDP Mode (Connect to Existing Browser)
# Start Chrome with: google-chrome --remote-debugging-port=9222
agent-browser connect 9222
agent-browser snapshot
Programmatic API (Node.js)
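import { BrowserManager } from 'agent-browser';
const browser = new BrowserManager();
// Start a headless browser and load the page
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Request interactive elements only; refs such as @e2 come from this snapshot
const snapshot = await browser.getSnapshot({ interactive: true });
// Act on elements by ref
await browser.click('@e2');
await browser.fill('@e3', 'input text');
// Release the browser when done
await browser.close();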
Best Practices
- Always use refs from snapshots – More reliable than CSS selectors
- Use the -i flag – Reduces noise, shows only interactive elements
- Re-snapshot after navigation – Refs are only valid for current page state
- Use --json for parsing – Structured output for programmatic use
- Use sessions for parallel work – Each session has isolated state
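The re-snapshot rule is easy to forget, so it can be folded into the action itself. A sketch under stated assumptions: `act` and the `AGENT_BROWSER_BIN` override are hypothetical names introduced for illustration; the `snapshot -i --json` call is the one used in the core workflow.

```shell
# act COMMAND...
# Runs one agent-browser action, then prints a fresh interactive snapshot
# so that any refs used next reflect the new page state.
# The snapshot is skipped if the action itself fails.
act() {
  ab=${AGENT_BROWSER_BIN:-agent-browser}  # hypothetical override for dry runs
  "$ab" "$@" && "$ab" snapshot -i --json
}
```

For example, `act click @e2` clicks and immediately returns the updated snapshot, from which the next ref should be read instead of reusing an old one.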
Troubleshooting
Linux: Missing dependencies
agent-browser install --with-deps
Custom browser path
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com