agent-browser
npx skills add https://github.com/abpai/skills --skill agent-browser
Browser Automation with agent-browser
Core Workflow
Every browser automation follows this pattern:
- Connect (if $AGENT_BROWSER_CDP_PORT is set): agent-browser connect $AGENT_BROWSER_CDP_PORT (see the CDP section below)
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
- Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form
agent-browser get url
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
Open Confirmation (Required)
After every open, confirm the browser actually reached the target page before continuing:
agent-browser open <url>
agent-browser get url
agent-browser get title
If get url still returns about:blank, the page closes immediately, or the load does not stabilize, stop and report this to the user before taking further actions.
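The confirmation step above could be wrapped in a small shell helper. This is a minimal sketch rather than part of agent-browser: the function name open_and_confirm is hypothetical, and it assumes that agent-browser get url prints the current URL on standard output. It covers only the about:blank and closed-page cases, not a load that fails to stabilize.

```shell
# Hypothetical helper: open a URL, then confirm the browser left about:blank.
# Assumes `agent-browser get url` prints the current URL to stdout.
open_and_confirm() {
  oc_target=$1
  agent-browser open "$oc_target" || return 1
  oc_current=$(agent-browser get url) || return 1
  case "$oc_current" in
    "" | about:blank)
      echo "open failed: browser is at '$oc_current', expected $oc_target" >&2
      return 1
      ;;
  esac
  # A failing title lookup suggests the page closed right after opening.
  agent-browser get title >/dev/null || return 1
  # Print the confirmed URL so the caller can log or compare it.
  printf '%s\n' "$oc_current"
}
```

A caller might run open_and_confirm https://example.com/form || exit 1 and take a snapshot only after it succeeds.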
CDP Connection (Authenticated Browser)
If AGENT_BROWSER_CDP_PORT is set, connect to an existing Chrome instance instead of
launching a fresh browser. This preserves logged-in sessions, cookies, and extensions.
Connect
At the start of any browser session, connect to the CDP browser:
agent-browser connect $AGENT_BROWSER_CDP_PORT
If the connection fails (Chrome not running), launch it first:
# Launch via AGENT_BROWSER_CDP_LAUNCH only if nothing is listening on the CDP port
if ! lsof -i :"$AGENT_BROWSER_CDP_PORT" -sTCP:LISTEN >/dev/null 2>&1; then
  if [ -n "$AGENT_BROWSER_CDP_LAUNCH" ]; then
    eval "$AGENT_BROWSER_CDP_LAUNCH" &
    sleep 3  # Give Chrome time to open the debugging port
  fi
fi
# Then connect
agent-browser connect "$AGENT_BROWSER_CDP_PORT"
Command-Runner Note (Isolated)
In some command-runner environments, a Chrome process launched directly from its binary and backgrounded with & can be killed when that command exits. If this happens, use a detached macOS launch command and verify the listener before connecting:
if ! lsof -i :"$AGENT_BROWSER_CDP_PORT" -sTCP:LISTEN >/dev/null 2>&1; then
  open -na "Google Chrome" --args \
    --remote-debugging-port="$AGENT_BROWSER_CDP_PORT" \
    --user-data-dir="$HOME/Projects/ai-chrome-profile"
  sleep 3
fi
lsof -i :"$AGENT_BROWSER_CDP_PORT" -sTCP:LISTEN >/dev/null 2>&1
agent-browser connect "$AGENT_BROWSER_CDP_PORT"
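The fixed sleep 3 above may be too short on a slow machine and longer than needed on a fast one. One alternative is to poll the listener with the same lsof check. The sketch below is an illustration under that assumption; wait_for_cdp_port and its arguments are hypothetical names, not part of agent-browser.

```shell
# Hypothetical helper: poll until something listens on the given TCP port.
# Usage: wait_for_cdp_port PORT [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_cdp_port() {
  wp_port=$1
  wp_attempts=${2:-10}
  wp_interval=${3:-1}
  while [ "$wp_attempts" -gt 0 ]; do
    # Same listener check the snippets above use.
    if lsof -i :"$wp_port" -sTCP:LISTEN >/dev/null 2>&1; then
      return 0
    fi
    wp_attempts=$((wp_attempts - 1))
    sleep "$wp_interval"
  done
  # Nothing started listening within the allowed attempts.
  return 1
}
```

It could replace the sleep, for example: wait_for_cdp_port "$AGENT_BROWSER_CDP_PORT" && agent-browser connect "$AGENT_BROWSER_CDP_PORT".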
After connect succeeds, verify navigation worked (do not assume it did):
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser get url
If AGENT_BROWSER_CDP_PORT is set but CDP still fails after retrying launch/connect, do not silently switch methods. Explain the failure to the user and ask for a fallback choice:
- headed fresh session: agent-browser --headed open <url>
- headless fresh session: agent-browser open <url>
Example message:
I couldn't connect through CDP on port $AGENT_BROWSER_CDP_PORT after retrying. Would you like me to continue with a headed fresh browser session or a headless fresh session?
Once connected, all commands work normally; no extra flags are needed:
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser screenshot
For workflows split across separate command invocations, --cdp per command can be more robust:
agent-browser --cdp $AGENT_BROWSER_CDP_PORT open https://example.com
agent-browser --cdp $AGENT_BROWSER_CDP_PORT snapshot -i
agent-browser --cdp $AGENT_BROWSER_CDP_PORT click @e1
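Repeating --cdp on every line is easy to forget. A thin wrapper function is one way to keep it consistent. This is a sketch only: the name ab is hypothetical, and it relies on nothing beyond the --cdp flag shown above.

```shell
# Hypothetical wrapper: add --cdp to every call when a CDP port is configured,
# and fall back to a plain agent-browser call otherwise.
ab() {
  if [ -n "${AGENT_BROWSER_CDP_PORT:-}" ]; then
    agent-browser --cdp "$AGENT_BROWSER_CDP_PORT" "$@"
  else
    agent-browser "$@"
  fi
}
```

With the wrapper defined, ab open https://example.com and ab snapshot -i would behave like the three commands above whenever AGENT_BROWSER_CDP_PORT is set.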
Clean Session (No Profile)
When the user explicitly asks for a “clean”, “fresh”, or “incognito” session, skip CDP and use the default ephemeral browser regardless of env vars:
agent-browser open <url> # Without connect = fresh browser
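The choice between CDP and a fresh browser can be expressed as a small decision function. The sketch below is illustrative: choose_session_mode and its yes/no argument are hypothetical, and the function only prints the mode; it does not launch anything.

```shell
# Hypothetical helper: print "cdp" or "fresh" for the current request.
# $1 is "yes" when the user asked for a clean, fresh, or incognito session.
choose_session_mode() {
  if [ "$1" = "yes" ]; then
    # An explicit clean-session request overrides the environment.
    echo fresh
  elif [ -n "${AGENT_BROWSER_CDP_PORT:-}" ]; then
    echo cdp
  else
    echo fresh
  fi
}
```

A script might branch on the result, running agent-browser connect "$AGENT_BROWSER_CDP_PORT" only when the mode is cdp.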
Essential Commands
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser scroll down 500 # Scroll page
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser pdf output.pdf # Save as PDF
Common Patterns
Form Submission
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
Authentication with State Persistence
# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
Data Extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5 # Get specific element text
agent-browser get text body > page.txt # Get all page text
# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Parallel Sessions
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
Visual Browser (Debugging)
agent-browser --headed open https://example.com
agent-browser highlight @e1 # Highlight element
agent-browser record start demo.webm # Record session
iOS Simulator (Mobile Safari)
# List available iOS simulators
agent-browser device list
# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1 # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up # Mobile-specific gesture
# Take screenshot
agent-browser -p ios screenshot mobile.png
# Close session (shuts down simulator)
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
Ref Lifecycle (Important)
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # Use new refs
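To make the re-snapshot hard to skip, the click, wait, and snapshot steps can be bundled. This is a minimal sketch using only commands shown in this document; the function name click_and_resnapshot is hypothetical, and waiting for networkidle is an assumption that may not suit every page.

```shell
# Hypothetical helper: click a ref, wait for the page to settle, then print
# a fresh snapshot so the caller does not reuse stale refs.
click_and_resnapshot() {
  agent-browser click "$1" || return 1
  agent-browser wait --load networkidle || return 1
  agent-browser snapshot -i
}
```

For example, click_and_resnapshot @e5 prints the new element refs, which should then be used for the next interaction.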
Semantic Locators (Alternative to Refs)
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
Deep-Dive Documentation
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
Ready-to-Use Templates
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output