agent-browser

📁 clawdbrunner/skill-agent-browser 📅 4 days ago

Install command

npx skills add https://github.com/clawdbrunner/skill-agent-browser --skill agent-browser

agent-browser Skill

Browser automation that actually works for AI agents. Built by Vercel Labs specifically for LLM-driven workflows.

Why This Works Better Than Alternatives

1. Deterministic Refs (The Game-Changer)

Problem with traditional tools:

CSS selectors break when websites change
XPath is brittle and unreadable
Coordinate-based clicking fails on responsive layouts
Vision-based approaches are slow and expensive

The agent-browser solution:

# 1. Get snapshot with stable refs
agent-browser snapshot -i --json
# Output: - button "Submit" [ref=e2]

# 2. Reuse that ref until the page changes or you take a new snapshot; it points to the EXACT element
agent-browser click @e2

Refs are deterministic: @e2 always points to the same element from your snapshot
No DOM re-query: direct reference is faster and more reliable
AI-optimized: LLMs parse the accessibility tree naturally, not CSS soup

2. Accessibility Trees > Screenshots/HTML

Traditional tools give you raw HTML (noisy) or screenshots (require vision models).

agent-browser gives you the accessibility tree, a clean, semantic representation of what a human (or screen reader) would perceive:

- heading "Billing" [level=1]
- link "Make a payment" [ref=e10]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]

Semantic roles (button, link, textbox, heading)
Human-readable labels
Hierarchical structure
Perfect for LLM comprehension
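
As a rough illustration of how an agent-side script might turn a tree like the one above into a ref, the sketch below looks up an element by role and label. `ref_for` is a hypothetical helper, not part of agent-browser, and it assumes text-form lines shaped like the sample; real snapshots may carry extra attributes.

```shell
# ref_for ROLE NAME: read a text-form snapshot on stdin and print the first
# matching ref id (without the leading "@"). Hypothetical helper; assumes
# lines shaped like:  - button "Submit" [ref=e2]
ref_for() (
  role=$1
  name=$2
  # grep -F keeps role and name literal; sed extracts the id from [ref=...]
  grep -F -- "- $role \"$name\"" |
    sed -n 's/.*\[ref=\([A-Za-z0-9]*\)\].*/\1/p' |
    head -n 1
)
```

Under those assumptions, piping the text output of `agent-browser snapshot -i` into `ref_for button Submit` would yield the id to use after `@`. Elements without a ref, such as the heading in the sample, produce no output.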

3. Built for AI Agents

Feature	Traditional Tools	agent-browser
Element targeting	Fragile selectors	Deterministic refs
Page understanding	Raw HTML	Accessibility tree
Output format	Text logs	Structured JSON
Speed	Slow (full browser per command)	Fast (daemon persists)
AI integration	Afterthought	Purpose-built

4. Fast Architecture

Rust CLI: Native binary, instant command parsing
Node.js Daemon: Browser stays warm between commands
First command: ~2s (daemon startup)
Subsequent commands: ~100ms

Prerequisites

npm install -g agent-browser
agent-browser install  # Download Chromium (~30s)

Core AI Workflow

The workflow designed for LLM agents:

# Step 1: Navigate
agent-browser open https://example.com

# Step 2: Get structured snapshot (the AI "sees" the page)
agent-browser snapshot -i --json

# Step 3: AI picks refs from JSON, execute actions
agent-browser click @e2
agent-browser fill @e3 "test@example.com"

# Step 4: Re-snapshot after changes (state verification)
agent-browser snapshot -i --json

# Step 5: Done
agent-browser close
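
In a script, the five steps above are easier to trust if a failed step stops the run and releases the browser. The following is a minimal sketch of that idea: `step` is a hypothetical wrapper (not an agent-browser command), the `AB` variable is an illustrative override point, and the sketch assumes the CLI exits non-zero when a command fails.

```shell
# Command to run; override AB to point at a wrapper or a stub for dry runs.
AB="${AB:-agent-browser}"

# step ARGS...: run one agent-browser command. On failure, report it on
# stderr, attempt `close` so the session is released, and return the
# failing exit status. (Assumes failures are signalled by exit status.)
step() (
  "$AB" "$@" && exit 0
  rc=$?
  echo "step failed (exit $rc): $*" >&2
  "$AB" close >/dev/null 2>&1 || true
  exit "$rc"
)
```

A run could then be chained as `step open https://example.com && step snapshot -i --json && step click @e2`, with the agent choosing refs from the snapshot output between calls.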

Commands

Navigation

agent-browser open example.com
agent-browser open example.com --json            # JSON response
agent-browser open example.com --headed          # Visible browser

Snapshot (The Killer Feature)

agent-browser snapshot                           # Full accessibility tree
agent-browser snapshot -i                        # Interactive only (faster)
agent-browser snapshot -i --json                 # JSON for AI parsing
agent-browser snapshot -i -c -d 5 --json         # Compact, depth-limited

Interaction (Using Deterministic Refs)

agent-browser click @e2                          # Click element @e2
agent-browser fill @e3 "text"                    # Clear the field, then fill
agent-browser type @e3 "text"                    # Type without clearing
agent-browser press Enter                        # Press key
agent-browser hover @e4                          # Hover

State Verification

agent-browser get text @e1                       # Get element text
agent-browser get url                            # Current URL
agent-browser is visible @e2                     # Check visibility

Session Management

agent-browser --session login open site.com      # Isolated session
agent-browser --profile ~/.myprofile open site   # Persistent cookies
agent-browser close                              # Clean up
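
When several commands must share one isolated session, repeating the flag by hand is easy to get wrong. The sketch below wraps it; `in_session` and the `AB` variable are hypothetical names, and it assumes `--session` is accepted before every subcommand, as in the `open` example above.

```shell
# Command to run; override AB to point at a wrapper or a stub for dry runs.
AB="${AB:-agent-browser}"

# in_session NAME ARGS...: run one command inside the named session by
# prepending --session NAME, so no call drifts to the default session.
in_session() (
  name=$1
  shift
  "$AB" --session "$name" "$@"
)
```

For example, `in_session login open site.com` followed by `in_session login snapshot -i --json` keeps both calls in the `login` session.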

Selector Strategies (Ranked by Reliability)

1. Refs (Best – Use These)

# From snapshot output: deterministic and stable
agent-browser click @e2
agent-browser fill @e3 "text"

2. Semantic Locators (Good)

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"

3. CSS Selectors (Okay for static sites)

agent-browser click "#submit"
agent-browser click ".btn-primary"

4. Text/XPath (Last resort)

agent-browser click "text=Submit"
agent-browser click "xpath=//button[1]"

Snapshot Options

Control what the AI “sees”:

Flag	Purpose
`-i`	Interactive elements only (buttons, links, inputs); recommended
`-C`	Include cursor-interactive elements (onclick, cursor:pointer)
`-c`	Compact (remove empty structural elements)
`-d <n>`	Limit tree depth
`-s <sel>`	Scope to CSS selector (e.g., `#main`)
`--json`	Machine-readable JSON output; essential for AI

Recommended AI command:

agent-browser snapshot -i -c --json

Options

Flag	Description
`--json`	JSON output with success/data/error structure
`--headed`	Show browser window (for debugging)
`--session <name>`	Isolated browser session
`--profile <path>`	Persistent profile for cookies/logins
`--cdp <port>`	Connect to existing Chrome via DevTools Protocol
`--headers <json>`	Set auth headers per origin
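
Since `--json` responses carry a success/data/error structure, a script can gate each step on the `success` field. The sketch below is a deliberately crude text match, not a JSON parser. `json_ok` is a hypothetical helper, and it assumes `success` appears as a top-level boolean written as `true` or `false`.

```shell
# json_ok: read one JSON response on stdin and succeed only if it contains
# "success": true. Crude pattern match; a real JSON parser is more robust.
json_ok() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}
```

A call such as `agent-browser open example.com --json | json_ok || echo "open failed" >&2` would then surface failures without inspecting the rest of the payload.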

Example: Complete Login Flow

# Start
agent-browser open https://portal.aeronetpr.com

# Get page structure
SNAPSHOT=$(agent-browser snapshot -i --json)
# AI parses JSON: sees textbox @e1 (Username), textbox @e2 (Password), button @e3 (Login)

# Execute login
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3

# Verify success (wait for navigation, re-snapshot)
sleep 2
agent-browser snapshot -i --json

# Done
agent-browser close
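
The fixed `sleep 2` in the flow above may be too short on a slow network and wasteful on a fast one. One alternative is to poll until the URL changes. This is a sketch only: `wait_until`, `url_changed_from`, and the `AB` variable are hypothetical names, and it assumes `get url` prints just the current URL on stdout.

```shell
# Command to run; override AB to point at a wrapper or a stub for dry runs.
AB="${AB:-agent-browser}"

# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND about once per second until it succeeds or the timeout
# budget is used up. Returns 0 on success, 1 on timeout.
wait_until() (
  remaining=$1
  shift
  while ! "$@" >/dev/null 2>&1; do
    [ "$remaining" -gt 0 ] || exit 1
    sleep 1
    remaining=$((remaining - 1))
  done
  exit 0
)

# url_changed_from OLD: succeed once `get url` reports something other than OLD.
url_changed_from() {
  [ "$("$AB" get url)" != "$1" ]
}
```

In the login flow this could replace the sleep: record `before=$(agent-browser get url)` ahead of the click, then run `wait_until 10 url_changed_from "$before"` before taking the verification snapshot.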

Tips for AI Agents

Always use --json: Structured output is easier to parse than text
Use -i flag: Interactive-only snapshots are smaller, faster, cleaner
Re-snapshot after actions: Verify state changed as expected
Trust refs over selectors: @e2 from snapshot > #id that might change
Use semantic locators when refs expire: find role button click is robust
Session persistence: One open, many commands, one close
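
The "refs first, semantic locators when refs expire" tips can be combined into a single fallback. The following is a minimal sketch: `click_with_fallback` and the `AB` variable are hypothetical names, and it assumes that clicking a stale ref exits non-zero.

```shell
# Command to run; override AB to point at a wrapper or a stub for dry runs.
AB="${AB:-agent-browser}"

# click_with_fallback REF ROLE NAME
# Try the snapshot ref first; if that fails (for example because the page
# re-rendered and the ref is stale), retry with a role/name locator.
click_with_fallback() {
  "$AB" click "@$1" 2>/dev/null ||
    "$AB" find role "$2" click --name "$3"
}
```

For instance, `click_with_fallback e2 button Submit` tries `@e2` and only falls back to the role-based lookup if that click fails. Re-snapshotting afterwards is still advisable, since a fallback usually means the page has changed.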

Comparison to Other Tools

Tool	Best For	Why agent-browser Wins
Puppeteer/Playwright	Dev testing	Built for humans; brittle selectors
Selenium	Legacy testing	Slow, heavy, selector-based
browser-use	Python agents	agent-browser has a better ref system
Screenshot + Vision	Visual tasks	agent-browser is 10x faster, 100x cheaper
OpenClaw browser tool	Simple tasks	agent-browser handles complex flows better

When to Use This Skill

Use agent-browser when:

You are automating multi-step web workflows
You are filling complex forms
You need reliable, repeatable automation
You are working with dynamic/modern web apps
Cost matters (no vision API calls)

Use OpenClaw’s built-in browser tool when:

You only need a simple single-page check
You need a quick screenshot
You already have an authenticated session in Chrome

Resources

Vercel Labs repo: https://github.com/vercel-labs/agent-browser
This skill repo: https://github.com/clawdbrunner/skill-agent-browser
