interact-with-browser

📁 richardanaya/agent-skills 📅 Jan 17, 2026

Install command

npx skills add https://github.com/richardanaya/agent-skills --skill interact-with-browser

Installs by agent

opencode 12

claude-code 12

antigravity 10

codex 10

windsurf 9

Skill documentation

BE SURE TO CLEAN UP SCREENSHOTS AFTER YOU ARE DONE WITH EVERYTHING
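One way to make the cleanup hard to forget is to write every screenshot into a temporary directory that is removed as soon as the task finishes. The sketch below is only an illustration: `with_screenshot_dir` and `capture_example` are hypothetical helper names, not part of agent-browser, and the screenshots must be inspected inside the task because they are gone afterwards.

```shell
# Run a command with a throwaway screenshot directory, then delete the directory.
# The directory path is appended as the command's last argument.
with_screenshot_dir() {
  shots_dir=$(mktemp -d) || return 1
  "$@" "$shots_dir"
  rc=$?
  rm -rf "$shots_dir"
  return "$rc"
}

# Hypothetical task: capture a page into the temporary directory.
capture_example() {
  agent-browser open example.com &&
  agent-browser screenshot "$1/page.png" &&
  agent-browser close
}

# Usage (requires agent-browser to be installed):
#   with_screenshot_dir capture_example
```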

IF THIS NEEDS TO BE INSTALLED

npm install -g agent-browser
agent-browser install # to get chromium downloaded

agent-browser open example.com
agent-browser snapshot                      # Get accessibility tree with refs
agent-browser click @e2                     # Click by ref from snapshot
agent-browser fill @e3 "test@example.com"   # Fill by ref
agent-browser get text @e1                  # Get text by ref
agent-browser screenshot page.png
agent-browser close

Traditional Selectors (also supported)

agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"

Commands

Core Commands

agent-browser open <url>              # Navigate to URL (aliases: goto, navigate)
agent-browser click <sel>             # Click element
agent-browser dblclick <sel>          # Double-click element
agent-browser focus <sel>             # Focus element
agent-browser type <sel> <text>       # Type into element
agent-browser fill <sel> <text>       # Clear and fill
agent-browser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keydown <key>           # Hold key down
agent-browser keyup <key>             # Release key
agent-browser hover <sel>             # Hover element
agent-browser select <sel> <val>      # Select dropdown option
agent-browser check <sel>             # Check checkbox
agent-browser uncheck <sel>           # Uncheck checkbox
agent-browser scroll <dir> [px]       # Scroll (up/down/left/right)
agent-browser scrollintoview <sel>    # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt>        # Drag and drop
agent-browser upload <sel> <files>    # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page)
agent-browser pdf <path>              # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval <js>               # Run JavaScript
agent-browser close                   # Close browser (aliases: quit, exit)

Get Info

agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get count <sel>         # Count matching elements
agent-browser get box <sel>           # Get bounding box

Check State

agent-browser is visible <sel>        # Check if visible
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked

Find Elements (Semantic Locators)

agent-browser find role <role> <action> [value]       # By ARIA role
agent-browser find text <text> <action>               # By text content
agent-browser find label <label> <action> [value]     # By label
agent-browser find placeholder <ph> <action> [value]  # By placeholder
agent-browser find alt <text> <action>                # By alt text
agent-browser find title <text> <action>              # By title attr
agent-browser find testid <id> <action> [value]       # By data-testid
agent-browser find first <sel> <action> [value]       # First match
agent-browser find last <sel> <action> [value]        # Last match
agent-browser find nth <n> <sel> <action> [value]     # Nth match

Actions: click, fill, check, hover, text

Examples:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text

Wait

agent-browser wait <selector>                    # Wait for element to be visible
agent-browser wait <ms>                          # Wait for time (milliseconds)
agent-browser wait --text "Welcome"              # Wait for text to appear
agent-browser wait --url "**/dash"               # Wait for URL pattern
agent-browser wait --load networkidle            # Wait for load state
agent-browser wait --fn "window.ready === true"  # Wait for JS condition

Load states: load, domcontentloaded, networkidle

Mouse Control

agent-browser mouse move <x> <y>      # Move mouse
agent-browser mouse down [button]     # Press button (left/right/middle)
agent-browser mouse up [button]       # Release button
agent-browser mouse wheel <dy> [dx]   # Scroll wheel

Browser Settings

agent-browser set viewport <w> <h>    # Set viewport size
agent-browser set device <name>       # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>     # Set geolocation
agent-browser set offline [on|off]    # Toggle offline mode
agent-browser set headers <json>      # Extra HTTP headers
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set media [dark|light]  # Emulate color scheme

Cookies & Storage

agent-browser cookies                     # Get all cookies
agent-browser cookies set <name> <val>    # Set cookie
agent-browser cookies clear               # Clear cookies

agent-browser storage local               # Get all localStorage
agent-browser storage local <key>         # Get specific key
agent-browser storage local set <k> <v>   # Set value
agent-browser storage local clear         # Clear all

agent-browser storage session # Same for sessionStorage

Network

agent-browser network route <url>                # Intercept requests
agent-browser network route <url> --abort        # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network unroute [url]              # Remove routes
agent-browser network requests                   # View tracked requests
agent-browser network requests --filter api      # Filter requests

Tabs & Windows

agent-browser tab                     # List tabs
agent-browser tab new [url]           # New tab (optionally with URL)
agent-browser tab <n>                 # Switch to tab n
agent-browser tab close [n]           # Close tab
agent-browser window new              # New window

Frames

agent-browser frame <sel>             # Switch to iframe
agent-browser frame main              # Back to main frame

Dialogs

agent-browser dialog accept [text]    # Accept (with optional prompt text)
agent-browser dialog dismiss          # Dismiss

Debug

agent-browser trace start [path]      # Start recording trace
agent-browser trace stop [path]       # Stop and save trace
agent-browser console                 # View console messages
agent-browser console --clear         # Clear console
agent-browser errors                  # View page errors
agent-browser errors --clear          # Clear errors
agent-browser highlight <sel>         # Highlight element
agent-browser state save <path>       # Save auth state
agent-browser state load <path>       # Load auth state

Navigation

agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload page

Setup

agent-browser install                 # Download Chromium browser
agent-browser install --with-deps     # Also install system deps (Linux)

Sessions

Run multiple isolated browser instances:

Different sessions

agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

Or via environment variable

AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"

List active sessions

agent-browser session list

Output:

Active sessions:

-> default

agent1

Show current session

agent-browser session

Each session has its own:

Browser instance
Cookies and storage
Navigation history
Authentication state
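Because each session is isolated, the same page can be opened once per session name in a simple loop. This is a minimal sketch: `open_in_sessions` is a hypothetical helper, and `AB_CMD` is a hypothetical override (not an agent-browser setting) that allows a dry run with `AB_CMD=echo`.

```shell
# AB_CMD is a hypothetical override (not an agent-browser setting) so the
# loop can be dry-run, e.g. AB_CMD=echo.
AB_CMD=${AB_CMD:-agent-browser}

# Open the same URL in one isolated session per name given after the URL.
open_in_sessions() {
  url=$1
  shift
  for name in "$@"; do
    "$AB_CMD" --session "$name" open "$url" || return 1
  done
}

# Usage:
#   open_in_sessions example.com agent1 agent2
```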

Snapshot Options

The snapshot command supports filtering to reduce output size:

agent-browser snapshot                # Full accessibility tree
agent-browser snapshot -i             # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c             # Compact (remove empty structural elements)
agent-browser snapshot -d 3           # Limit depth to 3 levels
agent-browser snapshot -s "#main"     # Scope to CSS selector
agent-browser snapshot -i -c -d 5     # Combine options

Option                  Description
-i, --interactive       Only show interactive elements (buttons, links, inputs)
-c, --compact           Remove empty structural elements
-d, --depth <n>         Limit tree depth
-s, --selector <sel>    Scope to CSS selector

Options

Option                      Description
--session <name>            Use isolated session (or AGENT_BROWSER_SESSION env)
--headers <json>            Set HTTP headers scoped to the URL's origin
--executable-path <path>    Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env)
--json                      JSON output (for agents)
--full, -f                  Full page screenshot
--name, -n                  Locator name filter
--exact                     Exact text match
--headed                    Show browser window (not headless)
--cdp <port>                Connect via Chrome DevTools Protocol
--debug                     Debug output

Selectors

Refs (Recommended for AI)

Refs provide deterministic element selection from snapshots:

1. Get snapshot with refs

agent-browser snapshot

Output:

- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]

2. Use refs to interact

agent-browser click @e2                     # Click the button
agent-browser fill @e3 "test@example.com"   # Fill the textbox
agent-browser get text @e1                  # Get heading text
agent-browser hover @e4                     # Hover the link

Why use refs?

Deterministic: Ref points to exact element from snapshot
Fast: No DOM re-query needed
AI-friendly: Snapshot + ref workflow is optimal for LLMs
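The snapshot-then-ref workflow can also be scripted. The sketch below pulls the refs for a given role out of snapshot text, assuming each element appears on its own line in the shape shown in the sample output above (`- role "name" [ref=eN]`); real snapshots may nest or format lines differently. `refs_for_role` is a hypothetical helper, not an agent-browser command.

```shell
# Print "@ref" for each snapshot line whose role matches $1.
# Reads snapshot text on stdin; assumes lines shaped like:
#   - button "Submit" [ref=e2]
refs_for_role() {
  role=$1
  sed -n -E "s/^[[:space:]]*-[[:space:]]+${role}[[:space:]].*\[ref=([A-Za-z0-9]+)\].*/@\1/p"
}

# Usage (requires agent-browser to be installed):
#   agent-browser snapshot -i | refs_for_role button
```

The output can be passed straight to commands that accept refs, such as `click` or `fill`.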

CSS Selectors

agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"

Text & XPath

agent-browser click "text=Submit"
agent-browser click "xpath=//button"

Semantic Locators

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"

Agent Mode

Use --json for machine-readable output:

agent-browser snapshot --json

Returns: {"success":true,"data":{"snapshot":"…","refs":{"e1":{"role":"heading","name":"Title"},…}}}

agent-browser get text @e1 --json
agent-browser is visible @e2 --json
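A script can branch on the `success` field of the JSON output before trusting the rest of the payload. The check below is a deliberately crude sketch that only pattern-matches `"success": true` in the text on stdin and treats anything else as failure; a real JSON parser such as jq would be more robust. `json_succeeded` is a hypothetical helper name.

```shell
# Exit 0 if the JSON text on stdin contains "success": true, non-zero otherwise.
# This is a text match, not a JSON parse.
json_succeeded() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Usage (requires agent-browser to be installed):
#   agent-browser is visible @e2 --json | json_succeeded && echo "command succeeded"
```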

Optimal AI Workflow

1. Navigate and get snapshot

agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

2. AI identifies target refs from snapshot

3. Execute actions using refs

agent-browser click @e2
agent-browser fill @e3 "input text"

4. Get new snapshot if page changed

agent-browser snapshot -i --json
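Steps 3 and 4 pair naturally: refs may no longer match the page after an action changes it, so one option is to take a fresh snapshot after every action. A minimal sketch, where `act_then_snapshot` is a hypothetical helper and `AB_CMD` is a hypothetical override (not an agent-browser setting) for dry runs:

```shell
# AB_CMD is a hypothetical override (not an agent-browser setting) so the
# helper can be dry-run, e.g. AB_CMD=echo.
AB_CMD=${AB_CMD:-agent-browser}

# Run one agent-browser action, then print a fresh interactive snapshot as JSON.
# The snapshot is skipped if the action fails.
act_then_snapshot() {
  "$AB_CMD" "$@" && "$AB_CMD" snapshot -i --json
}

# Usage:
#   act_then_snapshot click @e2
#   act_then_snapshot fill @e3 "input text"
```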

Headed Mode

Show the browser window for debugging:

agent-browser open example.com --headed

This opens a visible browser window instead of running headless.

Authenticated Sessions

Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:

Headers are scoped to api.example.com only

agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'

Requests to api.example.com include the auth header

agent-browser snapshot -i --json
agent-browser click @e2

Navigate to another domain - headers are NOT sent (safe!)

agent-browser open other-site.com

This is useful for:

Skipping login flows - Authenticate via headers instead of UI
Switching users - Start new sessions with different auth tokens
API testing - Access protected endpoints directly
Security - Headers are scoped to the origin, not leaked to other domains
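To keep tokens out of scripts and shell history, the header JSON can be built from a variable at call time. This is a sketch under two assumptions: `auth_headers` and `API_TOKEN` are hypothetical names, and the token contains no characters that would need JSON escaping (quotes or backslashes).

```shell
# Print an Authorization header object for the token passed as $1.
# Assumes the token needs no JSON escaping.
auth_headers() {
  printf '{"Authorization": "Bearer %s"}' "$1"
}

# Usage (requires agent-browser to be installed; API_TOKEN is a hypothetical variable):
#   agent-browser open api.example.com --headers "$(auth_headers "$API_TOKEN")"
```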

To set headers for multiple origins, use --headers with each open command:

agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'

For global headers (all domains), use set headers:

agent-browser set headers '{"X-Custom-Header": "value"}'

Custom Browser Executable

Use a custom browser executable instead of the bundled Chromium. This is useful for:

Serverless deployment: Use lightweight Chromium builds like @sparticuz/chromium (~50MB vs ~684MB)
System browsers: Use an existing Chrome/Chromium installation
Custom builds: Use modified browser builds

CLI Usage

Via flag

agent-browser --executable-path /path/to/chromium open example.com

Via environment variable

AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
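When the browser location differs between machines, a script can probe a few candidate paths and export the first one that is executable. A minimal sketch: `first_executable` is a hypothetical helper, and the paths in the usage comment are only common Linux locations, not paths agent-browser itself checks.

```shell
# Print the first argument that names an executable file; fail if none does.
first_executable() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage (requires agent-browser to be installed; candidate paths are assumptions):
#   AGENT_BROWSER_EXECUTABLE_PATH=$(first_executable /usr/bin/chromium /usr/bin/google-chrome) \
#     agent-browser open example.com
```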

Serverless Example (Vercel/AWS Lambda)

import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';

export async function handler() {
  const browser = new BrowserManager();
  await browser.launch({
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  // … use browser
}
