browser
npx skills add https://github.com/maxgent-ai/maxgent-plugin --skill browser
Agent 安装分布
Skill 文档
Browser Automation
Browser automation that maintains page state across command executions. Write small, focused commands to accomplish tasks incrementally.
Choosing Your Approach
- Local/source-available sites: Read the source code first to write selectors directly
- Unknown page layouts: Use
snapshotto discover elements, thenselect-refto interact - Visual debugging: Take
screenshotto see current page state
Prerequisites
# Check browser server running (Max must be open)
curl -s http://localhost:9222/ | head -1 || echo "SERVER_NOT_RUNNING"
Running Commands
All commands use client.py from the skill directory:
uv run skills/browser/client.py <command> [arguments]
â ï¸ IMPORTANT: Always use
uv run client.py, NOTuv run python client.py. Theuv runcommand automatically handles Python and dependencies frompyproject.toml. Addingpythonbreaks dependency resolution.
Workflow Loop
Follow this pattern for complex tasks:
- Run a command to perform one action
- Observe the output
- Evaluate – did it work? What’s the current state?
- Decide – is the task complete or do we need another command?
- Repeat until task is done
No TypeScript in Browser Context
Code passed to page.evaluate() runs in the browser, which doesn’t understand TypeScript:
// â
Correct: plain JavaScript
const text = await page.evaluate(() => {
return document.body.innerText;
});
// â Wrong: TypeScript syntax will fail at runtime
const text = await page.evaluate(() => {
const el: HTMLElement = document.body; // Type annotation breaks in browser!
return el.innerText;
});
Waiting
uv run skills/browser/client.py wait-load main # After navigation
uv run skills/browser/client.py wait-selector main ".results" # For specific elements
uv run skills/browser/client.py wait-url main "**/success" # For specific URL
Scraping Data
For large datasets, intercept and replay API requests rather than scrolling DOM. See refs/scraping.md for the complete guide covering request capture, schema discovery, and paginated API replay.
Inspecting Page State
Screenshots
uv run skills/browser/client.py screenshot main screenshot.png
uv run skills/browser/client.py screenshot main full.png --full-page # Capture entire scrollable page
ARIA Snapshot (Element Discovery)
Use snapshot to discover page elements. Returns YAML-formatted accessibility tree:
- banner:
- link "Hacker News" [ref=e1]
- navigation:
- link "new" [ref=e2]
- main:
- heading "Products" [ref=e3] [level=1]
- list:
- listitem:
- link "Article Title" [ref=e4]
- button "Add to Cart" [ref=e5]
- listitem:
- link "Another Article" [ref=e6]
- button "Add to Cart" [ref=e7] [nth=1]
- contentinfo:
- textbox [ref=e8]
- /placeholder: "Search"
Interpreting refs:
[ref=eN]– Element reference for interaction[nth=N]– Nth duplicate element with same role+name (0-indexed, first one omitted)[checked],[disabled],[expanded]– Element states[level=N]– Heading level/url:,/placeholder:– Element properties
Interacting with refs:
# Get snapshot to find refs
uv run skills/browser/client.py snapshot main
# Only show interactive elements (buttons, links, inputs, etc.)
uv run skills/browser/client.py snapshot main -i
# Use ref to interact
uv run skills/browser/client.py select-ref main e2 click
uv run skills/browser/client.py select-ref main e7 click # Click second "Add to Cart"
uv run skills/browser/client.py select-ref main e8 fill "search term"
Error Recovery
Page state persists after failures. Debug with:
# Take screenshot to see current state
uv run skills/browser/client.py screenshot main debug.png
# Get page info
uv run skills/browser/client.py info main
# Get text content
uv run skills/browser/client.py text main "body"
Command Reference
Page Management
uv run skills/browser/client.py list # List all pages
uv run skills/browser/client.py create main # Create a new page
uv run skills/browser/client.py create main "https://..." # Create and navigate
uv run skills/browser/client.py goto main "https://..." # Navigate existing page
uv run skills/browser/client.py close main # Close a page
uv run skills/browser/client.py info main # Get page URL and title
uv run skills/browser/client.py release main # Release page (call when done!)
Element Interaction
uv run skills/browser/client.py click main "button.submit" # Click element
uv run skills/browser/client.py fill main "input#email" "test@example.com" # Fill input
uv run skills/browser/client.py hover main ".dropdown" # Hover over element
uv run skills/browser/client.py keyboard main "Enter" # Press key
uv run skills/browser/client.py text main "h1" # Get element text
JavaScript Execution
uv run skills/browser/client.py evaluate main "document.title"
uv run skills/browser/client.py evaluate main "document.querySelectorAll('.item').length"
Python Script (Advanced)
For complex tasks requiring loops or page.on() event handlers, use heredoc with BrowserClient:
cd skills/browser && uv run python <<'EOF'
from client import BrowserClient
client = BrowserClient()
page = client.get_playwright_page("main")
# Full Playwright API available
page.goto("https://example.com")
page.click("button")
# Event handlers for request interception
page.on("response", lambda r: print(r.url))
EOF
The page object is a standard Playwright Page.
Important: Release Page After All Operations
â ï¸ CRITICAL: Call
releasecommand when you finish ALL operations on a page!
When browser skill operates on a page, Max UI shows an “Agent Operating” indicator and locks user interaction. You must call release when your entire operation sequence is complete:
# Example workflow
uv run skills/browser/client.py create main "https://example.com"
uv run skills/browser/client.py wait-load main
uv run skills/browser/client.py snapshot main
uv run skills/browser/client.py click main "button.submit"
uv run skills/browser/client.py wait-load main
# When ALL operations are done, release the page
uv run skills/browser/client.py release main
Why this matters:
- Without
release, the indicator stays for 30 seconds (timeout fallback) - User interaction is blocked during operations
- Calling
releaseimmediately unlocks user control
When to call release:
- After completing a task (e.g., filled a form, extracted data)
- Before reporting results to the user
- NOT after each individual command – only when the entire operation sequence is done
For Python scripts:
from client import BrowserClient
client = BrowserClient()
try:
page = client.get_playwright_page("main")
# ... perform ALL operations ...
finally:
client.release_page("main") # REQUIRED: Clear indicator
client.disconnect()