remote-browser
npx skills add https://github.com/shawnpana/browser-use --skill remote-browser
Remote Browser Automation for Sandboxed Agents
This skill is for agents running on sandboxed remote machines (cloud VMs, CI, coding agents) that need to control a browser. Install browser-use and drive a cloud browser; no local Chrome is needed.
Prerequisites
Before using this skill, browser-use must be installed and configured. Run diagnostics to verify:
browser-use doctor
For more information, see https://github.com/browser-use/browser-use/blob/main/browser_use/skill_cli/README.md
Core Workflow
Every command drives the cloud browser. The first open starts a session, and later commands reuse it:
# Step 1: Start session (automatically uses remote mode)
browser-use open https://example.com
# Returns: url, live_url (view the browser in real-time)
# Step 2+: All subsequent commands use the existing session
browser-use state # Get page elements with indices
browser-use click 5 # Click element by index
browser-use type "Hello World" # Type into focused element
browser-use input 3 "text" # Click element, then type
browser-use screenshot # Take screenshot (base64)
browser-use screenshot page.png # Save screenshot to file
# Done: Close the session
browser-use close # Close browser and release resources
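When an agent needs to branch on results, the same open, state, screenshot, close sequence can be scripted from Python. The sketch below is one possible wrapper and is not part of browser-use: `run_cli` and `capture_page` are hypothetical helper names, and it assumes only that the `browser-use` executable is on PATH and signals failure with a non-zero exit code.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(args: Sequence[str]) -> str:
    """Run one browser-use command and return its stdout.

    Assumption: a failed command exits non-zero, so check=True raises.
    """
    result = subprocess.run(
        ["browser-use", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def capture_page(url: str, screenshot_path: str, run: Runner = run_cli) -> str:
    """Open url, read the element list, save a screenshot, then close.

    The close step sits in `finally` so the session is released even if
    `state` or `screenshot` fails.
    """
    run(["open", url])
    try:
        state = run(["state"])
        run(["screenshot", screenshot_path])
        return state
    finally:
        run(["close"])
```

Calling `capture_page("https://example.com", "page.png")` returns whatever `browser-use state` printed. The `run` parameter exists so the sequence can be exercised with a stub instead of a real browser.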
Essential Commands
# Navigation
browser-use open <url> # Navigate to URL
browser-use back # Go back
browser-use scroll down # Scroll down (--amount N for pixels)
# Page State (always run state first to get element indices)
browser-use state # Get URL, title, clickable elements
browser-use screenshot # Take screenshot (base64)
browser-use screenshot path.png # Save screenshot to file
# Interactions (use indices from state)
browser-use click <index> # Click element
browser-use type "text" # Type into focused element
browser-use input <index> "text" # Click element, then type
browser-use keys "Enter" # Send keyboard keys
browser-use select <index> "option" # Select dropdown option
# Data Extraction
browser-use eval "document.title" # Execute JavaScript
browser-use get text <index> # Get element text
browser-use get html --selector "h1" # Get scoped HTML
# Wait
browser-use wait selector "h1" # Wait for element
browser-use wait text "Success" # Wait for text
# Session
browser-use close # Close browser session
# AI Agent
browser-use run "task" # Run agent (async by default)
browser-use task status <id> # Check task progress
Commands
Navigation & Tabs
browser-use open <url> # Navigate to URL
browser-use back # Go back in history
browser-use scroll down # Scroll down
browser-use scroll up # Scroll up
browser-use scroll down --amount 1000 # Scroll by specific pixels (default: 500)
browser-use switch <tab> # Switch tab by index
browser-use close-tab # Close current tab
browser-use close-tab <tab> # Close specific tab
Page State
browser-use state # Get URL, title, and clickable elements
browser-use screenshot # Take screenshot (base64)
browser-use screenshot path.png # Save screenshot to file
browser-use screenshot --full p.png # Full page screenshot
Interactions
browser-use click <index> # Click element
browser-use type "text" # Type into focused element
browser-use input <index> "text" # Click element, then type
browser-use keys "Enter" # Send keyboard keys
browser-use keys "Control+a" # Key combination
browser-use select <index> "option" # Select dropdown option
browser-use hover <index> # Hover over element
browser-use dblclick <index> # Double-click
browser-use rightclick <index> # Right-click
Use indices from browser-use state.
JavaScript & Data
browser-use eval "document.title" # Execute JavaScript
browser-use get title # Get page title
browser-use get html # Get page HTML
browser-use get html --selector "h1" # Scoped HTML
browser-use get text <index> # Get element text
browser-use get value <index> # Get input value
browser-use get attributes <index> # Get element attributes
browser-use get bbox <index> # Get bounding box (x, y, width, height)
Cookies
browser-use cookies get # Get all cookies
browser-use cookies get --url <url> # Get cookies for specific URL
browser-use cookies set <name> <val> # Set a cookie
browser-use cookies set name val --domain .example.com --secure
browser-use cookies set name val --same-site Strict # SameSite: Strict, Lax, None
browser-use cookies set name val --expires 1735689600 # Expiration timestamp
browser-use cookies clear # Clear all cookies
browser-use cookies clear --url <url> # Clear cookies for specific URL
browser-use cookies export <file> # Export to JSON
browser-use cookies import <file> # Import from JSON
Wait Conditions
browser-use wait selector "h1" # Wait for element
browser-use wait selector ".loading" --state hidden # Wait for element to disappear
browser-use wait text "Success" # Wait for text
browser-use wait selector "#btn" --timeout 5000 # Custom timeout (ms)
Python Execution
browser-use python "x = 42" # Set variable
browser-use python "print(x)" # Access variable (prints: 42)
browser-use python "print(browser.url)" # Access browser object
browser-use python --vars # Show defined variables
browser-use python --reset # Clear namespace
browser-use python --file script.py # Run Python file
The Python session maintains state across commands. The browser object provides:
- browser.url, browser.title, browser.html: page info
- browser.goto(url), browser.back(): navigation
- browser.click(index), browser.type(text), browser.input(index, text), browser.keys(keys): interactions
- browser.screenshot(path), browser.scroll(direction, amount): visual
- browser.wait(seconds), browser.extract(query): utilities
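As an illustration of the browser object, here is a small file that could be passed to `browser-use python --file script.py`. It uses only members listed above. Two things are assumptions: that the session exposes the object as a global named `browser` when running a file, and that `scroll` accepts a direction string such as "down" with a pixel amount. `snapshot` is a hypothetical helper name.

```python
def snapshot(browser, url, path="page.png"):
    """Navigate, pause briefly, scroll, and save a screenshot.

    Returns basic page info read from the browser object's attributes.
    """
    browser.goto(url)
    browser.wait(1)               # seconds
    browser.scroll("down", 500)   # argument forms are assumed, see lead-in
    browser.screenshot(path)
    return {"url": browser.url, "title": browser.title}


# Inside `browser-use python --file script.py` the session is assumed to
# provide a global `browser`; anywhere else this guard skips the call.
if "browser" in globals():
    print(snapshot(browser, "https://example.com"))
```

Passing the browser in as a parameter keeps the helper usable with a stand-in object when no session is running.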
Agent Tasks
browser-use run "Fill the contact form with test data" # AI agent
browser-use run "Extract all product prices" --max-steps 50
# Specify LLM model
browser-use run "task" --llm gpt-4o
browser-use run "task" --llm claude-sonnet-4-20250514
# Proxy configuration (default: us)
browser-use run "task" --proxy-country uk
# Session reuse
browser-use run "task 1" --keep-alive # Keep session alive after task
browser-use run "task 2" --session-id abc-123 # Reuse existing session
# Execution modes
browser-use run "task" --flash # Fast execution mode
browser-use run "task" --wait # Wait for completion (default: async)
# Advanced options
browser-use run "task" --thinking # Extended reasoning mode
browser-use run "task" --no-vision # Disable vision (enabled by default)
# Using a cloud profile (create session first, then run with --session-id)
browser-use session create --profile <cloud-profile-id> --keep-alive
# → returns session_id
browser-use run "task" --session-id <session-id>
# Task configuration
browser-use run "task" --start-url https://example.com # Start from specific URL
browser-use run "task" --allowed-domain example.com # Restrict navigation (repeatable)
browser-use run "task" --metadata key=value # Task metadata (repeatable)
browser-use run "task" --skill-id skill-123 # Enable skills (repeatable)
browser-use run "task" --secret key=value # Secret metadata (repeatable)
# Structured output and evaluation
browser-use run "task" --structured-output '{"type":"object"}' # JSON schema for output
browser-use run "task" --judge # Enable judge mode
browser-use run "task" --judge-ground-truth "answer"
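Hand-writing a JSON schema inside shell quotes is error-prone, so it can help to build the argument list in Python and let `json.dumps` produce the schema string. This is a sketch under assumptions: `build_run_command` is a hypothetical helper, the example schema is illustrative only, and which JSON Schema features the service honors is not stated in this document.

```python
import json
import shlex


def build_run_command(task, schema, max_steps=None):
    """Return an argv list for `browser-use run` with a structured-output schema."""
    argv = [
        "browser-use", "run", task,
        "--structured-output", json.dumps(schema, separators=(",", ":")),
    ]
    if max_steps is not None:
        argv += ["--max-steps", str(max_steps)]
    return argv


# Illustrative schema only; adapt the fields to the task at hand.
schema = {
    "type": "object",
    "properties": {"prices": {"type": "array", "items": {"type": "number"}}},
    "required": ["prices"],
}
argv = build_run_command("Extract all product prices", schema, max_steps=50)

# shlex.join produces a copy-pasteable shell line with correct quoting.
print(shlex.join(argv))
```

The list form can be handed directly to `subprocess.run(argv)`, which avoids shell quoting altogether.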
Task Management
browser-use task list # List recent tasks
browser-use task list --limit 20 # Show more tasks
browser-use task list --status finished # Filter by status (finished, stopped)
browser-use task list --session <id> # Filter by session ID
browser-use task list --json # JSON output
browser-use task status <task-id> # Get task status (latest step only)
browser-use task status <task-id> -c # All steps with reasoning
browser-use task status <task-id> -v # All steps with URLs + actions
browser-use task status <task-id> --last 5 # Last N steps only
browser-use task status <task-id> --step 3 # Specific step number
browser-use task status <task-id> --reverse # Newest first
browser-use task stop <task-id> # Stop a running task
browser-use task logs <task-id> # Get task execution logs
Cloud Session Management
browser-use session list # List cloud sessions
browser-use session list --limit 20 # Show more sessions
browser-use session list --status active # Filter by status
browser-use session list --json # JSON output
browser-use session get <session-id> # Get session details + live URL
browser-use session get <session-id> --json
browser-use session stop <session-id> # Stop a session
browser-use session stop --all # Stop all active sessions
browser-use session create # Create with defaults
browser-use session create --profile <id> # With cloud profile
browser-use session create --proxy-country uk # With geographic proxy
browser-use session create --start-url https://example.com
browser-use session create --screen-size 1920x1080
browser-use session create --keep-alive
browser-use session create --persist-memory
browser-use session share <session-id> # Create public share URL
browser-use session share <session-id> --delete # Delete public share
Cloud Profile Management
browser-use profile list # List cloud profiles
browser-use profile list --page 2 --page-size 50
browser-use profile get <id> # Get profile details
browser-use profile create # Create new profile
browser-use profile create --name "My Profile"
browser-use profile update <id> --name "New Name"
browser-use profile delete <id>
Tunnels
browser-use tunnel <port> # Start tunnel (returns URL)
browser-use tunnel <port> # Idempotent - returns existing URL
browser-use tunnel list # Show active tunnels
browser-use tunnel stop <port> # Stop tunnel
browser-use tunnel stop --all # Stop all tunnels
Session Management
browser-use sessions # List active sessions
browser-use close # Close current session
browser-use close --all # Close all sessions
Common Workflows
Exposing Local Dev Servers
Use when you have a dev server on the remote machine and need the cloud browser to reach it.
Core workflow: Start dev server → create tunnel → browse the tunnel URL.
# 1. Start your dev server
python -m http.server 3000 &
# 2. Expose it via Cloudflare tunnel
browser-use tunnel 3000
# → url: https://abc.trycloudflare.com
# 3. Now the cloud browser can reach your local server
browser-use open https://abc.trycloudflare.com
browser-use state
browser-use screenshot
Note: Tunnels are independent of browser sessions. They persist across browser-use close and can be managed separately. Cloudflared must be installed; run browser-use doctor to check.
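To chain the tunnel and open steps in a script, the tunnel URL has to be pulled out of the command output. The sketch below makes a loose assumption about that output: it contains one https URL somewhere, as in the `url: https://abc.trycloudflare.com` line shown above. The helper names (`run_cli`, `extract_tunnel_url`, `open_through_tunnel`) are hypothetical.

```python
import re
import subprocess
from typing import Callable, Optional, Sequence

# Match an https URL up to the first whitespace, quote, or comma, so the
# pattern works for plain "url: ..." text and for JSON-looking output.
_URL = re.compile(r"https://[^\s\"',]+")


def run_cli(args: Sequence[str]) -> str:
    """Run one browser-use command and return its stdout."""
    return subprocess.run(
        ["browser-use", *args], capture_output=True, text=True, check=True
    ).stdout


def extract_tunnel_url(output: str) -> Optional[str]:
    """Return the first https URL found in the output, or None."""
    match = _URL.search(output)
    return match.group(0) if match else None


def open_through_tunnel(port: int, run: Callable[[Sequence[str]], str] = run_cli) -> str:
    """Start (or reuse) the tunnel for port, then open its URL in the cloud browser."""
    url = extract_tunnel_url(run(["tunnel", str(port)]))
    if url is None:
        raise RuntimeError(f"no tunnel URL found for port {port}")
    run(["open", url])
    return url
```

Because the tunnel command is idempotent, calling `open_through_tunnel(3000)` repeatedly should reuse the existing tunnel rather than create a new one.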
Running Subagents
Use cloud sessions to run autonomous browser agents in parallel.
Core workflow: Launch task(s) with run → poll with task status → collect results → clean up sessions.
- Session = Agent: Each cloud session is a browser agent with its own state
- Task = Work: Jobs given to an agent; an agent can run multiple tasks sequentially
- Session lifecycle: Once stopped, a session cannot be revived; start a new one
Launching Tasks
# Single task (async by default, returns immediately)
browser-use run "Search for AI news and summarize top 3 articles"
# → task_id: task-abc, session_id: sess-123
# Parallel tasks: each gets its own session
browser-use run "Research competitor A pricing"
# → task_id: task-1, session_id: sess-a
browser-use run "Research competitor B pricing"
# → task_id: task-2, session_id: sess-b
browser-use run "Research competitor C pricing"
# → task_id: task-3, session_id: sess-c
# Sequential tasks in same session (reuses cookies, login state, etc.)
browser-use run "Log into example.com" --keep-alive
# → task_id: task-1, session_id: sess-123
browser-use task status task-1 # Wait for completion
browser-use run "Export settings" --session-id sess-123
# → task_id: task-2, session_id: sess-123 (same session)
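A script that launches tasks needs the returned IDs to poll or reuse a session later. The following sketch assumes the output names them as `task_id` and `session_id` followed by the value, as in the comments above; the exact output format is not specified here, so treat the pattern as a starting point. `parse_ids` and `launch` are hypothetical helpers.

```python
import re
import subprocess
from typing import Callable, Dict, Optional, Sequence

# Assumed shape: "task_id: task-abc, session_id: sess-123" (or the same
# keys in JSON). \W+ skips the separator characters between key and value.
_ID = re.compile(r"\b(task_id|session_id)\W+([\w-]+)")


def run_cli(args: Sequence[str]) -> str:
    """Run one browser-use command and return its stdout."""
    return subprocess.run(
        ["browser-use", *args], capture_output=True, text=True, check=True
    ).stdout


def parse_ids(output: str) -> Dict[str, str]:
    """Collect task_id and session_id values from command output."""
    return dict(_ID.findall(output))


def launch(
    task: str,
    run: Callable[[Sequence[str]], str] = run_cli,
    session_id: Optional[str] = None,
    keep_alive: bool = False,
) -> Dict[str, str]:
    """Start a task, optionally in an existing session, and return its IDs."""
    args = ["run", task]
    if session_id:
        args += ["--session-id", session_id]
    if keep_alive:
        args.append("--keep-alive")
    return parse_ids(run(args))
```

For the sequential case, the first call would pass `keep_alive=True` and the second would pass the `session_id` it returned, after the first task has finished.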
Managing & Stopping
browser-use task list --status finished # See completed tasks
browser-use task stop task-abc # Stop a task (session may continue if --keep-alive)
browser-use session stop sess-123 # Stop an entire session (terminates its tasks)
browser-use session stop --all # Stop all sessions
Monitoring
Task status is designed for token efficiency. Default output is minimal; expand it only when needed:
| Mode | Flag | Tokens | Use When |
|---|---|---|---|
| Default | (none) | Low | Polling progress |
| Compact | -c | Medium | Need full reasoning |
| Verbose | -v | High | Debugging actions |
# For long tasks (50+ steps)
browser-use task status <id> -c --last 5 # Last 5 steps only
browser-use task status <id> -v --step 10 # Inspect specific step
Live view: browser-use session get <session-id> returns a live URL to watch the agent.
Detect stuck tasks: If cost/duration in task status stops increasing, the task is stuck. Stop it and start a new agent.
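The stuck-task rule can be expressed as a small check over successive readings. In this sketch the caller supplies a `read_sample` function returning a `(status, cost)` pair; how those values are extracted from `task status` output is deliberately left out because the output format is not documented here. The window size, polling interval, and the names `is_stalled` and `watch` are illustrative assumptions. The terminal status names come from the `task list --status` filter values.

```python
import time

TERMINAL = {"finished", "stopped"}


def is_stalled(costs, window=3):
    """True when the last `window` readings show no increase."""
    if len(costs) < window:
        return False
    recent = costs[-window:]
    return max(recent) <= recent[0]


def watch(read_sample, interval=10.0, window=3, max_polls=60, sleep=time.sleep):
    """Poll until the task ends, stalls, or the poll budget runs out.

    read_sample: hypothetical callable returning (status, cost).
    Returns the terminal status, "stalled", or "timeout".
    """
    costs = []
    for _ in range(max_polls):
        status, cost = read_sample()
        if status in TERMINAL:
            return status
        costs.append(cost)
        if is_stalled(costs, window):
            return "stalled"
        sleep(interval)
    return "timeout"
```

On a "stalled" result, the next step from the guidance above is to stop the task and launch a fresh agent. A larger window trades slower detection for fewer false alarms on steps that are merely slow.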
Logs: browser-use task logs <task-id> is only available after the task completes.
Global Options
| Option | Description |
|---|---|
| --session NAME | Named session (default: "default") |
| --browser MODE | Browser mode (only if multiple modes installed) |
| --profile ID | Cloud profile ID for persistent cookies. Works with open, session create, etc., but does NOT work with run (use --session-id instead) |
| --json | Output as JSON |
Tips
- Run browser-use doctor to verify installation before starting
- Always run state first to see available elements and their indices
- Sessions persist across commands: the browser stays open until you close it
- Tunnels are independent: they persist across browser-use close
- Use --json for programmatic parsing
- tunnel is idempotent: calling it again for the same port returns the existing URL
Troubleshooting
“Browser mode ‘chromium’ not installed”?
- Expected for sandboxed agents: remote mode only supports cloud browsers
- Run browser-use doctor to verify configuration
Cloud browser won’t start?
- Run browser-use doctor to check configuration
Tunnel not working?
- Verify cloudflared is installed: which cloudflared
- Run browser-use tunnel list to check active tunnels
- Run browser-use tunnel stop <port> and retry
Element not found?
- Run browser-use state to see current elements
- Run browser-use scroll down, then browser-use state: the element might be below the fold
Session reuse fails after task stop:
Create a new session instead:
browser-use session create --profile <profile-id> --keep-alive
browser-use run "new task" --session-id <new-session-id>
Task stuck at “started”: Check cost with task status â if not increasing, the task is stuck. View live URL with session get, then stop and start a new agent.
Sessions persist after tasks complete: Run browser-use session stop --all to clean up.
Cleanup
Always close resources when done:
browser-use close # Close browser session
browser-use session stop --all # Stop cloud sessions (if any)
browser-use tunnel stop --all # Stop tunnels (if any)
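In a script, cleanup is easy to skip when an earlier step raises. One way to make it unconditional is a context manager that runs the three commands above on exit. This is a sketch: `browser_resources` and `run_cli` are hypothetical names, and each cleanup command is treated as best-effort so that one failure does not block the others.

```python
import subprocess
from contextlib import contextmanager

CLEANUP_COMMANDS = (
    ["close"],                     # close browser session
    ["session", "stop", "--all"],  # stop cloud sessions
    ["tunnel", "stop", "--all"],   # stop tunnels
)


def run_cli(args):
    """Run one browser-use command; return its exit code without raising."""
    return subprocess.run(
        ["browser-use", *args], capture_output=True, text=True
    ).returncode


@contextmanager
def browser_resources(run=run_cli):
    """Run the cleanup commands when the block exits, even on error."""
    try:
        yield
    finally:
        for args in CLEANUP_COMMANDS:
            try:
                run(args)
            except Exception:
                # Best-effort: keep going so later cleanup steps still run.
                pass
```

Work goes inside `with browser_resources(): ...`. Note that the `--all` variants stop every session and tunnel, so this pattern suits a sandbox that a single agent owns.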