npx skills add https://github.com/web-infra-dev/midscene-skills --skill Chrome Bridge Automation
Chrome Bridge Automation
CRITICAL RULES (VIOLATIONS WILL BREAK THE WORKFLOW):
- NEVER set run_in_background: true on any Bash tool call for midscene commands. Every npx @midscene/web command MUST use run_in_background: false (or omit the parameter entirely). Background execution causes notification spam after the task ends and breaks the screenshot-analyze-act loop.
- Send only ONE midscene CLI command per Bash tool call. Wait for its result, read the screenshot, then decide the next action. Do NOT chain commands with &&, ;, or sleep.
- Set timeout: 60000 (60 seconds) on each Bash tool call to allow sufficient time for midscene commands to complete synchronously.
Automate the user’s real Chrome browser via the Midscene Chrome Extension (Bridge mode), preserving cookies, sessions, and login state. You (the AI agent) act as the brain, deciding which actions to take based on screenshots.
Command Format
CRITICAL: Every command MUST follow this EXACT format. Do NOT modify the command prefix.
npx @midscene/web --bridge <subcommand> [args]
- The --bridge flag is MANDATORY: it activates Bridge mode to connect to the user’s real Chrome.
- Without --bridge, the CLI launches a separate headless browser (wrong behavior for this skill).
- Do NOT use the -p flag, and do NOT use environment variables as substitutes; use --bridge exactly as shown.
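As a rough illustration of the format rules above, a command string could be sanity-checked before it is sent to a Bash tool call. The function name check_midscene_cmd is hypothetical and not part of Midscene; the check is deliberately naive and would also reject a prompt that happens to contain ";" or "&&".

```shell
# Hypothetical helper (not part of Midscene): returns 0 only when the
# command starts with the required prefix and contains no chaining operator.
check_midscene_cmd() {
  case $1 in
    'npx @midscene/web --bridge '*) ;;   # required prefix is present
    *) echo "missing 'npx @midscene/web --bridge' prefix" >&2; return 1 ;;
  esac
  case $1 in
    *'&&'*|*';'*) echo "chained commands are not allowed" >&2; return 1 ;;
  esac
  return 0
}
```

For example, check_midscene_cmd 'npx @midscene/web --bridge take_screenshot' succeeds, while the same command without --bridge is rejected.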
Prerequisites
The user has already prepared Chrome and the Midscene Extension. Do NOT check browser or extension status; just connect directly.
The CLI automatically loads .env from the current working directory. Before first use, verify the .env file exists and contains the API key:
cat .env | grep MIDSCENE_MODEL_API_KEY | head -c 30
If there is no .env file, or it contains no API key, ask the user to create one. See Model Configuration for supported providers.
Do NOT run echo $MIDSCENE_MODEL_API_KEY; the key is loaded from .env at runtime, not from the shell environment.
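If you would rather not print any part of the key, the presence check can be sketched as below. The helper name has_midscene_key is hypothetical, and it assumes the key sits on a plain KEY=value line with no leading whitespace or export prefix.

```shell
# Hypothetical helper: report whether an env file defines a non-empty
# MIDSCENE_MODEL_API_KEY, without echoing any part of its value.
has_midscene_key() {
  env_file=${1:-.env}   # defaults to .env in the current directory
  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file"
    return 1
  fi
  if grep -q '^MIDSCENE_MODEL_API_KEY=.' "$env_file"; then
    echo "MIDSCENE_MODEL_API_KEY is set in $env_file"
  else
    echo "MIDSCENE_MODEL_API_KEY not found in $env_file"
    return 1
  fi
}
```

Run has_midscene_key once before the first connect; a non-zero exit status means the user needs to create or fix the .env file.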
Commands
Connect to a Web Page
npx @midscene/web --bridge connect --url https://example.com
Take Screenshot
npx @midscene/web --bridge take_screenshot
After taking a screenshot, read the saved image file to understand the current page state before deciding the next action.
Perform Actions
npx @midscene/web --bridge Tap --locate '{"prompt":"the Login button"}'
npx @midscene/web --bridge Input --locate '{"prompt":"the email field"}' --value 'user@example.com'
npx @midscene/web --bridge Scroll --direction down
npx @midscene/web --bridge Hover --locate '{"prompt":"the navigation menu"}'
npx @midscene/web --bridge KeyboardPress --value Enter
npx @midscene/web --bridge DragAndDrop --locate '{"prompt":"the draggable item"}' --target '{"prompt":"the drop zone"}'
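The --locate and --target values above are JSON objects, so a prompt that itself contains double quotes or backslashes needs escaping. A minimal sketch of one way to build the argument follows; locate_json is a hypothetical helper, not a Midscene command, and it handles single-line prompts only.

```shell
# Hypothetical helper: wrap a prompt in {"prompt":"..."} and escape
# backslashes and double quotes so the result is valid JSON.
locate_json() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"prompt":"%s"}\n' "$escaped"
}
```

It can then be used inside a single command, for example: npx @midscene/web --bridge Tap --locate "$(locate_json 'the "Save" button')"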
Natural Language Action
Use act to execute multi-step operations in a single command, which is useful for transient UI interactions:
npx @midscene/web --bridge act --prompt "click the country dropdown and select Japan"
Disconnect
npx @midscene/web --bridge disconnect
Workflow Pattern
Since CLI commands are stateless between invocations, follow this pattern:
1. Connect to a URL to establish a session
2. Take screenshot to see the current state
3. Analyze the screenshot to decide the next action
4. Execute action (Tap, Input, Scroll, etc.)
5. Take screenshot again to verify the result
6. Repeat steps 3-5 until the task is complete
7. Disconnect when done
Best Practices
- Always connect first: Navigate to the target URL with connect --url before any interaction.
- Take screenshots frequently: Before and after each action to verify state changes.
- Be specific in locate prompts: Instead of "the button", say "the blue Submit button in the contact form".
- Use natural language: Describe what you see on the page, not CSS selectors. Say "the red Buy Now button" instead of "#buy-btn".
- Handle loading states: After navigation or actions that trigger page loads, take a screenshot to verify the page has loaded.
- Disconnect when done: Always disconnect to free resources.
- Never run in background: On every Bash tool call, either omit run_in_background or explicitly set it to false. Never set run_in_background: true.
Handle Transient UI
Dropdowns, autocomplete popups, tooltips, and confirm dialogs disappear between commands. When interacting with transient UI:
- Use act for multi-step transient interactions; it executes everything in a single process.
- Or execute commands rapidly in sequence; do NOT take screenshots between steps.
- Do NOT pause to analyze; run all commands for the transient interaction back-to-back.
- Persistent UI (page content, navigation bars, sidebars) is fine to interact with across separate commands
Example: Dropdown selection using act (recommended for transient UI):
npx @midscene/web --bridge act --prompt "click the country dropdown and select Japan"
npx @midscene/web --bridge take_screenshot
Example: Dropdown selection using individual commands (alternative):
# These commands must be run back-to-back WITHOUT screenshots in between
npx @midscene/web --bridge Tap --locate '{"prompt":"the country dropdown"}'
npx @midscene/web --bridge Tap --locate '{"prompt":"Japan option in the dropdown list"}'
# NOW take a screenshot to verify the result
npx @midscene/web --bridge take_screenshot
Common Patterns
Simple Browsing
npx @midscene/web --bridge connect --url 'https://news.ycombinator.com'
npx @midscene/web --bridge take_screenshot
# Read the screenshot, then decide next action
npx @midscene/web --bridge disconnect
Multi-Step Interaction
npx @midscene/web --bridge connect --url 'https://example.com'
npx @midscene/web --bridge Tap --locate '{"prompt":"the Sign In link"}'
npx @midscene/web --bridge take_screenshot
npx @midscene/web --bridge Input --locate '{"prompt":"the email field"}' --value 'user@example.com'
npx @midscene/web --bridge Input --locate '{"prompt":"the password field"}' --value 'password123'
npx @midscene/web --bridge Tap --locate '{"prompt":"the Log In button"}'
npx @midscene/web --bridge take_screenshot
npx @midscene/web --bridge disconnect
Troubleshooting
Bridge Mode Connection Failures
- Ensure Chrome is open with the Midscene Extension installed and enabled.
- Check that the extension shows “Connected” status.
- See the Bridge Mode documentation.
API Key Errors
- Check that the .env file contains MIDSCENE_MODEL_API_KEY=<your-key>.
- Verify the key is valid for the configured model provider.
Timeouts
- Web pages may take time to load. After connecting, take a screenshot to verify readiness before interacting.
- For slow pages, wait briefly between steps.
Screenshots Not Displaying
- The screenshot path is an absolute path to a local file. Use the Read tool to view it.