agent-rdp
26
总安装量
26
周安装量
#7647
全站排名
安装命令
npx skills add https://github.com/thisnick/agent-rdp --skill agent-rdp
Agent 安装分布
claude-code
19
opencode
18
gemini-cli
16
antigravity
15
github-copilot
9
codex
9
Skill 文档
Windows Remote Desktop Control with agent-rdp
Quick start
agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation
agent-rdp automate snapshot -i # See interactive elements
agent-rdp automate click "@e5" # Click button by ref
agent-rdp automate fill "@e7" "Hello" # Type into field
agent-rdp disconnect
Core workflow
- Connect with automation:
agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation - Snapshot:
agent-rdp automate snapshot -i(get accessibility tree with refs) - Act:
agent-rdp automate click @e5oragent-rdp automate fill @e7 "text" - Repeat: snapshot â act â snapshot â act…
Troubleshooting
Element not in snapshot with -i
Try without -i flag – some elements aren’t marked as interactive but are still actionable:
agent-rdp automate snapshot # Full tree, no filtering
agent-rdp automate snapshot -d 5 # Limit depth if too large
Element not in accessibility tree at all
Some UI elements (WebView content, certain dialogs, toast notifications) don’t appear in the accessibility tree. Use OCR as a last resort:
- Take screenshot to identify what you need:
agent-rdp screenshot -o screen.png - Use locate to find coordinates:
agent-rdp locate "Button Text" - Click using returned coordinates:
agent-rdp mouse click <x> <y>
Commands
Connection
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp connect --host 192.168.1.100 -u Admin --password-stdin # Read password from stdin
agent-rdp connect --host 192.168.1.100 --width 1920 --height 1080
agent-rdp connect --host 192.168.1.100 --drive /tmp/share:Share # Map local directory
agent-rdp disconnect
Screenshot
agent-rdp screenshot # Save to ./screenshot.png
agent-rdp screenshot -o desktop.png # Save to specific file
agent-rdp screenshot --format jpeg # JPEG format
Mouse
agent-rdp mouse click 500 300 # Left click at (500, 300)
agent-rdp mouse right-click 500 300 # Right click
agent-rdp mouse double-click 500 300 # Double click
agent-rdp mouse move 100 200 # Move cursor
agent-rdp mouse drag 100 100 500 500 # Drag from (100,100) to (500,500)
Keyboard
agent-rdp keyboard type "Hello World" # Type text (supports Unicode)
agent-rdp keyboard press "ctrl+c" # Key combination
agent-rdp keyboard press "alt+tab" # Switch windows
agent-rdp keyboard press "ctrl+shift+esc" # Task manager
agent-rdp keyboard press "win+r" # Run dialog
agent-rdp keyboard press enter # Single key (use press, not key)
agent-rdp keyboard press escape
agent-rdp keyboard press f5
Scroll
agent-rdp scroll up --amount 3 # Scroll up 3 notches
agent-rdp scroll down --amount 5 # Scroll down 5 notches
agent-rdp scroll left
agent-rdp scroll right
Clipboard
agent-rdp clipboard set "Text to paste" # Set clipboard (paste on Windows)
agent-rdp clipboard get # Get clipboard (after copy on Windows)
Drive mapping
# Map at connect time
agent-rdp connect --host <ip> -u <user> -p <pass> --drive /local/path:DriveName
# List mapped drives
agent-rdp drive list
Session management
agent-rdp session list # List active sessions
agent-rdp session info # Current session info
agent-rdp --session work connect ... # Named session
agent-rdp --session work screenshot # Use named session
Wait
agent-rdp wait 2000 # Wait 2 seconds
Locate (OCR)
agent-rdp locate "Cancel" # Find lines containing "Cancel"
agent-rdp locate "Save*" --pattern # Glob pattern matching
agent-rdp locate --all # Get all text on screen
agent-rdp locate "OK" --json # JSON output with coordinates
Returns text lines with bounding boxes and center coordinates for clicking:
Found 1 line(s) containing 'Cancel':
'Cancel' at (650, 420) size 45x14 - center: (672, 427)
To click the first match: agent-rdp mouse click 672 427
UI Automation
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation
# Snapshot - get accessibility tree (refs always included)
agent-rdp automate snapshot # Full desktop tree
agent-rdp automate snapshot -i # Interactive elements only
agent-rdp automate snapshot -c # Compact (remove empty elements)
agent-rdp automate snapshot -d 5 # Limit depth to 5 levels
agent-rdp automate snapshot -s "~*Notepad*"# Scope to a window/element
agent-rdp automate snapshot -f # Start from focused element
agent-rdp automate snapshot -i -c -d 3 # Combine options
# Pattern-based element operations (use selectors: @eN, #automationId, .className, or name)
agent-rdp automate click "#SaveButton" # Click button
agent-rdp automate click "@e5" # Click by ref number
agent-rdp automate click "@e5" -d # Double-click (for file list items)
agent-rdp automate select "@e10" # Select item (SelectionItemPattern)
agent-rdp automate select "@e5" --item "Option 1" # Select item by name in container
agent-rdp automate toggle "@e7" # Toggle checkbox (TogglePattern)
agent-rdp automate toggle "@e7" --state on # Set specific state
agent-rdp automate expand "@e3" # Expand menu/tree (ExpandCollapsePattern)
agent-rdp automate collapse "@e3" # Collapse menu/tree
agent-rdp automate context-menu "@e5" # Open context menu (Shift+F10)
agent-rdp automate focus <selector> # Focus element
agent-rdp automate get <selector> # Get element properties
# Text input
agent-rdp automate fill <selector> "text" # Clear and fill text (ValuePattern)
agent-rdp automate clear <selector> # Just clear
# Scrolling
agent-rdp automate scroll <selector> --direction down --amount 3
# Window operations
agent-rdp automate window list
agent-rdp automate window focus "~*Notepad*"
agent-rdp automate window maximize
agent-rdp automate window minimize
agent-rdp automate window restore
agent-rdp automate window close "~*Notepad*"
# Run commands/apps (best way to open apps)
agent-rdp automate run "notepad.exe" # Open Notepad
agent-rdp automate run "Start-Process ms-settings:" --wait # Open Settings
agent-rdp automate run "calc.exe" # Open Calculator
agent-rdp automate run "Get-Process" --wait --process-timeout 5000 # With 5s timeout
# Wait for element
agent-rdp automate wait-for <selector> --timeout 5000
agent-rdp automate wait-for <selector> --state visible
# Status
agent-rdp automate status
Selector syntax:
@e5or@5– Reference number from snapshot (e prefix recommended)#SaveButton– Automation ID.Edit– Win32 class name~*pattern*– Name with wildcardFile– Element name (exact match)
Snapshot output format:
- Window "Notepad" [ref=e1, id=Notepad]
- MenuBar "Application" [ref=e2]
- MenuItem "File" [ref=e3]
- Edit "Text Editor" [ref=e5, value="Hello"]
JSON output
Add --json for machine-readable output:
agent-rdp --json clipboard get
agent-rdp --json session info
agent-rdp --json automate snapshot
Example: Open PowerShell and run command
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp wait 3000 # Wait for desktop
agent-rdp keyboard press "win+r" # Open Run dialog
agent-rdp wait 1000
agent-rdp keyboard type "powershell"
agent-rdp keyboard press enter
agent-rdp wait 2000 # Wait for PowerShell
agent-rdp keyboard type "Get-Process"
agent-rdp keyboard press enter
agent-rdp screenshot --output result.png
agent-rdp disconnect
Example: File transfer via mapped drive
# Connect with local directory mapped
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --drive /tmp/transfer:Transfer
# On Windows, access files at \\tsclient\Transfer
agent-rdp keyboard press "win+r"
agent-rdp wait 500
agent-rdp keyboard type "\\\\tsclient\\Transfer"
agent-rdp keyboard press enter
Example: Automate Notepad with UI Automation
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation
# Open Notepad
agent-rdp automate run "notepad.exe"
agent-rdp wait 2000
# Get accessibility snapshot (refs are always included)
agent-rdp automate snapshot -i # Interactive elements only
# Type text into the edit control (use ref from snapshot)
agent-rdp automate fill "@e5" "Hello from automation!"
# Use File menu to save - expand menu, then invoke menu item
agent-rdp automate expand "File" # Expand menu (ExpandCollapsePattern)
agent-rdp wait 500
agent-rdp automate click "Save As..." # Click menu item
# Wait for Save dialog
agent-rdp automate wait-for "#FileNameControlHost" --timeout 5000
# Fill filename and save
agent-rdp automate fill "#FileNameControlHost" "test.txt"
agent-rdp automate click "#1" # Click Save button
Environment variables
export AGENT_RDP_HOST=192.168.1.100
export AGENT_RDP_PORT=3389
export AGENT_RDP_USERNAME=Administrator
export AGENT_RDP_PASSWORD=secret
export AGENT_RDP_SESSION=default
agent-rdp connect # Uses env vars for connection
Debugging with WebSocket streaming
# Enable streaming viewer on port 9224
agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret
# Open web viewer in browser
agent-rdp view --port 9224
# Or manually access WebSocket at ws://localhost:9224 (broadcasts JPEG frames)
Tips
Prefer automate fill over keyboard type when automation is enabledâit’s lossless (no dropped characters) and faster.
Opening applications
Use automate run to launch apps directly:
agent-rdp automate run "notepad.exe"
agent-rdp automate run "calc.exe"
agent-rdp automate run "Start-Process ms-settings:" --wait # Settings
agent-rdp automate run "explorer.exe C:\\" # File Explorer
Limitations
IMPORTANT: Read these limitations carefully before attempting automation tasks.
UI Automation cannot access WebViews
- The Windows Start menu search, Edge browser content, Electron app content, and other WebView-based UIs are NOT accessible via
automate snapshot. - Workaround: Use
Win+R(Run dialog) orautomate runto launch programs directly instead of navigating through the Start menu.
UI Automation cannot handle UAC dialogs
- User Account Control elevation prompts run on a secure desktop isolated from UI Automation.
- UAC dialogs will NOT appear in
automate snapshotoutput. - Workaround: Use
locatecommand (OCR) to find button text andmouse clickto interact. This is unreliable but may work for simple Yes/No dialogs.
OCR (locate) is not highly reliable
- The
locatecommand uses OCR which can misread characters, miss text entirely, or return imprecise coordinates. - Use it as a last resort when UI Automation cannot access elements.
- Always verify coordinates before clicking critical buttons.
DO NOT estimate coordinates from screenshots (Claude only)
- Claude models in non-computer-use mode (like Claude Code) are very bad at pixel counting.
- Do NOT look at a screenshot and try to guess coordinates – the estimates will likely be wrong.
- Note: Gemini models are generally good at pixel coordinate estimation.
- If you need vision-based coordinate detection with Claude, the user must implement a harness using Claude’s Computer Use Tool.
Recommended workflow when UI Automation fails
- First, always try
automate snapshot(with and without-iflag) - If element not found, try
locate "text"to find via OCR - Use coordinates from
locateoutput withmouse click - Never estimate coordinates by looking at screenshots