macpilot-screenshot-ocr

📁 adhikjoshi/macpilot-skills 📅 9 days ago
Install command
npx skills add https://github.com/adhikjoshi/macpilot-skills --skill macpilot-screenshot-ocr

Skill documentation

MacPilot Screenshot & OCR

Use MacPilot to capture screenshots of the screen, specific regions, or application windows, and extract text from images or screen regions using Apple’s built-in Vision OCR.

When to Use

Use this skill when:

  • You need to capture what’s currently on screen
  • You need to extract text from an image file
  • You need to read text from a specific area of the screen
  • You need to capture a specific app window
  • You need to verify the visual state of an application
  • You need to capture screen recordings

Screenshot Commands

Full Screen

macpilot screenshot --json                           # Capture to temp file
macpilot screenshot ~/Desktop/screen.png --json      # Capture to specific path

Specific Region

macpilot screenshot --region 100,200,800,600 --json
# Region format: x,y,width,height (from top-left corner)
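
When a target area is known by its corners rather than by its size, the region value has to be computed first. The sketch below shows one way to do that; `region_from_corners` is a hypothetical helper name, not part of MacPilot, and it only assumes the `x,y,width,height` format described above.

```shell
# Hypothetical helper (not part of MacPilot): build a --region value
# from two corner points given in screen coordinates.
# Usage: region_from_corners LEFT TOP RIGHT BOTTOM
region_from_corners() {
  local left="$1" top="$2" right="$3" bottom="$4"
  local width=$((right - left))
  local height=$((bottom - top))
  # Reject empty or inverted rectangles instead of passing them on.
  if [ "$width" -le 0 ] || [ "$height" -le 0 ]; then
    echo "region_from_corners: right/bottom must exceed left/top" >&2
    return 1
  fi
  printf '%s,%s,%s,%s\n' "$left" "$top" "$width" "$height"
}

# Example (not run here): capture the area between (100,200) and (900,800).
# macpilot screenshot --region "$(region_from_corners 100 200 900 800)" --json
```

With the corners above, the helper prints `100,200,800,600`, the same region used in the command earlier in this section.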

Specific Window

macpilot screenshot --window "Safari" --json         # Capture Safari window
macpilot screenshot --window "Finder" --json         # Capture Finder window

All Windows

macpilot screenshot --all-windows --json             # Each window separately

Specific Display

macpilot screenshot --display 1 --json               # Second display (0-indexed)

Format Options

macpilot screenshot --format png ~/Desktop/shot.png  # PNG (default, lossless)
macpilot screenshot --format jpg ~/Desktop/shot.jpg  # JPEG (smaller files)
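
In scripts, the format flag and the file extension can drift apart. A minimal sketch of keeping them in sync follows; `format_for_path` is a hypothetical helper, and it only chooses between the two `--format` values shown above.

```shell
# Hypothetical helper (not part of MacPilot): pick a --format value
# that matches the output file's extension.
format_for_path() {
  case "$1" in
    *.jpg|*.jpeg|*.JPG|*.JPEG) echo jpg ;;
    *) echo png ;;  # PNG is the documented default
  esac
}

# Example (not run here):
# out="$HOME/Desktop/shot.jpg"
# macpilot screenshot --format "$(format_for_path "$out")" "$out" --json
```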

OCR Commands

Extract Text from Image File

macpilot ocr /path/to/image.png --json
macpilot ocr ~/Desktop/screenshot.png --json

Extract Text from Screen Region

macpilot ocr 100 200 800 600 --json
# Arguments: x y width height (captures region then OCRs it)
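
Note that `screenshot --region` takes one comma-separated value while `ocr` takes four space-separated arguments. If a script stores regions in the comma form, a small converter avoids mixing the two up. `region_to_ocr_args` below is a hypothetical helper written for this illustration.

```shell
# Hypothetical helper (not part of MacPilot): turn "x,y,width,height"
# into the space-separated form that `macpilot ocr` expects.
region_to_ocr_args() {
  local region="$1" old_ifs="$IFS"
  IFS=,
  # Intentionally unquoted: split the region on commas.
  set -- $region
  IFS="$old_ifs"
  if [ "$#" -ne 4 ]; then
    echo "region_to_ocr_args: expected x,y,width,height" >&2
    return 1
  fi
  echo "$1 $2 $3 $4"
}

# Example (not run here); the command substitution is left unquoted
# on purpose so the four numbers become four separate arguments.
# macpilot ocr $(region_to_ocr_args "100,200,800,600") --json
```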

Multi-Language OCR

macpilot ocr image.png --language en-US --json       # English
macpilot ocr image.png --language ja --json           # Japanese
macpilot ocr image.png --language zh-Hans --json      # Simplified Chinese
macpilot ocr image.png --language de --json           # German
macpilot ocr image.png --language fr --json           # French

Screen Recording

Record Screen

macpilot screen record start --output ~/Desktop/recording.mov --json
# ... perform actions ...
macpilot screen record stop --json
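
If the actions between start and stop fail, a script can exit before the recording is stopped. One way to guard against that is a wrapper that always issues the stop command. `record_while` is a hypothetical name, and the sketch uses only the start and stop commands shown above.

```shell
# Hypothetical wrapper (not part of MacPilot): record the screen while a
# command runs, and always stop the recording afterwards.
# Usage: record_while OUTPUT_PATH COMMAND [ARGS...]
record_while() {
  local output="$1" status=0
  shift
  macpilot screen record start --output "$output" --json || return 1
  # Run the wrapped command and remember its exit status.
  "$@" || status=$?
  # Stop the recording whether or not the command succeeded.
  macpilot screen record stop --json
  return "$status"
}

# Example (not run here):
# record_while "$HOME/Desktop/recording.mov" ./my-automation.sh
```

The wrapper returns the wrapped command's exit status, so callers can still detect a failed automation run.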

Display Information

macpilot display-info --json
# Returns: all displays with resolution, position, scale factor

Workflow Patterns

Capture and OCR in One Flow

# Take a screenshot of a specific region
macpilot screenshot --region 0,0,1920,1080 /tmp/capture.png --json
# Extract text from it
macpilot ocr /tmp/capture.png --json

Quick Screen Region OCR

# Directly OCR a screen region without saving
macpilot ocr 200 100 600 400 --json

Verify UI State

# Screenshot a window to see its current state
macpilot screenshot --window "Safari" /tmp/safari.png --json
# Extract text from the image to verify its content
macpilot ocr /tmp/safari.png --json
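
The screenshot-then-OCR check can be wrapped into a single pass/fail step for scripts. The sketch below is a rough illustration: `window_shows_text` is a hypothetical helper, and it assumes that the recognized text appears verbatim somewhere in the JSON printed by `macpilot ocr --json`. The JSON schema is not documented here, so adjust the match if the output is structured differently.

```shell
# Hypothetical helper (not part of MacPilot): succeed if the OCR output
# for a window screenshot contains the given text.
# Assumption: the recognized text appears verbatim in the JSON output.
# Usage: window_shows_text WINDOW_NAME TEXT
window_shows_text() {
  local window="$1" text="$2" dir status=0
  dir="$(mktemp -d "${TMPDIR:-/tmp}/macpilot.XXXXXX")" || return 1
  # Capture the window, then search the OCR output for the fixed string.
  macpilot screenshot --window "$window" "$dir/window.png" --json >/dev/null &&
    macpilot ocr "$dir/window.png" --json | grep -qF -- "$text" || status=$?
  rm -rf "$dir"
  return "$status"
}

# Example (not run here):
# if window_shows_text "Safari" "Example Domain"; then
#   echo "page loaded"
# fi
```

Using `grep -F` keeps the match literal, so punctuation in the expected text is not interpreted as a regular expression.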

Record an Automation

macpilot screen record start --output ~/Desktop/demo.mov
macpilot app open Safari
macpilot wait seconds 2
macpilot keyboard key cmd+l
macpilot keyboard type "https://example.com"
macpilot keyboard key enter
macpilot wait seconds 3
macpilot screen record stop

Tips

  • Screen Recording permission must be granted to MacPilot.app in System Settings, under Privacy & Security
  • PNG format is best for screenshots with text (lossless); JPEG for photos
  • OCR works best on high-contrast text; increase screenshot region size if text is small
  • Use display-info to get screen dimensions before capturing specific regions
  • The coordinate system starts at top-left (0,0) with x increasing right and y increasing down
  • On Retina displays, coordinates are in logical points (not physical pixels)
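
The last tip matters when coordinates are measured inside a saved screenshot and then reused in a `--region` or `ocr` command. A minimal sketch of the conversion follows, assuming an integer scale factor such as the one reported by display-info, and assuming the saved image is at the display's native pixel resolution. Both helper names are hypothetical.

```shell
# Hypothetical helpers (not part of MacPilot) for converting between
# logical points and physical pixels. SCALE is an integer such as 1 or 2.

# Usage: points_to_pixels POINTS SCALE
points_to_pixels() {
  echo $(( $1 * $2 ))
}

# Usage: pixels_to_points PIXELS SCALE   (integer division, rounds down)
pixels_to_points() {
  echo $(( $1 / $2 ))
}

# Example (not run here): a feature found at pixel (1600,1200) in an image
# from a 2x display maps to point (800,600) for use in region commands.
# x=$(pixels_to_points 1600 2); y=$(pixels_to_points 1200 2)
```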