ai-vision
npx skills add https://github.com/httprunner/skills --skill ai-vision
AI Vision
Overview
This skill provides a standalone CLI that calls multimodal models for UI querying, assertion, and single-step planning. It does not depend on device type: you supply a screenshot and receive structured output (coordinates, decisions, or next actions). Execution and multi-step loops are handled externally by agents using adb/hdc or other drivers. Prefer storing screenshots in ~/.eval/screenshots/, and add a timestamp to each filename to avoid overwriting earlier captures.
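The timestamped naming used in the examples in this document (ui_YYYYMMDD_HHMMSS.png) could be generated along these lines. This is a minimal sketch, not part of the skill: the function name `screenshotPath` and the default `ui` prefix are assumptions chosen to match the example paths.

```typescript
// Hypothetical helper (not part of ai_vision.ts): builds a screenshot path
// such as <dir>/ui_20250102_030405.png from a directory and a Date.
function screenshotPath(dir: string, now: Date, prefix: string = "ui"): string {
  // Zero-pad month, day, hour, minute, and second to two digits.
  const pad = (n: number): string => String(n).padStart(2, "0");
  const date = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}`;
  const time = `${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  // Drop trailing slashes so the result never contains "//".
  const base = dir.replace(/\/+$/, "");
  return `${base}/${prefix}_${date}_${time}.png`;
}
```

Note that Node does not expand `~`, so a caller would pass an absolute directory (for example, the home directory joined with `.eval/screenshots`). Second-level resolution only avoids overwrites when captures are at least one second apart.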
Path Convention
Canonical install and execution directory: ~/.agents/skills/ai-vision/. Run commands from this directory:
cd ~/.agents/skills/ai-vision
One-off (safe in scripts/loops from any working directory):
(cd ~/.agents/skills/ai-vision && npx tsx scripts/ai_vision.ts --help)
Model Configuration
Default Doubao configuration via environment variables:
- ARK_BASE_URL (e.g. https://ark.cn-beijing.volces.com/api/v3)
- ARK_API_KEY
- ARK_MODEL_NAME
For non-Doubao providers, pass explicit flags:
- --base-url
- --api-key
- --model
Default model if none provided: doubao-seed-1-6-vision-250815.
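One way to picture how these settings might combine is sketched below. The precedence shown (explicit flag first, then environment variable, then the default model) is an assumption and is not confirmed by this document. The names `CliFlags`, `ModelConfig`, and `resolveModelConfig` are hypothetical and do not describe the actual internals of ai_vision.ts.

```typescript
// Hypothetical sketch of configuration resolution; the precedence is assumed.
interface CliFlags {
  baseUrl?: string; // value of --base-url, if passed
  apiKey?: string;  // value of --api-key, if passed
  model?: string;   // value of --model, if passed
}

interface ModelConfig {
  baseUrl?: string;
  apiKey?: string;
  model: string;
}

// Default model named in this document.
const DEFAULT_MODEL = "doubao-seed-1-6-vision-250815";

// Returns the first value that is defined and not an empty string.
function firstNonEmpty(...values: Array<string | undefined>): string | undefined {
  return values.find((v) => v !== undefined && v !== "");
}

function resolveModelConfig(
  flags: CliFlags,
  env: Record<string, string | undefined>,
): ModelConfig {
  return {
    baseUrl: firstNonEmpty(flags.baseUrl, env.ARK_BASE_URL),
    apiKey: firstNonEmpty(flags.apiKey, env.ARK_API_KEY),
    model: firstNonEmpty(flags.model, env.ARK_MODEL_NAME) ?? DEFAULT_MODEL,
  };
}
```

In a Node process the second argument would typically be `process.env`. Taking it as a parameter keeps the function easy to test.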
Script
Path: scripts/ai_vision.ts
Run with:
npx tsx scripts/ai_vision.ts --help
Log level (for troubleshooting raw model response):
npx tsx scripts/ai_vision.ts --log-level debug <command> [flags]
Output formatting:
- When --log-json is set, logs are emitted as JSON.
- Otherwise, the final result is pretty-printed JSON, and logs are colorized when a TTY is available.
AIQuery
npx tsx scripts/ai_vision.ts query \
--screenshot ~/.eval/screenshots/ui_YYYYMMDD_HHMMSS.png \
--prompt "Identify the 'Search' button on the screen and return its coordinates"
AIAssert
npx tsx scripts/ai_vision.ts assert \
--screenshot ~/.eval/screenshots/ui_YYYYMMDD_HHMMSS.png \
--prompt "The current page contains a search box"
plan-next (single-step planning)
npx tsx scripts/ai_vision.ts plan-next \
--screenshot ~/.eval/screenshots/ui_YYYYMMDD_HHMMSS.png \
--prompt "Tap the magnifier icon to open the search page"
Output Notes
- plan-next returns a normalized next action with absolute pixel coordinates.
- If the model outputs relative coordinates (1000×1000), the script scales to screen pixels.
- Combine with adb/hdc actions (e.g., adb shell input tap X Y) for device control.
- Use --log-level debug to print the raw model response for troubleshooting.
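The scaling from a 1000×1000 relative grid to screen pixels, and the hand-off to an adb tap, can be illustrated roughly as follows. This is a sketch under stated assumptions: rounding to the nearest pixel and the names `Point`, `scaleToPixels`, and `adbTapCommand` are illustrative, and the script's own implementation may differ.

```typescript
// Hypothetical illustration of relative-to-absolute coordinate scaling.
interface Point {
  x: number;
  y: number;
}

// Maps a point on a gridSize×gridSize relative grid to screen pixels.
// Rounding to the nearest integer is an assumption of this sketch.
function scaleToPixels(
  rel: Point,
  screenWidth: number,
  screenHeight: number,
  gridSize: number = 1000,
): Point {
  return {
    x: Math.round((rel.x / gridSize) * screenWidth),
    y: Math.round((rel.y / gridSize) * screenHeight),
  };
}

// Formats the adb command string that an external agent could execute.
function adbTapCommand(p: Point): string {
  return `adb shell input tap ${p.x} ${p.y}`;
}

// Example: on a 1080×2400 screen, relative (500, 250) maps to (540, 600), so
// adbTapCommand(scaleToPixels({ x: 500, y: 250 }, 1080, 2400))
// returns "adb shell input tap 540 600".
```

Since plan-next already returns absolute pixel coordinates, an agent would normally only need the second step. The scaling function is shown to make the conversion concrete.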
Default Models (Doubao)
- doubao-seed-1-8-251228
- doubao-seed-1-6-vision-250815
References
references/doubao-api.md