media-understand
13
Total installs
11
Weekly installs
#24387
Site-wide rank
Install command
npx skills add https://github.com/infquest/vibe-ops-plugin --skill media-understand
Agent install distribution
claude-code
8
opencode
8
codex
7
gemini-cli
7
trae
6
cursor
6
Skill documentation
Media Understanding
Uses Gemini 2.5 Flash to analyze and understand multimedia content.
Supported Formats
| Type | Formats | Max Size |
|---|---|---|
| Image | jpg, jpeg, png, gif, webp | 20MB |
| Video | mp4, mpeg, mov, webm, YouTube URL | 100MB |
| Audio | wav, mp3, aiff, aac, ogg, flac, m4a | 100MB |
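The limits in the table can be checked before a file is sent. The following is a minimal sketch, not the skill's own validation: the helper names `mediaTypeOf` and `checkMedia` are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the table does not specify. YouTube URLs are not handled here because they carry no file extension.

```javascript
// Limits copied from the Supported Formats table; sizes are in megabytes.
const MEDIA_LIMITS = {
  image: { extensions: ["jpg", "jpeg", "png", "gif", "webp"], maxMB: 20 },
  video: { extensions: ["mp4", "mpeg", "mov", "webm"], maxMB: 100 },
  audio: { extensions: ["wav", "mp3", "aiff", "aac", "ogg", "flac", "m4a"], maxMB: 100 },
};

// Hypothetical helper: classify a file name by its extension.
// Returns "image", "video", "audio", or null when the extension is not listed.
function mediaTypeOf(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  for (const [type, limit] of Object.entries(MEDIA_LIMITS)) {
    if (limit.extensions.includes(ext)) return type;
  }
  return null;
}

// Hypothetical helper: check a file name and byte size against the table.
// Assumption: 1 MB = 1024 * 1024 bytes; the skill may count differently.
function checkMedia(fileName, sizeBytes) {
  const type = mediaTypeOf(fileName);
  if (type === null) return { ok: false, reason: "Unsupported format" };
  const maxBytes = MEDIA_LIMITS[type].maxMB * 1024 * 1024;
  if (sizeBytes > maxBytes) return { ok: false, reason: "File too large" };
  return { ok: true, type };
}

console.log(checkMedia("photo.JPG", 5 * 1024 * 1024));
console.log(checkMedia("clip.avi", 1024));
```

The two rejection reasons reuse the wording from the Error Handling section so that a pre-flight failure reads the same as one reported by the skill.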
Prerequisites
- MAX_API_KEY environment variable (injected automatically by Max)
- Bun 1.0+ (bundled with Max v0.0.27+, no separate installation needed)
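Both prerequisites can be verified from a script before the skill is invoked. This is a rough sketch under stated assumptions: `versionAtLeast` and `checkPrerequisites` are hypothetical names, and the check only confirms that the key is set, not that it is valid. `Bun.version` is the runtime's own version string and is absent under other runtimes.

```javascript
// Hypothetical helper: compare a dotted version string with a minimum major.minor.
function versionAtLeast(version, minMajor, minMinor) {
  const [major = 0, minor = 0] = version
    .split(".")
    .map((part) => parseInt(part, 10) || 0);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// Hypothetical helper: return a list of unmet prerequisites (empty when all are met).
function checkPrerequisites(env, bunVersion) {
  const problems = [];
  if (!env.MAX_API_KEY) problems.push("MAX_API_KEY is not set");
  if (!bunVersion) {
    problems.push("Bun runtime not detected");
  } else if (!versionAtLeast(bunVersion, 1, 0)) {
    problems.push(`Bun ${bunVersion} is older than 1.0`);
  }
  return problems;
}

// Under Bun the global `Bun` exists; elsewhere it does not.
const detectedBun = typeof Bun !== "undefined" ? Bun.version : null;
console.log(checkPrerequisites(process.env, detectedBun));
```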
Usage
bun skills/media-understand/media-understand.js <media_path_or_url> [prompt] [language]
Arguments:
- media_path_or_url: File path or YouTube URL
- prompt: Question or analysis request (default: "Please describe this content")
- language: Output language, chinese or english (default: chinese)
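The positional arguments and their defaults can be sketched as follows. This is an illustration of the documented behavior, not the script's actual parser: `parseArgs` is a hypothetical name, and rejecting an unknown language is an assumption, since the documentation does not say how the script treats one.

```javascript
// Defaults taken from the Arguments list above.
const DEFAULT_PROMPT = "Please describe this content";
const LANGUAGES = ["chinese", "english"];

// Hypothetical parser for: <media_path_or_url> [prompt] [language]
function parseArgs(argv) {
  const [media, prompt = DEFAULT_PROMPT, language = "chinese"] = argv;
  if (!media) throw new Error("media_path_or_url is required");
  // Assumption: an unknown language is treated as an error.
  if (!LANGUAGES.includes(language)) {
    throw new Error(`language must be one of: ${LANGUAGES.join(", ")}`);
  }
  // Treat anything starting with http:// or https:// as a URL, otherwise a file path.
  const isUrl = /^https?:\/\//i.test(media);
  return { media, prompt, language, isUrl };
}

console.log(parseArgs(["./photo.jpg"]));
console.log(parseArgs(["https://youtu.be/xxx", "Summarize this video", "english"]));
```

In a real script the input would typically be `process.argv.slice(2)`, which drops the runtime and script paths.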
Examples
Image Analysis
# Describe image
bun skills/media-understand/media-understand.js ./photo.jpg "Please describe this image" chinese
# OCR - Extract text
bun skills/media-understand/media-understand.js ./screenshot.png "Extract all text in the image" chinese
# Answer question about image
bun skills/media-understand/media-understand.js ./chart.png "What trend does this chart show?" chinese
Video Analysis
# YouTube video summary
bun skills/media-understand/media-understand.js "https://youtube.com/watch?v=xxx" "Summarize the main content of this video" chinese
# Local video analysis
bun skills/media-understand/media-understand.js ./video.mp4 "What happens in the video?" chinese
# Timestamp-based question
bun skills/media-understand/media-understand.js "https://youtu.be/xxx" "What is discussed at 2:30 in the video?" chinese
Audio Analysis
# Transcribe audio
bun skills/media-understand/media-understand.js ./recording.mp3 "Please transcribe this audio" chinese
# Summarize podcast
bun skills/media-understand/media-understand.js ./podcast.m4a "Summarize the key points of this podcast" chinese
# Detect speakers
bun skills/media-understand/media-understand.js ./meeting.wav "Identify the different speakers and organize what each one said" chinese
Common Prompts
Image:
- Describe image: "Please describe the content of this image in detail"
- OCR: "Recognize and extract all text in the image"
- Object recognition: "What objects are in the image?"
Video:
- Summary: "Summarize the main content of this video"
- Timestamp: "What happens at X:XX in the video?"
- Extract information: "What key information is mentioned in the video?"
Audio:
- Transcription: "Please transcribe the full content of this audio"
- Summary: "Summarize the key points of this audio"
- Speaker identification: "Identify the different speakers"
Notes
- Video via Gemini: Best results with YouTube URLs. Local video files may have limited support.
- Audio tokens: ~32 tokens/second
- Video tokens: ~300 tokens/second at default resolution
- Long media files will consume more tokens
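The per-second rates above give a quick way to estimate token use before submitting a long file. The sketch below simply multiplies duration by the quoted rate; `estimateTokens` is a hypothetical helper, the rates are approximate, and whether the video figure also covers the audio track is not stated, so treat the result as a rough lower bound.

```javascript
// Approximate rates quoted in the Notes (tokens per second of media).
const TOKENS_PER_SECOND = { audio: 32, video: 300 };

// Hypothetical helper: rough token estimate for a clip of the given length.
function estimateTokens(type, durationSeconds) {
  if (!Object.hasOwn(TOKENS_PER_SECOND, type)) {
    throw new Error(`no token rate known for type: ${type}`);
  }
  return Math.round(TOKENS_PER_SECOND[type] * durationSeconds);
}

console.log(estimateTokens("audio", 600)); // 10-minute recording: 19200
console.log(estimateTokens("video", 150)); // 2.5-minute clip: 45000
```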
Error Handling
- File not found: Check that the file path is correct
- Unsupported format: Use one of the supported formats listed above
- File too large: Compress or trim the media file
- API error: Check in the Max settings that the Max API Key is configured correctly