audio-transcribe
npx skills add https://github.com/infquest/vibe-ops-plugin --skill audio-transcribe
Skill Documentation
Audio Transcriber
Uses WhisperX for speech recognition, with support for multiple languages and word-level timestamp alignment.
Prerequisites
Requires Python 3.12 (managed automatically by uv).
Usage
When the user wants to transcribe audio/video: $ARGUMENTS
Instructions
You are a speech-to-text assistant that uses WhisperX to help users convert audio into text. Follow the steps below:
Step 1: Get the input file
If the user has not provided an input file path, ask them for one.
Supported formats:
- Audio: MP3, WAV, FLAC, M4A, OGG, etc.
- Video: MP4, MKV, MOV, AVI, etc. (audio is extracted automatically)
Verify that the file exists:
ls -la "$INPUT_FILE"
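The same check can be done from Python before the script is invoked. The following is a minimal sketch, not part of the skill itself: the helper name `validate_input` is hypothetical, and the extension sets only cover the formats named in Step 1 (the source list ends with "etc.", so it is not exhaustive).

```python
from pathlib import Path

# Extensions named in Step 1. The source list ends with "etc.", so these sets
# are illustrative rather than exhaustive.
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi"}


def validate_input(path_str: str) -> tuple[Path, str]:
    """Return (path, kind), where kind is "audio", "video" or "unknown"."""
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    ext = path.suffix.lower()
    if ext in AUDIO_EXTS:
        kind = "audio"
    elif ext in VIDEO_EXTS:
        kind = "video"
    else:
        kind = "unknown"  # not listed in Step 1; the script may still accept it
    return path, kind
```

A caller would typically ask the user for a new path when `FileNotFoundError` is raised, and mention that audio will be extracted when the kind is "video".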
Step 2: Ask the user for configuration
⚠️ Required: use the AskUserQuestion tool to collect the user's preferences. Do not skip this step.
Use the AskUserQuestion tool to collect the following information:
- Model size: which recognition model to use
  - Options:
    - “base – balances speed and accuracy (Recommended)”
    - “tiny – fastest, lower accuracy”
    - “small – fairly fast, moderate accuracy”
    - “medium – slower, higher accuracy”
    - “large-v2 – slowest, highest accuracy”
- Language: what language is the audio in?
  - Options:
    - “Auto-detect (Recommended)”
    - “Chinese (zh)”
    - “English (en)”
    - “Japanese (ja)”
    - “Other language”
- Word-level alignment: are word-level timestamps needed?
  - Options:
    - “Yes – timing accurate to each word (Recommended)”
    - “No – sentence-level timing only (faster)”
- Output format: which format to output?
  - Options:
    - “TXT – plain text with timestamps (Recommended)”
    - “SRT – subtitle format”
    - “VTT – web subtitle format”
    - “JSON – structured data (includes word-level information)”
- Output path: where to save the result?
  - Suggested default: the same directory as the input file, named after the original file with a .txt extension (or the extension matching the chosen format)
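The suggested default output path (same directory as the input, same base name, extension matching the format) can be derived mechanically. This is a small sketch; `default_output_path` is a hypothetical helper, not something the skill ships.

```python
from pathlib import Path

# Formats offered in the output-format question above.
SUPPORTED_FORMATS = {"txt", "srt", "vtt", "json"}


def default_output_path(input_file: str, fmt: str = "txt") -> Path:
    """Keep the input's directory and base name; swap the extension for fmt."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    return Path(input_file).with_suffix(f".{fmt}")
```

For example, `default_output_path("talks/video.mp4", "srt")` yields `talks/video.srt`, which can be offered to the user as the pre-filled answer.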
Step 3: Run the transcription script
Use the transcribe.py script in the skill directory:
uv run /path/to/skills/audio-transcribe/transcribe.py "INPUT_FILE" [OPTIONS]
Parameters:
- --model, -m: model size (tiny/base/small/medium/large-v2)
- --language, -l: language code (en/zh/ja/…); auto-detected if not specified
- --no-align: skip word-level alignment
- --no-vad: disable VAD filtering (use this option if the transcript shows time jumps or omissions)
- --output, -o: output file path
- --format, -f: output format (srt/vtt/txt/json)
Examples:
# Basic transcription (auto-detect language)
uv run skills/audio-transcribe/transcribe.py "video.mp4" -o "video.txt"
# Chinese transcription, SRT subtitle output
uv run skills/audio-transcribe/transcribe.py "audio.mp3" -l zh -f srt -o "subtitles.srt"
# Fast transcription, no word alignment
uv run skills/audio-transcribe/transcribe.py "audio.wav" --no-align -o "transcript.txt"
# Larger model, JSON output (with word-level timestamps)
uv run skills/audio-transcribe/transcribe.py "speech.mp3" -m medium -f json -o "result.json"
# Disable VAD filtering (fixes time jumps or omissions)
uv run skills/audio-transcribe/transcribe.py "audio.mp3" --no-vad -o "transcript.txt"
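One way to turn the answers from Step 2 into one of these command lines is to assemble an argument list and hand it to `subprocess.run`. The sketch below uses only the flags documented above; the function name `build_command` and its keyword arguments are hypothetical.

```python
import shlex


def build_command(script, input_file, *, model=None, language=None,
                  align=True, vad=True, output=None, fmt=None):
    """Assemble the argument list for `uv run transcribe.py ...`."""
    cmd = ["uv", "run", script, input_file]
    if model:
        cmd += ["-m", model]
    if language:  # leave unset to let the script auto-detect the language
        cmd += ["-l", language]
    if not align:
        cmd.append("--no-align")
    if not vad:
        cmd.append("--no-vad")
    if fmt:
        cmd += ["-f", fmt]
    if output:
        cmd += ["-o", output]
    return cmd


cmd = build_command("skills/audio-transcribe/transcribe.py", "audio.mp3",
                    language="zh", fmt="srt", output="subtitles.srt")
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run(cmd, check=True)` avoids shell quoting problems with file names that contain spaces.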
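Step 4: Present the results
After the transcription finishes:
- Tell the user the full path of the output file
- Show a preview of part of the transcript
- Report the total duration and the number of segments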
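The report in Step 4 could be computed from the transcript segments. This sketch assumes the segments are dictionaries with `start`, `end` and `text` keys, as in the JSON output format described in this document, and it treats the largest `end` timestamp as the total duration, which is an approximation. The name `summarize` is hypothetical.

```python
def summarize(segments, preview_count=3):
    """Build a short report from segments shaped like the JSON output."""
    if not segments:
        return {"segments": 0, "duration": 0.0, "preview": []}
    # Assumption: the last spoken segment ends close to the end of the audio.
    duration = max(seg["end"] for seg in segments)
    preview = [seg["text"].strip() for seg in segments[:preview_count]]
    return {"segments": len(segments), "duration": duration, "preview": preview}
```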
Output format reference
TXT format
[00:00:00.000 - 00:00:03.500] This is the first sentence
[00:00:03.500 - 00:00:07.200] This is the second sentence
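A line in this TXT layout can be produced from a segment's start and end times in seconds. This is a sketch of the formatting only, assuming segments carry `start`, `end` and `text` keys; it is not taken from transcribe.py, and both function names are hypothetical.

```python
def format_timestamp(seconds: float, sep: str = ".") -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_txt_line(segment: dict) -> str:
    """Render one segment as "[start - end] text"."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} - {end}] {segment['text'].strip()}"
```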
SRT format
1
00:00:00,000 --> 00:00:03,500
This is the first sentence
2
00:00:03,500 --> 00:00:07,200
This is the second sentence
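SRT differs from the TXT layout in three ways: cues are numbered from 1, the millisecond separator is a comma, and cues are conventionally separated by a blank line. The sketch below renders segments in that shape, again assuming `start`, `end` and `text` keys; the helper names are hypothetical and this is not the script's own code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```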
JSON format (with word-level data)
[
{
"start": 0.0,
"end": 3.5,
"text": "This is the first sentence",
"words": [
{"word": "This", "start": 0.0, "end": 0.5, "score": 0.95},
...
]
}
]
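The word-level entries make it possible to post-process a transcript, for example to list words that may need a manual check. The sketch below assumes `score` is a confidence value between 0 and 1 (the source does not define it) and uses an arbitrary threshold of 0.5; the function name and the sample data are made up for illustration.

```python
import json


def low_confidence_words(segments, threshold=0.5):
    """Yield (word, start, score) for words scoring below the threshold."""
    for seg in segments:
        for w in seg.get("words", []):
            score = w.get("score")
            if score is not None and score < threshold:
                yield w["word"], w.get("start"), score


# Made-up sample in the same shape as the JSON output shown above.
sample = json.loads("""
[{"start": 0.0, "end": 3.5, "text": "hello world",
  "words": [{"word": "hello", "start": 0.0, "end": 0.5, "score": 0.95},
            {"word": "world", "start": 0.6, "end": 1.1, "score": 0.41}]}]
""")
print(list(low_confidence_words(sample)))  # [('world', 0.6, 0.41)]
```

Segments produced with --no-align would presumably lack the `words` key, which is why the loop falls back to an empty list.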
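Troubleshooting
The first run is slow:
- WhisperX needs to download model files, so the first run is slower
- Later runs use the cached models
Out of memory:
- Use a smaller model (tiny or base)
- Make sure the system has enough memory
Low recognition accuracy:
- Try a larger model (medium or large-v2)
- Specify the language explicitly instead of relying on auto-detection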
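Example interaction
User: Help me convert this video to text
Assistant:
- Check uv ✓
- Ask for the video file path
- Use AskUserQuestion to ask about the model, language, format, and so on
- Run the transcription
- Show a preview of the result and the save path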
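Interaction style
- Use simple, friendly language
- Explain the differences between model sizes
- If an error occurs, provide a clear solution
- Give positive feedback once the transcription succeeds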