podcastcut-transcribe
npx skills add https://github.com/luoyuweidu1/podcastcut-skills --skill podcastcut-transcribe
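Slip-of-the-tongue detection
Building on the rough cut, detect verbal slips, repetitions, and deletions from the original review draft that were never applied, then generate a unified review draft.
Quick start
User: 识别口误 /path/to/v2.mp3 ("detect slips")
User: 精剪准备 ("prepare the fine cut")
Prerequisites
The rough cut must be finished first:
- /podcastcut-content → generates podcast_审查稿.md
- /podcastcut-edit-raw → outputs the v2 audio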
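Workflow
1. Load the original content review draft and extract the deletion marks that were not applied
↓
2. Transcribe with FunASR in 30s segments (character-level timestamps)
↓
3. Detect silence (FFmpeg silencedetect, ≥2s)
↓
4. Detect filler words (嗯/啊/诶, with a pause before and after)
↓
5. Detect reduplicated characters (the same character repeated back to back, e.g. "会会")
↓
6. Detect phrase repetition (N-gram, e.g. "和大家都很关心的很和大家都很关心的")
↓
7. Generate the unified review draft (merging all sources)
↓
[Wait for user confirmation] → once the user confirms, run /podcastcut-edit-fine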
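Step 3 of the workflow can be sketched roughly as follows. This is a minimal illustration, not the skill's actual script: the function names and the `-30dB` noise threshold are assumptions, and only the 2s minimum duration comes from the workflow. The parser relies on the `silence_start` / `silence_end | silence_duration` lines that FFmpeg's silencedetect filter writes to stderr.

```python
import re
import subprocess

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(
    r"silence_end:\s*(-?\d+(?:\.\d+)?)\s*\|\s*silence_duration:\s*(\d+(?:\.\d+)?)"
)


def parse_silencedetect(log_text):
    """Turn silencedetect log lines into a list of {start, end, duration} dicts."""
    silences = []
    start = None
    for line in log_text.splitlines():
        match = SILENCE_START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            silences.append({
                "start": start,
                "end": float(match.group(1)),
                "duration": float(match.group(2)),
            })
            start = None
    return silences


def detect_silences(audio_path, min_duration=2.0, noise="-30dB"):
    """Run FFmpeg on audio_path and return the silences it reports.

    The noise threshold is an assumed default; tune it per recording.
    """
    cmd = [
        "ffmpeg", "-hide_banner", "-i", audio_path,
        "-af", f"silencedetect=noise={noise}:d={min_duration}",
        "-f", "null", "-",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_silencedetect(result.stderr)
```

Keeping the parser separate from the FFmpeg call makes it possible to check the parsing logic against a saved log without having FFmpeg installed.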
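1. Scanning the original review draft
Key point: the rough cut can only delete whole sentences, so deletion marks that cover part of a sentence are skipped. They must be handled during the fine cut.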
import re

def extract_unprocessed_deletions(original_review_path, chars):
    """Extract unapplied partial-sentence deletions from the original review draft."""
    deletions = []
    with open(original_review_path, 'r', encoding='utf-8') as f:
        content = f.read()
    # Parse ~~strikethrough~~ deletion marks
    pattern = r'~~([^~]+)~~'
    for match in re.finditer(pattern, content):
        deleted_text = match.group(1)
        # Check whether this is a partial sentence (not a full sentence)
        # by locating it with the character-level timestamps.
        # find_text_in_chars is a helper that is not shown in this document.
        time_range = find_text_in_chars(deleted_text, chars)
        if time_range:
            deletions.append({
                'start': time_range[0],
                'end': time_range[1],
                'text': deleted_text,
                'type': 'original_review'
            })
    return deletions
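The function above calls `find_text_in_chars`, which the document does not show. A minimal sketch of what such a helper might look like follows. It assumes each entry of `chars` holds exactly one character under a `char` key (a hypothetical key name) alongside the `start` and `end` fields used above, and it ignores punctuation so that a mark like `嗯，我可以` can still match a transcript without the comma.

```python
def find_text_in_chars(text, chars, char_key="char"):
    """Return (start, end) of the first match of text in chars, or None.

    Hypothetical helper: char_key and the one-character-per-entry layout
    are assumptions, not confirmed by the skill's documentation.
    """
    # Keep only letters, digits, and CJK characters on both sides.
    target = "".join(c for c in text if c.isalnum())
    if not target:
        return None
    kept = [
        (idx, item[char_key])
        for idx, item in enumerate(chars)
        if item[char_key].isalnum()
    ]
    haystack = "".join(ch for _, ch in kept)
    pos = haystack.find(target)
    if pos == -1:
        return None
    first_idx = kept[pos][0]
    last_idx = kept[pos + len(target) - 1][0]
    return chars[first_idx]["start"], chars[last_idx]["end"]


if __name__ == "__main__":
    sample = [
        {"char": c, "start": i * 0.5, "end": i * 0.5 + 0.5}
        for i, c in enumerate("嗯我可以讲")
    ]
    print(find_text_in_chars("我可以", sample))
```

Only the first match is returned, so a phrase that occurs several times in the transcript would need extra context to be located reliably.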
2. Detecting phrase repetition (new)
Problem: reduplication detection only catches “会会”; it cannot catch “和大家都很关心的很和大家都很关心的”.
Solution: N-gram sliding-window detection.
def detect_phrase_repetitions(text, chars, min_len=4, max_len=12):
    """Detect phrase-level repetition; chars[k] must hold the timestamps of text[k]."""
    repetitions = []
    for phrase_len in range(min_len, max_len + 1):
        i = 0
        while i < len(text) - phrase_len:
            phrase = text[i:i + phrase_len]
            # Skip windows that contain only punctuation or whitespace
            if not any(c.isalnum() or '\u4e00' <= c <= '\u9fff' for c in phrase):
                i += 1
                continue
            # Look for a repeat in the following text (a gap of a few characters is allowed)
            search_start = i + phrase_len
            search_end = min(i + phrase_len * 2 + 5, len(text))
            rest = text[search_start:search_end]
            repeat_pos = rest.find(phrase)
            if repeat_pos != -1:
                # Repeat found: the span to delete runs from the first occurrence
                # up to the character just before the second occurrence
                first_start_idx = i
                first_end_idx = i + phrase_len - 1
                repetitions.append({
                    'phrase': phrase,
                    'first_start': chars[first_start_idx]['start'],
                    'first_end': chars[first_end_idx + repeat_pos]['end'],
                    'type': 'phrase_repeat',
                    'action': 'delete_first'  # keep the second occurrence
                })
                # Skip past the part that has been handled
                i = search_start + repeat_pos + phrase_len
            else:
                i += 1
    # merge_overlapping is a helper that is not shown in this document
    return merge_overlapping(repetitions)
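Because the outer loop tries every phrase length from `min_len` to `max_len`, the same repetition is typically reported several times with overlapping time spans. The `merge_overlapping` helper is not shown in the document; the following is one possible sketch, assuming records shaped like the ones appended above and treating spans that touch as overlapping.

```python
def merge_overlapping(repetitions):
    """Merge records whose [first_start, first_end] spans overlap or touch.

    Hypothetical helper: keeps the longest phrase as the label of each
    merged span. The input list is not modified.
    """
    merged = []
    ordered = sorted(repetitions, key=lambda r: (r["first_start"], r["first_end"]))
    for rep in ordered:
        if merged and rep["first_start"] <= merged[-1]["first_end"]:
            last = merged[-1]
            last["first_end"] = max(last["first_end"], rep["first_end"])
            if len(rep["phrase"]) > len(last["phrase"]):
                last["phrase"] = rep["phrase"]
        else:
            merged.append(dict(rep))  # copy so the caller's record stays intact
    return merged


if __name__ == "__main__":
    found = [
        {"phrase": "和大家都", "first_start": 35.31, "first_end": 37.77},
        {"phrase": "和大家都很关心的", "first_start": 35.31, "first_end": 37.77},
        {"phrase": "然后然后", "first_start": 50.0, "first_end": 51.0},
    ]
    for item in merge_overlapping(found):
        print(item)
```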
Repetition handling rules
| Scenario | Handling | Example |
|---|---|---|
| Phrase repetition | Keep the second | “和大家都很关心的很和大家都很关心的经常…” |
| Reduplication slip | Delete the first | “~~会~~会” |
| Valid reduplication | Do not delete | “哥哥”, “慢慢”, “天天” |
3. Unified review draft format
Output a single review draft that merges all sources:
# Fine-cut review draft
**Input**: v2.mp3 (after rough cut)
**Duration**: 1:28:00
---
## Deletion list
### Unapplied from the original review draft (N items)
Partial-sentence deletions that the rough cut skipped:
- [ ] `(46.13-49.03)` ~~嗯我可以讲一下对~~ [original review draft]
### Phrase repetition (N items)
Keep the second occurrence, delete the first:
- [ ] `(35.31-37.77)` ~~和大家都很关心的很~~ → keep "和大家都很关心的经常..."
### Silence (N items)
Silent stretches of ≥2s:
- [ ] `(120.50-123.80)` silence 3.3s
### Filler words (N items)
Standalone filler words with a pause before and after:
- [ ] `(45.20-45.85)` ~~嗯~~ context: ...说的【嗯】然后...
### Reduplication/slips (N items)
- [ ] `(88.30-88.55)` ~~会~~会 (delete the first)
---
## Statistics
| Type | Count | Duration |
|------|------|------|
| Unapplied from original review draft | 1 | 2.9s |
| Phrase repetition | 1 | 2.5s |
| Silence | 15 | 45.0s |
| Filler words | 42 | 25.0s |
| Reduplication/slips | 23 | 8.0s |
| **Total** | 82 | 83.4s |
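The statistics table at the end of the review draft can be derived mechanically from the list of proposed deletions. The sketch below shows one way to do that; the function names and the record layout (`type`, `start`, `end`) are assumptions for illustration, modelled on the records produced by the detection functions in this document.

```python
def summarize(deletions):
    """Return {type: (count, total_seconds)} in first-seen order."""
    totals = {}
    for item in deletions:
        count, seconds = totals.get(item["type"], (0, 0.0))
        totals[item["type"]] = (count + 1, seconds + item["end"] - item["start"])
    return totals


def render_stats_table(deletions):
    """Render the per-type counts and durations as a markdown table."""
    totals = summarize(deletions)
    lines = ["| Type | Count | Duration |", "|------|------|------|"]
    for kind, (count, seconds) in totals.items():
        lines.append(f"| {kind} | {count} | {seconds:.1f}s |")
    total_count = sum(count for count, _ in totals.values())
    total_seconds = sum(seconds for _, seconds in totals.values())
    lines.append(f"| **Total** | {total_count} | {total_seconds:.1f}s |")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"type": "silence", "start": 120.5, "end": 123.8},
        {"type": "filler", "start": 45.0, "end": 45.5},
    ]
    print(render_stats_table(sample))
```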
4. Output files
<working directory>/
├── transcript_chars.json # character-level timestamps
├── silences.json # silence detection results
├── fillers.json # filler-word detection results
├── repetitions.json # reduplication detection results
├── phrase_repeats.json # phrase-repetition detection results (new)
├── 审查稿.md # unified review draft
└── deletions.json # deletion list (consumed by edit-fine)
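How the individual result files are combined into `deletions.json` is not spelled out here. As a rough sketch only, the merge could look like the following. The schema is an assumption: each file is taken to be a JSON list of records carrying either `start`/`end` or `first_start`/`first_end`, and the default type labels are hypothetical.

```python
import json
from pathlib import Path

# Assumed mapping from result file to a default record type.
SOURCES = {
    "original_deletions.json": "original_review",
    "phrase_repeats.json": "phrase_repeat",
    "silences.json": "silence",
    "fillers.json": "filler",
    "repetitions.json": "reduplication",
}


def normalize(record, default_type):
    """Map a detector record onto one common {start, end, type, text} shape."""
    return {
        "start": record.get("start", record.get("first_start")),
        "end": record.get("end", record.get("first_end")),
        "type": record.get("type", default_type),
        "text": record.get("text", record.get("phrase", "")),
    }


def build_deletions(work_dir):
    """Merge every result file found in work_dir and write deletions.json."""
    work_dir = Path(work_dir)
    merged = []
    for filename, default_type in SOURCES.items():
        path = work_dir / filename
        if not path.exists():
            continue  # a detector that was not run simply contributes nothing
        records = json.loads(path.read_text(encoding="utf-8"))
        merged.extend(normalize(r, default_type) for r in records)
    merged.sort(key=lambda r: r["start"])
    out_path = work_dir / "deletions.json"
    out_path.write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return merged
```

Sorting by start time keeps the list in playback order, which makes the result easier to review by ear.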
5. Valid reduplications (do not delete)
VALID_REDUPLICATIONS = [
    # Kinship terms
    '哥哥', '姐姐', '妹妹', '弟弟', '爸爸', '妈妈', '爷爷', '奶奶',
# 人å
'å®å®', '麦é
',
    # Doubled filler words
    '嗯嗯', '哦哦', '啊啊',
    # Doubled adverbs
    '常常', '满满', '整整', '刚刚', '慢慢', '渐渐', '稍稍', '偷偷',
    '默默', '悄悄', '静静', '轻轻', '重重', '深深', '浅浅',
    '多多', '少少', '大大', '小小', '高高', '低低',
    '好好', '坏坏', '快快', '乐乐', '开开', '心心',
    '天天', '年年', '月月', '日日', '夜夜',
    '点点', '滴滴', '看看', '层层', '步步', '节节',
]
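To show how a whitelist like this one could be used, here is a small sketch of reduplication detection. It is illustrative only: the function name, the `char` key, and the three-entry whitelist (taken from the rules table) are assumptions. The approach flags any character immediately followed by itself unless the pair is whitelisted, and proposes deleting the first of the two.

```python
# Small sample whitelist for the demo; the real list is much longer.
SAMPLE_VALID = frozenset({"哥哥", "慢慢", "天天"})


def detect_reduplications(chars, valid=SAMPLE_VALID, char_key="char"):
    """Flag back-to-back repeated characters that are not valid reduplications.

    Hypothetical sketch: chars is a list of dicts with one character under
    char_key plus 'start' and 'end' timestamps in seconds.
    """
    hits = []
    i = 0
    while i < len(chars) - 1:
        first, second = chars[i], chars[i + 1]
        same = first[char_key] == second[char_key]
        pair = first[char_key] + second[char_key]
        if same and first[char_key].isalnum() and pair not in valid:
            hits.append({
                "start": first["start"],
                "end": first["end"],
                "text": first[char_key],
                "type": "reduplication",
                "action": "delete_first",  # keep the second character
            })
            i += 2  # skip the pair that was just reported
        else:
            i += 1
    return hits


if __name__ == "__main__":
    sample = [
        {"char": c, "start": i * 0.25, "end": (i + 1) * 0.25}
        for i, c in enumerate("他会会慢慢说")
    ]
    print(detect_reduplications(sample))
```

A plain adjacency check cannot tell a slip from a legitimate doubled word that is missing from the whitelist, which is why the results go into a review draft for confirmation rather than being cut automatically.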
6. Reusable scripts
Script location: the scripts/ directory
6.1 transcribe_chars.py
Transcribes with FunASR in 30s segments and produces character-level timestamps.
python scripts/transcribe_chars.py <audio file> <output directory>
Output: transcript_chars.json
6.2 detect_phrase_repeats.py (new)
Detects phrase-level repetition (N-gram sliding window).
python scripts/detect_phrase_repeats.py <working directory>
Input: transcript_chars.json
Output: phrase_repeats.json
When to use: after transcription finishes, to catch phrase repetitions such as “和大家都很关心的很和大家都很关心的”.
6.3 extract_original_deletions.py (new)
Extracts unapplied deletion marks from the original review draft.
python scripts/extract_original_deletions.py <working directory> <path to original review draft>
Input: transcript_chars.json + the original review draft (containing ~~strikethrough~~)
Output: original_deletions.json
When to use: after the rough cut, when the original review draft still contains partial-sentence deletion marks that were not applied.
宿´æµç¨ç¤ºä¾
WORK_DIR="/path/to/project"
AUDIO="/path/to/v2.mp3"
ORIGINAL_REVIEW="/path/to/podcast_审æ¥ç¨¿.md"
# 1. 转å½ï¼å¦æè¿æ²¡æ transcript_chars.jsonï¼
python scripts/transcribe_chars.py "$AUDIO" "$WORK_DIR"
# 2. æ£æµçè¯éå¤
python scripts/detect_phrase_repeats.py "$WORK_DIR"
# 3. æåå审æ¥ç¨¿å 餿 è®°
python scripts/extract_original_deletions.py "$WORK_DIR" "$ORIGINAL_REVIEW"
# 4. çæç»ä¸å®¡æ¥ç¨¿ï¼ç± AI å并忣æµç»æï¼
7. Methodology
See tips/口误识别方法论.md for details:
- FunASR 30s segmentation avoids timestamp drift
- Token-by-token analysis of slip boundaries
- Precise handling of “delete the earlier part, keep the later part”
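The first point can be illustrated with a short sketch. When each 30s segment is transcribed on its own, timestamps are local to that segment, so any timing error stays within the segment instead of accumulating over a long recording. The local timestamps then only need to be shifted by the segment's start time. The function below is a hypothetical illustration of that shift; it does not call FunASR, and the record layout is an assumption.

```python
def offset_chunk_chars(chunk_results, chunk_seconds=30.0):
    """Shift per-segment character timestamps onto the global timeline.

    chunk_results[k] is assumed to be the list of character records for
    segment k, with 'start' and 'end' measured from the start of that segment.
    """
    merged = []
    for index, chunk_chars in enumerate(chunk_results):
        offset = index * chunk_seconds
        for item in chunk_chars:
            merged.append({
                **item,
                "start": round(item["start"] + offset, 3),
                "end": round(item["end"] + offset, 3),
            })
    return merged


if __name__ == "__main__":
    segments = [
        [{"char": "嗯", "start": 1.0, "end": 1.2}],
        [{"char": "会", "start": 0.5, "end": 0.75}],
    ]
    for record in offset_chunk_chars(segments):
        print(record)
```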
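Feedback log
2026-02-01
- Added phrase-level repetition detection
  - Problem: reduplication detection only catches “会会”; it cannot catch phrase repetition
  - Example: 和大家都很关心的很和大家都很关心的经常发生的 burn out
  - Fix: N-gram sliding-window detection that catches repeated phrases of 4-12 characters
- Added scanning of the original review draft
  - Problem: the rough cut can only delete whole sentences, so partial-sentence deletion marks were skipped
  - Example: the original review draft marked ~~嗯，我可以讲一下对~~
  - Fix: scan the original review draft during transcribe and extract the deletion marks that were not applied
- Unified review draft output
  - Merges all sources: unapplied marks from the original review draft + silence + filler words + reduplication + phrase repetition
  - edit-fine only needs to read this one file
- Repetition handling rule: keep the second
  - User feedback: when something is repeated, the second occurrence is usually the correct version the speaker intended
  - Rule: for phrase repetition, delete the first occurrence and keep the second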