pull-film
npx skills add https://github.com/benzema216/dreamina-claude-skills --skill pull-film
One-click shot-by-shot video analysis
Automatically analyzes a video's shot language, composition, color, and audio, and generates a professional HTML visualization report.
Trigger conditions
Trigger when the user asks for any of the following:
- "Help me analyze / break down this video"
- "Analyze the video's shot language"
- "/pull-film <video path or URL>"
Input requirements
- Video source: a local file path (.mp4/.mkv/.avi/.mov) or an online URL (YouTube, Bilibili, etc.)
- Optional parameters:
  - --language <zh/en/ja> — audio language
  - --output <dir> — output directory
  - --no-audio — skip audio analysis
  - --max-scenes <n> — cap the number of shots analyzed
Dependencies
| Tool | Purpose | Installation |
|---|---|---|
| ffmpeg/ffprobe | Video processing, frame extraction | brew install ffmpeg |
| Python3 + PIL | Color analysis | pip3 install Pillow |
| scenedetect | Shot segmentation | pip3 install "scenedetect[opencv]" |
| yt-dlp | Online video download (optional) | pip3 install yt-dlp |
| whisper | Audio transcription (optional) | pip3 install openai-whisper |
Execution flow
Step 1: Environment check
Check whether ffmpeg, python3, and scenedetect are installed; if anything is missing, prompt the user to install it.
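The step-1 check can be sketched with `shutil.which`; the tool list and messages below are illustrative, not fixed by the skill:

```python
# A sketch of the step-1 dependency check using shutil.which.
import shutil

def missing_tools(tools=("ffmpeg", "ffprobe", "python3", "scenedetect")):
    """Return the required command-line tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

missing = missing_tools()
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All core dependencies found")
```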
Step 2: Handle the video input
Local video → use ffprobe to get basic info (duration, resolution, frame rate, codec)
Online video → download with yt-dlp, then process:
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" \
  --merge-output-format mp4 -o "<output dir>/source_video.mp4" "<URL>"
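The ffprobe call (e.g. `ffprobe -v quiet -print_format json -show_format -show_streams <video>`) emits JSON that can be reduced to the fields the report needs. A sketch; the JSON sample below is illustrative, not real ffprobe output:

```python
# A sketch: pick duration, resolution, and frame rate out of ffprobe JSON.
import json

def summarize_probe(probe_json):
    """Reduce ffprobe's JSON output to the basic info the report shows."""
    data = json.loads(probe_json)
    video = next(s for s in data["streams"] if s["codec_type"] == "video")
    num, den = map(int, video["r_frame_rate"].split("/"))
    return {
        "duration": float(data["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
        "fps": round(num / den, 2),
    }

sample = ('{"format": {"duration": "12.5"}, "streams": [{"codec_type": "video", '
          '"width": 1920, "height": 1080, "r_frame_rate": "30000/1001"}]}')
print(summarize_probe(sample))
```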
Step 3: Create the output directory structure
mkdir -p "<output dir>"/{frames,data,audio}
Step 4: Shot segmentation
Use ffmpeg scene detection or PySceneDetect:
Method A – ffmpeg scene detection:
ffmpeg -i "<è§é¢>" -filter:v "select='gt(scene,0.3)',showinfo" -f null - 2>&1 | grep showinfo
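The showinfo lines that method A prints can be reduced to cut timestamps with a small parser; a sketch (the sample log line is illustrative):

```python
# A sketch: extract scene-cut timestamps from ffmpeg showinfo log lines.
import re

def parse_showinfo(lines):
    """Return the pts_time values found in showinfo output, as floats."""
    times = []
    for line in lines:
        m = re.search(r"pts_time:(\d+(?:\.\d+)?)", line)
        if m:
            times.append(float(m.group(1)))
    return times

sample = ["[Parsed_showinfo_1 @ 0x...] n:   0 pts:  87654 pts_time:3.50 ..."]
print(parse_showinfo(sample))  # [3.5]
```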
Method B – PySceneDetect (more accurate):
python3 << 'EOF'
from scenedetect import detect, AdaptiveDetector
import json

scenes = detect("<video path>", AdaptiveDetector())
result = []
for i, (start, end) in enumerate(scenes, 1):
    result.append({
        "id": i,
        "start_time": start.get_seconds(),
        "end_time": end.get_seconds(),
        "start_frame": start.get_frames(),
        "end_frame": end.get_frames()
    })
print(json.dumps(result, indent=2))
EOF
Save the result to <output dir>/data/scenes.json
Step 5: Extract keyframes
Extract 3 frames per shot (start, middle, end):
# For each shot:
ffmpeg -ss <start_time> -i "<video>" -vframes 1 -q:v 2 "<output dir>/frames/scene_<ID>_start.jpg"
ffmpeg -ss <mid_time> -i "<video>" -vframes 1 -q:v 2 "<output dir>/frames/scene_<ID>_mid.jpg"
ffmpeg -ss <end_time-0.1> -i "<video>" -vframes 1 -q:v 2 "<output dir>/frames/scene_<ID>_end.jpg"
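The three timestamps per shot can be derived from scenes.json; a sketch (the 0.1 s end margin mirrors the ffmpeg command above):

```python
# A sketch: compute (start, midpoint, just-before-end) seek times for one shot.
def keyframe_times(start, end, margin=0.1):
    """Return the three keyframe timestamps; the last one backs off from
    the cut by `margin` seconds, clamped so it never precedes the start."""
    mid = round((start + end) / 2, 3)
    last = max(start, round(end - margin, 3))
    return start, mid, last

print(keyframe_times(0.0, 3.5))  # (0.0, 1.75, 3.4)
```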
Step 6: Shot analysis (with Claude Vision)
For each shot's keyframes, use the Read tool to load the images, then analyze:
What to analyze:
- Shot scale:
  - Close-up: a face or object detail fills the frame
  - Medium close-up: chest up
  - Medium shot: knees or waist up
  - Full shot: the complete figure or scene subject
  - Long shot: wide environment, the subject occupies little of the frame
- Camera movement:
  - Static: locked-off camera
  - Push in: the camera moves toward the subject
  - Pull out: the camera moves away from the subject
  - Pan: horizontal rotation
  - Truck/pedestal: lateral or vertical travel
  - Follow: tracks a moving subject
  - Crane: vertical rise or fall
  - Handheld: visible shake
- Camera angle:
  - Eye level: level with the subject
  - Low angle: shot from below
  - High angle: shot from above
  - Dutch angle: tilted frame
- Composition:
  - Rule of thirds, symmetry, leading lines, frame-within-frame, diagonals, centered composition
- Color mood:
  - Color temperature: cool / warm / neutral
  - Overall atmosphere
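The per-shot findings can be collected into data/analysis.json. A sketch of one record; the field names are an assumption for illustration, not a fixed schema:

```python
# A sketch of one per-shot record for data/analysis.json (field names assumed).
import json

shot = {
    "id": 1,
    "scale": "medium shot",
    "movement": "static",
    "angle": "eye level",
    "composition": "rule of thirds",
    "color": {"temperature": "warm", "dominant": ["#c08040"]},
    "mood": "nostalgic",
}
print(json.dumps(shot))
```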
Step 7: Color analysis
Extract the dominant colors with Python:
python3 << 'EOF'
import json
from PIL import Image
from collections import Counter

def analyze_colors(image_path, n_colors=5):
    img = Image.open(image_path).convert('RGB')
    img = img.resize((100, 100))  # downscale for speed
    pixels = list(img.getdata())

    # Simplify colors (quantize to 32-step buckets)
    def quantize(color):
        return tuple(c // 32 * 32 for c in color)

    quantized = [quantize(p) for p in pixels]
    counter = Counter(quantized)
    top_colors = counter.most_common(n_colors)

    # Convert to HEX
    hex_colors = ['#{:02x}{:02x}{:02x}'.format(*c[0]) for c in top_colors]

    # Estimate color temperature
    avg_r = sum(p[0] for p in pixels) / len(pixels)
    avg_b = sum(p[2] for p in pixels) / len(pixels)
    temperature = "warm" if avg_r > avg_b * 1.1 else ("cool" if avg_b > avg_r * 1.1 else "neutral")

    return {"dominant": hex_colors, "temperature": temperature}

result = analyze_colors("<image path>")
print(json.dumps(result))
EOF
Step 8: Audio analysis (optional)
If --no-audio was not given:
# Extract the audio track
ffmpeg -i "<video>" -vn -acodec pcm_s16le -ar 16000 -ac 1 "<output dir>/audio/audio.wav"
# Transcribe with Whisper (if installed):
python3 << 'EOF'
import whisper
import json

model = whisper.load_model("base")
result = model.transcribe("<output dir>/audio/audio.wav", language="<language>")
output = {
    "language": result.get("language", "unknown"),
    "segments": [{"start": s["start"], "end": s["end"], "text": s["text"]} for s in result["segments"]],
    "full_text": result["text"]
}
print(json.dumps(output, ensure_ascii=False, indent=2))
EOF
Save the transcript to <output dir>/data/transcript.json
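To show per-shot dialogue in the report, transcript segments can be matched to the shots they overlap; a sketch, assuming scenes.json and transcript.json shaped as in the steps above:

```python
# A sketch: attach transcript segments to the shots whose time ranges they overlap.
def dialogue_per_shot(shots, segments):
    """Return {shot id: [texts of segments overlapping that shot]}."""
    result = {}
    for shot in shots:
        result[shot["id"]] = [
            seg["text"] for seg in segments
            if seg["start"] < shot["end_time"] and seg["end"] > shot["start_time"]
        ]
    return result

shots = [{"id": 1, "start_time": 0.0, "end_time": 3.5},
         {"id": 2, "start_time": 3.5, "end_time": 8.0}]
segments = [{"start": 1.0, "end": 2.0, "text": "hello"},
            {"start": 4.0, "end": 5.0, "text": "world"}]
print(dialogue_per_shot(shots, segments))  # {1: ['hello'], 2: ['world']}
```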
Step 9: Generate the HTML report
Create an HTML report containing the following:
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>Video Shot-by-Shot Report - [title]</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<style>
/* Modern styles */
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: system-ui, sans-serif; background: #f5f5f5; }
.container { max-width: 1400px; margin: 0 auto; padding: 20px; }
header { background: linear-gradient(135deg, #667eea, #764ba2); color: white; padding: 40px; border-radius: 12px; text-align: center; }
.section { background: white; border-radius: 12px; padding: 25px; margin: 20px 0; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
.scene-card { border: 1px solid #eee; border-radius: 10px; margin: 15px 0; overflow: hidden; }
.scene-header { background: #f8f9fa; padding: 15px; display: flex; justify-content: space-between; }
.keyframes { display: flex; gap: 10px; }
.keyframes img { width: 200px; height: 112px; object-fit: cover; border-radius: 6px; }
.color-swatch { width: 30px; height: 30px; border-radius: 6px; display: inline-block; }
</style>
</head>
<body>
<div class="container">
<header>
<h1>🎬 [video title]</h1>
<p>Duration: [duration] | Resolution: [width x height] | Shots: [count]</p>
</header>
<div class="section">
<h2>📊 Overview</h2>
<canvas id="scaleChart"></canvas>
<canvas id="durationChart"></canvas>
</div>
<div class="section">
<h2>🎞️ Shot-by-Shot Table</h2>
<!-- One card per shot -->
<div class="scene-card">
<div class="scene-header">
<span>Shot #1</span>
<span>0.00s - 3.50s</span>
</div>
<div class="scene-content">
<div class="keyframes">
<img src="frames/scene_001_start.jpg">
<img src="frames/scene_001_mid.jpg">
<img src="frames/scene_001_end.jpg">
</div>
<div class="analysis">
<p><strong>Scale:</strong> medium shot</p>
<p><strong>Movement:</strong> static</p>
<p><strong>Angle:</strong> eye level</p>
<p><strong>Composition:</strong> rule of thirds</p>
<p><strong>Temperature:</strong> warm</p>
<p><strong>Dominant colors:</strong> <span class="color-swatch" style="background:#xxx"></span></p>
<p><strong>Dialogue:</strong> "..."</p>
</div>
</div>
</div>
</div>
<div class="section">
<h2>📝 Full Dialogue</h2>
<pre>[transcript text]</pre>
</div>
</div>
<script>
// Chart.js chart code
</script>
</body>
</html>
Save the report to <output dir>/report.html
Step 10: Summary
When finished, tell the user:
- Report path: <output dir>/report.html
- Number of shots
- Total duration
- Whether an audio transcription is included
Output structure
<output dir>/
├── report.html          # main report (open in a browser)
├── frames/              # keyframe screenshots
│   ├── scene_001_start.jpg
│   ├── scene_001_mid.jpg
│   └── ...
├── data/
│   ├── scenes.json      # shot data
│   ├── analysis.json    # analysis results
│   └── transcript.json  # dialogue transcript
└── audio/
    └── audio.wav        # extracted audio track
Examples
User input:
/pull-film ./movie.mp4
User input:
/pull-film https://www.youtube.com/watch?v=xxxxx --language zh
User input:
Help me analyze the shot language of this video: ./trailer.mp4
Notes
- For long videos (>10 minutes), use --max-scenes to cap the number of shots analyzed
- Claude Vision analysis reads the keyframe images one at a time
- On first use, Whisper downloads its model (about 150 MB)
- Online video download depends on yt-dlp; some platforms may restrict it