ffmpeg-analyse-video

📁 fabriqaai/ffmpeg-analyse-video-skill 📅 13 days ago

Site-wide rank: #32480

Install command

npx skills add https://github.com/fabriqaai/ffmpeg-analyse-video-skill --skill ffmpeg-analyse-video

Agent install distribution

github-copilot 9

kimi-cli 8

gemini-cli 8

amp 8

codex 8

opencode 8

Skill documentation

FFmpeg Video Analysis

Extract frames from video files with ffmpeg. Delegate frame reading to sub-agents to preserve the main context window. Synthesise a structured timestamped summary from text-only sub-agent reports.

Architecture: Context-Efficient Sub-Agent Pipeline

Problem: Reading dozens of images into the main conversation context consumes most of the context window, leaving little room for synthesis and follow-up.

Solution: A 3-phase pipeline:

Main Agent                          Sub-Agents (disposable context)
──────────                          ──────────────────────────────
1. ffprobe metadata        ───►
2. ffmpeg frame extraction ───►
3. Split frames into batches ──►   4. Read images (vision)
                                      Write text descriptions
                                      to batch_N_analysis.md
5. Read text files only    ◄───    (context discarded)
6. Synthesise final output

Images only ever exist inside sub-agent contexts. The main agent only reads lightweight text files. This cuts context usage by ~90%.

1. Prerequisites

which ffmpeg && which ffprobe

If either is missing, show platform-specific install instructions and STOP:

macOS: brew install ffmpeg
Ubuntu/Debian: sudo apt install ffmpeg
Windows: choco install ffmpeg or winget install ffmpeg
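The check above can also be wrapped in a small helper that reports exactly which tool is missing. This is a minimal sketch in POSIX sh; the function name missing_tools is hypothetical and not part of the skill.

```shell
# Print the name of every listed tool that is not on PATH.
# Returns 0 when all tools are present, 1 otherwise.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            printf '%s\n' "$mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}

# Example (commented out so that sourcing this file has no side effects):
# missing_tools ffmpeg ffprobe || { echo "Install ffmpeg first" >&2; exit 1; }
```

command -v is used here because it is specified by POSIX, whereas the behaviour of which varies between systems.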

2. Setup Temp Directory

# macOS/Linux
TMPDIR="/tmp/video-analysis-$(date +%s)"
mkdir -p "$TMPDIR"

# Windows (PowerShell)
# $TMPDIR = "$env:TEMP\video-analysis-$(Get-Date -UFormat %s)"
# New-Item -ItemType Directory -Path $TMPDIR

3. Extract Video Metadata

ffprobe -v quiet -print_format json -show_format -show_streams "VIDEO_PATH"

Extract and report: duration, resolution (width x height), fps, codec, file size, whether audio is present.

If no video stream is found, report “audio-only file” and STOP. If file size > 2GB, warn the user and suggest analysing a time range with -ss START -to END.
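One detail worth noting when reporting fps: ffprobe prints stream frame rates (the r_frame_rate and avg_frame_rate fields) as a rational such as 30000/1001, not as a decimal. The sketch below converts such a value; the helper name fps_from_ratio is an assumption made for illustration.

```shell
# Convert a rational frame rate such as "30000/1001" to a decimal with two
# places. Prints "unknown" when the denominator is zero (ffprobe can report
# "0/0" for streams without a meaningful rate).
fps_from_ratio() {
    printf '%s\n' "$1" | awk -F/ '{
        den = (NF >= 2) ? $2 : 1
        if (den + 0 == 0) { print "unknown"; exit }
        printf "%.2f\n", $1 / den
    }'
}
```

For example, fps_from_ratio 30000/1001 prints 29.97.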

4. Extract Frames

Choose strategy based on duration:

Duration	Strategy	Command
0-60s	1 frame every 2s	`ffmpeg -hide_banner -y -i INPUT -vf "fps=1/2,scale='min(1280,iw)':-2" -q:v 5 DIR/frame_%04d.jpg`
1-10min	Scene detection (threshold 0.3)	`ffmpeg -hide_banner -y -i INPUT -vf "select='gt(scene,0.3)',scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/scene_%04d.jpg`
10-30min	Keyframe extraction	`ffmpeg -hide_banner -y -skip_frame nokey -i INPUT -vf "scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/key_%04d.jpg`
30min+	Thumbnail filter	`ffmpeg -hide_banner -y -i INPUT -vf "thumbnail=SEGMENT_FRAMES,scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/thumb_%04d.jpg`

For thumbnail filter, calculate SEGMENT_FRAMES = total_frames / 60 to cap output at ~60 frames.
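The strategy table and the SEGMENT_FRAMES calculation can be sketched as two small helpers. The function names are hypothetical. Because the table's ranges share their boundary values (60s, 10min, 30min), this sketch assumes each upper bound is inclusive, and it estimates total_frames as duration × fps.

```shell
# Map a duration in whole seconds to a strategy name from the table above.
# Assumption: upper bounds are inclusive (60s -> interval, 600s -> scene).
choose_strategy() {
    cs_duration=$1
    if [ "$cs_duration" -le 60 ]; then
        echo interval
    elif [ "$cs_duration" -le 600 ]; then
        echo scene
    elif [ "$cs_duration" -le 1800 ]; then
        echo keyframe
    else
        echo thumbnail
    fi
}

# SEGMENT_FRAMES for the thumbnail filter: estimated total frames
# (duration * fps) divided by the target of 60 frames, never below 1.
segment_frames() {
    awk -v d="$1" -v fps="$2" 'BEGIN {
        n = int(d * fps / 60)
        if (n < 1) n = 1
        print n
    }'
}
```

For a one-hour video at 30 fps, segment_frames 3600 30 gives 1800, so the thumbnail filter picks one representative frame from each run of 1800 frames.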

Fallbacks:

Scene detection yields 0 frames → retry with interval at 1 frame/5s
More than 100 frames extracted → subsample evenly to 80
Frame extraction fails → try the next simpler strategy (scene → interval, keyframe → interval)
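The "subsample evenly" fallback can be done by choosing evenly spaced positions in the sorted frame list. The following is one possible sketch, not the skill's own method; subsample_indices is a hypothetical name, and positions are 1-based.

```shell
# Print `target` evenly spaced 1-based positions out of `total` frames.
# When total <= target, every position is printed.
subsample_indices() {
    si_total=$1
    si_target=$2
    if [ "$si_total" -le "$si_target" ]; then
        si_target=$si_total
    fi
    si_i=0
    while [ "$si_i" -lt "$si_target" ]; do
        # Integer division spreads the picks across the whole range.
        echo $(( si_i * si_total / si_target + 1 ))
        si_i=$(( si_i + 1 ))
    done
}
```

With 120 extracted frames, subsample_indices 120 80 prints 80 distinct positions starting at 1; the frames at the remaining positions can then be deleted or ignored.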

Time range analysis: When the user specifies a range, place -ss START -to END before -i.

Higher detail mode: If requested, double the fps rate and lower the scene threshold to 0.2.

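After extraction, list all frame files and work out each frame’s timestamp. For interval extraction, the timestamp follows from the sequence number and the extraction rate: at 1 frame every 2s, frame N sits at roughly (N - 1) × 2 seconds, plus any -ss offset. Scene-detection, keyframe, and thumbnail frames are not evenly spaced, so a sequence number alone does not give a timestamp; one option is to add ffmpeg’s showinfo filter to the filter chain and read pts_time from its log output.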
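For the fixed-interval strategy, the timestamp arithmetic can be sketched as below. The helper name frame_timestamp is hypothetical, and the result is approximate because the fps filter snaps to the nearest source frame. awk is used so that zero-padded sequence numbers such as 0008 are read as decimal.

```shell
# Approximate MM:SS timestamp for frame number $1 (1-based) when one frame
# is kept every $2 seconds, with an optional -ss start offset of $3 seconds.
frame_timestamp() {
    awk -v n="$1" -v step="$2" -v start="${3:-0}" 'BEGIN {
        secs = int(start + (n - 1) * step)
        printf "%02d:%02d\n", int(secs / 60), secs % 60
    }'
}
```

For example, frame_0008.jpg extracted at 1 frame every 2s maps to frame_timestamp 0008 2, which prints 00:14.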

5. Delegate Frame Analysis to Sub-Agents

This is the critical context-saving step. Do NOT read frame images in the main conversation. Instead, split frames into batches and delegate each batch to a sub-agent.

5a. Prepare Batch Manifest

Split the extracted frame file list into batches of 8-10 frames each. For each batch, record:

Batch number (1, 2, 3, …)
Frame file paths (absolute)
Frame timestamps (calculated from sequence number)
Output file path: TMPDIR/batch_N_analysis.md
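The split into batches can be sketched as a helper that prints the frame range for each batch. batch_ranges is a hypothetical name; the default batch size of 10 is taken from the upper end of the 8-10 range above.

```shell
# Print one line per batch: "batch_number first_frame last_frame" (1-based).
# $1 = total number of frames, $2 = batch size (defaults to 10).
batch_ranges() {
    br_total=$1
    br_size=${2:-10}
    br_batch=1
    br_first=1
    while [ "$br_first" -le "$br_total" ]; do
        br_last=$(( br_first + br_size - 1 ))
        if [ "$br_last" -gt "$br_total" ]; then
            br_last=$br_total
        fi
        echo "$br_batch $br_first $br_last"
        br_batch=$(( br_batch + 1 ))
        br_first=$(( br_last + 1 ))
    done
}
```

For 25 frames, batch_ranges 25 10 prints three lines ("1 1 10", "2 11 20", "3 21 25"), and the line count gives total_batches for the prompt template.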

5b. Spawn Sub-Agents

For each batch, spawn a sub-agent with the prompt below. Launch all batches in parallel where the tool supports it; they are fully independent.

Sub-Agent Prompt Template

Use this prompt verbatim, substituting the placeholders:

You are analysing frames extracted from a video file.

VIDEO: {filename}
DURATION: {duration}
BATCH: {batch_number} of {total_batches}

Read each frame image listed below using the Read tool (or equivalent file reading tool that supports images). For each frame, write a structured description.

FRAMES:
{for each frame in batch}
- {absolute_path_to_frame} (timestamp: {MM:SS})
{end for}

For each frame, describe:
1. SCENE: What is visible (layout, UI elements, environment)
2. CONTENT: Text, code, labels, menus, or dialogue visible on screen
3. ACTION: What is happening or has changed since the likely previous frame
4. DETAILS: Any notable specifics (error messages, URLs, file names, button states)

After describing all frames, add a BATCH SUMMARY section with:
- Content type (one of: Screencast, Presentation, Tutorial, Footage, Animation)
- Key events in this batch's time range
- Any text/prompts/commands the user typed (quote exactly)

Write the complete analysis to: {TMPDIR}/batch_{N}_analysis.md

Format the output file as:

# Batch {N} Analysis ({start_timestamp} - {end_timestamp})

## Frame-by-Frame

### Frame {sequence} ({timestamp})
- **Scene**: ...
- **Content**: ...
- **Action**: ...
- **Details**: ...

(repeat for each frame)

## Batch Summary
- **Content Type**: ...
- **Key Events**: ...
- **Quoted Text/Prompts**: ...

How to Spawn

Use whatever sub-agent, background task, or independent agent mechanism your tool provides. Each sub-agent only needs to:

Read image files (the frame JPEGs)
Write a text file (the batch analysis markdown)

Launch all batches in parallel if your tool supports it; they are fully independent, with no shared state.

If your tool has no sub-agent mechanism, fall back to reading frames directly in the main context but limit to 20 frames maximum and warn the user about context usage.

5c. Collect Results

After all sub-agents complete, read the text analysis files. These are lightweight markdown, so no images enter the main context.

ls "$TMPDIR"/batch_*_analysis.md

Read each batch_N_analysis.md file in order. These contain only text descriptions, so the context cost is minimal compared to reading the original images.
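Before reading the files, it can help to confirm that every sub-agent actually wrote its output, since a failed or timed-out batch needs the fallback described under Error Handling. A minimal sketch, with the hypothetical helper name missing_batches:

```shell
# Print the number of every batch (1..$2) whose analysis file is missing
# or empty in directory $1. Prints nothing when all batches are present.
missing_batches() {
    mb_dir=$1
    mb_total=$2
    mb_n=1
    while [ "$mb_n" -le "$mb_total" ]; do
        if [ ! -s "$mb_dir/batch_${mb_n}_analysis.md" ]; then
            echo "$mb_n"
        fi
        mb_n=$(( mb_n + 1 ))
    done
}
```

Calling missing_batches "$TMPDIR" 6 after a six-batch run lists any batch numbers that still need attention.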

6. Synthesise Output

Using only the text from the batch analysis files, perform synthesis in the main context:

Merge all frame descriptions into a single chronological timeline
Group frames into natural segments (same scene, slide, or screen)
Detect the dominant content type across all batches
Identify 3-7 key moments
Extract all quoted text, prompts, or commands the user typed
Write a 2-5 sentence narrative summary
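The "dominant content type" step can be approximated mechanically by counting the Content Type lines that the sub-agents wrote. This sketch assumes each batch file follows the template exactly (a line starting with "- **Content Type**: "); the function name dominant_content_type is hypothetical.

```shell
# Print the most frequent Content Type value across all batch analysis
# files in directory $1. Ties are broken by sort order, so the choice
# between equally frequent types is arbitrary.
dominant_content_type() {
    cat "$1"/batch_*_analysis.md 2>/dev/null |
        sed -n 's/^- \*\*Content Type\*\*: *//p' |
        sort | uniq -c | sort -rn | head -n 1 |
        awk '{ $1 = ""; sub(/^ +/, ""); print }'
}
```

If sub-agents deviate from the template, the function prints nothing, and the main agent should fall back to judging the content type from the descriptions themselves.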

Format the output as:

# Video Analysis: [filename]

## Metadata
| Property | Value |
|----------|-------|
| Duration | M:SS |
| Resolution | WxH |
| FPS | N |
| Content Type | [detected] |
| Frames Analysed | N |

## Timeline
### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.

### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.

## Key Moments
1. **[M:SS] Title**: Description
2. **[M:SS] Title**: Description
3. **[M:SS] Title**: Description

## Summary
[2-5 sentence narrative paragraph summarising the entire video]

7. Cleanup

Remove the temp directory after output is complete:

# macOS/Linux
rm -rf "$TMPDIR"

# Windows (PowerShell)
# Remove-Item -Recurse -Force $TMPDIR

Skip cleanup if the user asks to keep frames.
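Because rm -rf on a mistaken path is costly, the cleanup step can be guarded. The sketch below is one possible safeguard, not part of the skill: cleanup_frames and its "keep" argument are hypothetical, and the path check assumes the directory was created under /tmp/video-analysis-* as in step 2.

```shell
# Remove the working directory $1 unless $2 is "yes" (user wants to keep
# frames). Refuses any path that does not match the step 2 naming scheme.
cleanup_frames() {
    cf_dir=$1
    cf_keep=${2:-no}
    if [ "$cf_keep" = "yes" ]; then
        return 0
    fi
    case "$cf_dir" in
        *..*)
            echo "refusing to remove: $cf_dir" >&2
            return 1 ;;
        /tmp/video-analysis-*)
            rm -rf -- "$cf_dir" ;;
        *)
            echo "refusing to remove: $cf_dir" >&2
            return 1 ;;
    esac
}
```

Usage would be cleanup_frames "$TMPDIR" for the normal case and cleanup_frames "$TMPDIR" yes when the user asked to keep the frames.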

Advanced Options

Time range: “Analyse 2:00 to 5:00 of video.mp4” → use -ss 120 -to 300
Higher detail: “Analyse in high detail” → double frame rate, lower scene threshold to 0.2
Focus area: “Focus on the code shown” → prioritise text/code extraction in sub-agent prompts
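The time-range conversion in the first option (2:00 becomes 120, 5:00 becomes 300) can be sketched as follows. to_seconds is a hypothetical helper name; it accepts SS, M:SS, or H:MM:SS.

```shell
# Convert "SS", "M:SS", or "H:MM:SS" to a number of seconds.
# awk reads each field as decimal, so "08:09" is handled correctly.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{
        secs = 0
        for (i = 1; i <= NF; i++) secs = secs * 60 + $i
        print secs
    }'
}
```

With this, the example request maps to -ss "$(to_seconds 2:00)" -to "$(to_seconds 5:00)".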

Sprite sheet: For a visual overview, generate a contact sheet:

ffmpeg -hide_banner -y -i INPUT -vf "select='not(mod(n,EVERY_N))',scale='min(320,iw)':-2,tile=5xROWS" -frames:v 1 DIR/sprite.jpg
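The command leaves EVERY_N and ROWS to be filled in. One way to derive them, assuming you start from the video's total frame count and a desired number of thumbnails, is sketched below. sprite_params is a hypothetical helper, and the 5-column layout is taken from the command above.

```shell
# Print "EVERY_N ROWS" for a 5-column contact sheet.
# $1 = total frames in the video, $2 = desired number of thumbnails.
sprite_params() {
    sp_total=$1
    sp_tiles=$2
    sp_cols=5
    # Round the row count up so every requested thumbnail has a cell.
    sp_rows=$(( (sp_tiles + sp_cols - 1) / sp_cols ))
    if [ "$sp_rows" -lt 1 ]; then
        sp_rows=1
    fi
    # Sample often enough to fill every cell in the grid.
    sp_every=$(( sp_total / (sp_cols * sp_rows) ))
    if [ "$sp_every" -lt 1 ]; then
        sp_every=1
    fi
    echo "$sp_every $sp_rows"
}
```

For a 9000-frame video and 20 thumbnails, sprite_params 9000 20 prints "450 4", which corresponds to mod(n,450) and tile=5x4.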

Error Handling

ffmpeg not found → install instructions per platform, STOP
No video stream → report audio-only, STOP
Scene detection yields 0 frames → fallback to interval
Too many frames (>100) → subsample to 80
Large files (>2GB) → warn, suggest time range
Sub-agent fails or times out → read that batch’s frames directly as fallback, warn about context usage
Frame read failure in sub-agent → skip frame, note gap in batch analysis file
