scikit-video
npx skills add https://github.com/tondevrel/scientific-agent-skills --skill scikit-video
Skill Documentation
scikit-video – Scientific Video Processing
scikit-video simplifies the complex world of video codecs and containers by providing a consistent NumPy-based interface. It allows for the calculation of motion vectors, video quality assessment (VQA), and seamless integration with the rest of the scientific Python stack.
When to Use
- Reading and writing video files in various formats (MP4, AVI, MKV) via FFmpeg.
- Extracting specific frames or segments from long videos without loading them entirely into memory.
- Estimating motion between frames (block matching, global edge motion).
- Measuring video quality (PSNR, SSIM, MS-SSIM, NIQE).
- Generating video datasets for machine learning.
- Visualizing temporal changes in pixel data (e.g., scientific recordings).
- Handling raw YUV data streams.
Reference Documentation
Official docs: http://www.scikit-video.org/
GitHub: https://github.com/scikit-video/scikit-video
Search patterns: skvideo.io.vread, skvideo.io.FFmpegReader, skvideo.motion, skvideo.measure
Core Principles
Video as 4D Arrays
A video is represented as a NumPy array with shape (T, H, W, C):
- T: Time (number of frames)
- H: Height
- W: Width
- C: Channels (usually 3 for RGB)
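Because of the (T, H, W, C) layout, ordinary NumPy indexing selects frames, clips, and per-pixel time series. A minimal sketch on a synthetic array (the sizes and the marked pixel are arbitrary, chosen only for illustration):

```python
import numpy as np

# Synthetic stand-in for a decoded clip: 8 frames of 4x6 RGB, dtype uint8.
video = np.zeros((8, 4, 6, 3), dtype=np.uint8)
video[5, 2, 3] = (255, 128, 0)  # mark one pixel in frame 5

frame = video[5]               # single frame, shape (H, W, C)
clip = video[2:6]              # frames 2..5, shape (4, H, W, C)
pixel_series = video[:, 2, 3]  # one pixel over time, shape (T, C)
red_channel = video[..., 0]    # red plane of every frame, shape (T, H, W)
```

Slices such as `clip` are views into the same memory, so they cost nothing extra until they are copied or modified.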
FFmpeg Backend
Scikit-video does not contain its own codecs; it is a bridge to FFmpeg. You must have FFmpeg installed on your system for skvideo.io to function.
Generators for Large Data
For long videos, scikit-video provides generator-based readers (vreader) to process frames one by one, preventing RAM exhaustion.
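As a rough illustration of the generator pattern, the sketch below averages frames one at a time, so only a single accumulator is held in memory. The `synthetic_frames` generator is a hypothetical stand-in for a real frame source; a generator returned by `skvideo.io.vreader` could be passed in the same way.

```python
import numpy as np

def running_mean_frame(frames):
    """Average an iterable of equally shaped frames in constant memory."""
    total = None
    count = 0
    for frame in frames:
        if total is None:
            total = np.zeros(frame.shape, dtype=np.float64)
        total += frame
        count += 1
    if count == 0:
        raise ValueError("no frames were provided")
    return total / count

def synthetic_frames(n, shape=(2, 2, 3)):
    """Hypothetical stand-in for a frame generator: frame i is filled with i."""
    for i in range(n):
        yield np.full(shape, i, dtype=np.uint8)

mean_frame = running_mean_frame(synthetic_frames(5))
```

With a real file, the call would look like `running_mean_frame(skvideo.io.vreader("experiment.mp4"))`.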
Quick Reference
Installation
pip install scikit-video
# Note: Ensure ffmpeg is in your system PATH
Standard Imports
import skvideo.io
import skvideo.motion
import skvideo.measure
import numpy as np
Basic Pattern – Read and Inspect
import skvideo.io
# 1. Read the whole video into a NumPy array
# Shape: (frames, height, width, 3)
video_data = skvideo.io.vread("experiment.mp4")
# 2. Get basic info (the array shape carries no frame-rate information)
n_frames, height, width, channels = video_data.shape
print(f"Frames: {n_frames}, Resolution: {width}x{height}")
# 3. Access a specific frame
frame_10 = video_data[10]
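The array returned by vread carries no timing information, so the frame rate has to come from container metadata. `skvideo.io.ffprobe` returns that metadata as a nested dictionary, and FFprobe usually reports rates as fraction strings such as "30000/1001". The sketch below converts such a string to a float. `parse_frame_rate` is a hypothetical helper, and the "@avg_frame_rate" key in the usage comment is an assumption that should be checked against the actual ffprobe output for your file.

```python
from fractions import Fraction

def parse_frame_rate(rate_text):
    """Convert an FFprobe-style rate string such as '30000/1001' to float FPS.

    Returns None when the rate is missing or reported as '0/0'.
    """
    if not rate_text:
        return None
    numerator, _, denominator = rate_text.partition("/")
    denominator = denominator or "1"
    if int(denominator) == 0:
        return None
    return float(Fraction(int(numerator), int(denominator)))

# Usage with scikit-video (assumes the "@avg_frame_rate" key is present):
# metadata = skvideo.io.ffprobe("experiment.mp4")
# fps = parse_frame_rate(metadata["video"].get("@avg_frame_rate"))

print(parse_frame_rate("25/1"))
```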
Critical Rules
✅ DO
- Use vreader for large files – Always use generator-based reading for high-resolution or long videos.
- Set num_frames – If you know the number of frames you need, specify it to avoid unnecessary scanning.
- Check FFmpeg path – Use skvideo.setFFmpegPath() if FFmpeg is installed in a non-standard location.
- Normalize for Metrics – Ensure pixel values are in the range expected by skvideo.measure (usually [0, 255] for uint8).
- Use vwrite for simple output – It handles the complex FFmpeg command-line arguments for you.
- Consider YUV – When working with raw transmission data, use the specific YUV reading capabilities.
❌ DON’T
- Load 4K video with vread – One minute of 3840x2160 RGB video at 30 fps decodes to roughly 45 GB of uint8 data, which exceeds the RAM of most machines.
- Ignore the inputdict and outputdict – These allow you to pass specific flags to FFmpeg (like bitrate, pixel format, or codec).
- Assume RGB order – Always verify the channel order after reading, especially if using external codecs.
- Process video without denoising – Video noise can ruin motion estimation; apply spatial or temporal filters first.
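To judge whether a clip is safe to load with vread, it helps to estimate the decoded size from the array shape before reading. A minimal sketch, assuming uint8 RGB output and a frame rate of 30 fps (`decoded_size_bytes` is a hypothetical helper, not part of scikit-video):

```python
def decoded_size_bytes(n_frames, height, width, channels=3, bytes_per_sample=1):
    """Size of a decoded (T, H, W, C) array; uint8 uses 1 byte per sample."""
    return n_frames * height * width * channels * bytes_per_sample

# One minute of 3840x2160 RGB at an assumed 30 frames per second.
size = decoded_size_bytes(n_frames=60 * 30, height=2160, width=3840)
print(f"{size / 1e9:.1f} GB")  # prints "44.8 GB"
```

The compressed file on disk is far smaller than this figure, which is why file size alone is a poor guide to memory use.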
Anti-Patterns (NEVER)
import skvideo.io
# ❌ BAD: Loading a massive file at once
# video = skvideo.io.vread("huge_4k_recording.mp4") # CRASH!
# ✅ GOOD: Processing frame by frame
reader = skvideo.io.vreader("huge_4k_recording.mp4")
for frame in reader:
    # Process frame
    pass
# ❌ BAD: Manual frame writing in a loop with manual codec setup
# (Fragile and complex)
# ✅ GOOD: Use FFmpegWriter
writer = skvideo.io.FFmpegWriter("output.mp4")
for frame in processed_frames:
    writer.writeFrame(frame)
writer.close()
# ❌ BAD: Relying on system default FFmpeg without checking
# ✅ GOOD: Verify backend
# print(skvideo._HAS_FFMPEG)
Reading and Writing (skvideo.io)
Advanced Video I/O
import skvideo.io
# 1. Reading with specific FFmpeg options
input_parameters = {
    "-ss": "00:00:10",  # Start at 10 seconds
    "-t": "5"           # Duration 5 seconds
}
video = skvideo.io.vread("video.mp4", inputdict=input_parameters)
# 2. Writing with specific bitrate and codec
output_parameters = {
    "-vcodec": "libx264",
    "-b:v": "5000k",  # 5 Mbps bitrate
    "-pix_fmt": "yuv420p"
}
skvideo.io.vwrite("output.mp4", video, outputdict=output_parameters)
Motion Estimation (skvideo.motion)
Calculating Movement
from skvideo.motion import blockMotion
from skvideo.io import vread
# Load only the first two consecutive frames
frame_pair = vread("video.mp4", num_frames=2)  # shape: (2, H, W, C)
# Block matching algorithm (diamond search)
# blockMotion takes a stack of frames, not two separate frame arguments
motion_vectors = blockMotion(frame_pair, method='DS', mbSize=16)
# motion_vectors shape: (T - 1, H/mbSize, W/mbSize, 2), here T - 1 = 1
# The last dimension contains (dy, dx) offsets
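Once the vectors are available, a common next step is to reduce them to per-block magnitudes and an overall drift. The sketch below does this with plain NumPy on a synthetic field. It assumes only that the last axis holds (dy, dx) as stated above, and `summarize_motion` is a hypothetical helper rather than a scikit-video function.

```python
import numpy as np

def summarize_motion(motion_vectors):
    """Return per-block magnitudes and the mean (dy, dx) over all blocks.

    Expects an array whose last axis holds (dy, dx), e.g. (T-1, Hb, Wb, 2).
    """
    vectors = np.asarray(motion_vectors, dtype=np.float64)
    magnitude = np.hypot(vectors[..., 0], vectors[..., 1])
    mean_offset = vectors.reshape(-1, 2).mean(axis=0)
    return magnitude, mean_offset

# Synthetic field: one frame pair, 2x3 blocks, every block moved by (3, -4).
field = np.tile(np.array([3, -4]), (1, 2, 3, 1))
magnitude, mean_offset = summarize_motion(field)
```

A large mean offset with similar magnitudes across blocks suggests camera motion, while isolated large magnitudes point to moving objects.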
Video Quality Assessment (skvideo.measure)
Measuring Degradation
import numpy as np
from skvideo.io import vread
from skvideo.measure import psnr, ssim, mse
# Compare original and compressed video
# These metrics expect a single luminance channel, so read as greyscale
original = vread("original.mp4", as_grey=True)
distorted = vread("compressed.mp4", as_grey=True)
# Calculate metrics frame by frame
psnr_scores = psnr(original, distorted)
ssim_scores = ssim(original, distorted)
print(f"Average PSNR: {np.mean(psnr_scores)}")
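For intuition about what a per-frame PSNR score represents, the sketch below computes the textbook formula, 10 * log10(MAX^2 / MSE), with NumPy. It illustrates the metric only and is not scikit-video's implementation; `psnr_per_frame` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def psnr_per_frame(reference, distorted, max_value=255.0):
    """PSNR in dB for each frame of two equally shaped (T, H, W, C) arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    dist = np.asarray(distorted, dtype=np.float64)
    if ref.shape != dist.shape:
        raise ValueError("videos must have identical shapes")
    # Mean squared error over every axis except time.
    mse = ((ref - dist) ** 2).reshape(ref.shape[0], -1).mean(axis=1)
    with np.errstate(divide="ignore"):
        return 10.0 * np.log10(max_value ** 2 / mse)

ref = np.full((2, 4, 4, 1), 100, dtype=np.uint8)
dist = ref.copy()
dist[1] += 5  # second frame differs by 5 grey levels everywhere
scores = psnr_per_frame(ref, dist)
```

Identical frames give an infinite score, so averages over a whole video should handle that case explicitly.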
Datasets and Utilities
Using Internal Datasets
import skvideo.datasets
import skvideo.io
# Load a built-in sample video (useful for testing)
path = skvideo.datasets.bigbuckbunny()
reader = skvideo.io.vreader(path)
Practical Workflows
1. Simple Background Subtraction Pipeline
def extract_background(video_path):
    """Calculates the static background of a video using the median."""
    reader = skvideo.io.vreader(video_path)
    frames = []
    # Sample every 10th frame to save memory
    for i, frame in enumerate(reader):
        if i % 10 == 0:
            frames.append(frame)
        if len(frames) > 50:
            break
    # Background is the median of frames
    background = np.median(np.array(frames), axis=0).astype(np.uint8)
    return background
# Usage
# bg = extract_background("security_cam.mp4")
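The function above only estimates the background. The subtraction step itself can be sketched as a thresholded difference between a frame and that background. `foreground_mask` and its threshold of 30 grey levels are illustrative assumptions, not values taken from scikit-video.

```python
import numpy as np

def foreground_mask(frame, background, threshold=30):
    """Boolean (H, W) mask of pixels that differ from the background.

    `threshold` is an illustrative value in grey levels, not a tuned default.
    """
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff.max(axis=-1) > threshold

background = np.full((4, 4, 3), 50, dtype=np.uint8)
frame = background.copy()
frame[1, 2] = (200, 200, 200)  # a single "moving object" pixel
mask = foreground_mask(frame, background)
```

Taking the maximum over channels flags a pixel as foreground if any one channel changes strongly, which is a simple but noise-sensitive choice.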
2. Video Stabilization (Frame Alignment)
def stabilize_frames(video_array):
    """Very basic stabilization using motion vectors."""
    stabilized = [video_array[0]]
    for i in range(1, len(video_array)):
        # blockMotion expects a stack of frames, so pass the pair as one slice
        motion = skvideo.motion.blockMotion(video_array[i-1:i+1])
        avg_motion = np.mean(motion, axis=(0, 1, 2))  # Global drift as (dy, dx)
        # Translate frame back (Simplified logic)
        # Use scipy.ndimage.shift for actual translation
        ...
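The translation step can be sketched with NumPy slicing for whole-pixel shifts. This is a simplified stand-in for the scipy.ndimage.shift call mentioned above, which also handles sub-pixel offsets. `translate_frame` is a hypothetical helper. To undo drift you would shift by the negative of the estimated offset, and the sign convention of the motion vectors is an assumption worth verifying on a test clip.

```python
import numpy as np

def translate_frame(frame, dy, dx):
    """Shift a frame by whole pixels, filling uncovered areas with zeros.

    Positive dy moves content down; positive dx moves it right.
    """
    dy, dx = int(round(dy)), int(round(dx))
    height, width = frame.shape[:2]
    shifted = np.zeros_like(frame)
    if abs(dy) >= height or abs(dx) >= width:
        return shifted  # content moved entirely out of view
    # Source and destination windows that overlap after the shift.
    src_rows = slice(max(0, -dy), height - max(0, dy))
    dst_rows = slice(max(0, dy), height - max(0, -dy))
    src_cols = slice(max(0, -dx), width - max(0, dx))
    dst_cols = slice(max(0, dx), width - max(0, -dx))
    shifted[dst_rows, dst_cols] = frame[src_rows, src_cols]
    return shifted

frame = np.zeros((5, 5, 3), dtype=np.uint8)
frame[1, 1] = 255
moved = translate_frame(frame, dy=2, dx=-1)
```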
3. Automated Video Quality Report
def generate_vqa_report(ref_path, test_path):
    # These metrics expect single-channel (luminance) input
    ref = skvideo.io.vread(ref_path, as_grey=True)
    test = skvideo.io.vread(test_path, as_grey=True)
    report = {
        "MSE": np.mean(skvideo.measure.mse(ref, test)),
        "PSNR": np.mean(skvideo.measure.psnr(ref, test)),
        "SSIM": np.mean(skvideo.measure.ssim(ref, test))
    }
    return report
Performance Optimization
Using vreader with Multi-threading
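If you are doing heavy processing on each frame, use a queue-based multi-threading approach: one thread reads frames from the FFmpeg pipe into a bounded queue while another processes them, so decoding and processing overlap.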
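One way to sketch the queue-based approach uses only the standard library: a background thread pulls frames from any iterable into a bounded queue while the main thread processes them. `process_with_prefetch` and its `max_queue` parameter are hypothetical names, and plain integers stand in for frames so the example runs without a video file.

```python
import queue
import threading

_SENTINEL = object()

def process_with_prefetch(frames, process, max_queue=8):
    """Run `process` on each frame while a background thread reads ahead.

    `frames` is any iterable of frames; `process` is a caller-supplied function.
    Results are returned in input order.
    """
    buffer = queue.Queue(maxsize=max_queue)  # bounded, so memory stays capped

    def producer():
        try:
            for frame in frames:
                buffer.put(frame)
        finally:
            buffer.put(_SENTINEL)  # always signal the end of the stream

    reader_thread = threading.Thread(target=producer, daemon=True)
    reader_thread.start()

    results = []
    while True:
        item = buffer.get()
        if item is _SENTINEL:
            break
        results.append(process(item))
    reader_thread.join()
    return results

doubled = process_with_prefetch(iter(range(5)), lambda value: value * 2)
```

Threads help most when the per-frame work releases the GIL, as many NumPy operations and I/O calls do. For pure-Python processing, a process pool may be the better fit.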
Efficient Slicing
Instead of calling vread and then slicing, pass FFmpeg's seek (-ss) and duration (-t) flags via inputdict so that only the segment you need is decoded.
Common Pitfalls and Solutions
FFmpeg Not Found
Scikit-video relies on the ffmpeg executable.
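# ✅ Solution: Manually set the path if it's not in the environment
import skvideo
skvideo.setFFmpegPath("C:/ffmpeg/bin")  # directory containing the ffmpeg executable
import skvideo.io  # import submodules after setting the path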
Color Space Mismatches
Some videos are stored in YUV422 or YUV444. Scikit-video converts these to RGB by default.
# ❌ Problem: Colors look washed out or incorrect
# ✅ Solution: Specify the pixel format in inputdict
reader = skvideo.io.vreader("video.mp4", inputdict={"-pix_fmt": "yuv420p"})
Out of Memory (OOM) Errors
Even with vreader, if you store all frames in a list, you will run out of memory.
# ❌ Problem: frames.append(frame) in a loop
# ✅ Solution: Process and save to disk, or clear the list periodically.
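One way to bound memory is to consume the frame generator in fixed-size batches and discard each batch after it has been processed. A minimal sketch with a hypothetical `batched_frames` helper, where integers stand in for frames:

```python
from itertools import islice

def batched_frames(frames, batch_size):
    """Yield lists of at most `batch_size` frames from any frame iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch  # the caller should drop each batch before taking the next

batch_sizes = [len(batch) for batch in batched_frames(range(10), batch_size=4)]
```

Peak memory then scales with `batch_size` rather than with the length of the video, provided results are written out or reduced before the next batch is requested.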
scikit-video brings the power of FFmpeg into the NumPy world. By abstracting the complexities of video containers and providing scientific analysis tools like motion estimation and quality metrics, it is an essential tool for any researcher working with temporal image data.