video-generator
npx skills add https://github.com/panaversity/agentfactory --skill video-generator
Agent 安装分布
Skill 文档
Video Generator (Remotion)
Create professional motion graphics videos programmatically with React and Remotion.
Default Workflow (ALWAYS follow this)
- Scrape brand data (if featuring a product) using Firecrawl
- Create the project in
output/<project-name>/ - Build all scenes with proper motion graphics
- Install dependencies with
npm install - Fix package.json scripts to use
npx remotion(notbun):"scripts": { "dev": "npx remotion studio", "build": "npx remotion bundle" } - Start Remotion Studio as a background process:
Wait for “Server ready” on port 3000.cd output/<project-name> && npm run dev - Expose via Cloudflare tunnel so user can access it:
bash skills/cloudflare-tunnel/scripts/tunnel.sh start 3000 - Send the user the public URL (e.g.
https://xxx.trycloudflare.com)
The user will preview in their browser, request changes, and you edit the source files. Remotion hot-reloads automatically.
Rendering (only when user explicitly asks to export):
cd output/<project-name>
npx remotion render CompositionName out/video.mp4
Quick Start
IMPORTANT: create-video@latest has an interactive CLI that blocks in non-TTY environments. Use manual scaffolding instead:
mkdir -p output/my-video/src/scenes output/my-video/public/audio output/my-video/public/images
cd output/my-video
# Create package.json manually
cat > package.json << 'EOF'
{
"name": "my-video",
"scripts": {
"dev": "npx remotion studio",
"build": "npx remotion bundle",
"render": "npx remotion render"
},
"dependencies": {
"@remotion/cli": "4.0.293",
"react": "^19",
"react-dom": "^19",
"remotion": "4.0.293",
"lucide-react": "^0.400"
},
"devDependencies": {
"@types/react": "^19",
"typescript": "^5"
}
}
EOF
# Create tsconfig.json, remotion.config.ts, src/index.ts, src/Root.tsx
# (see File Structure section below for templates)
npm install
# Start dev server
npm run dev
# Expose publicly
bash skills/cloudflare-tunnel/scripts/tunnel.sh start 3000
Fetching Brand Data with Firecrawl
MANDATORY: When a video mentions or features any product/company, use Firecrawl to scrape the product’s website for brand data, colors, screenshots, and copy BEFORE designing the video. This ensures visual accuracy and brand consistency.
API Key: Set FIRECRAWL_API_KEY in .env (see TOOLS.md).
Usage
bash scripts/firecrawl.sh "https://example.com"
Returns structured brand data: brandName, tagline, headline, description, features, logoUrl, faviconUrl, primaryColors, ctaText, socialLinks, plus screenshot URL and OG image URL.
Download Assets After Scraping
mkdir -p public/images/brand
curl -s "https://example.com/favicon.svg" -o public/images/brand/logo.svg
curl -s "${OG_IMAGE_URL}" -o public/images/brand/og-image.png
curl -sL "${SCREENSHOT_URL}" -o public/images/brand/screenshot.png
Core Architecture
Scene Management
Use scene-based architecture with proper transitions:
const SCENE_DURATIONS: Record<string, number> = {
intro: 3000, // 3s hook
problem: 4000, // 4s dramatic
solution: 3500, // 3.5s reveal
features: 5000, // 5s showcase
cta: 3000, // 3s close
};
Video Structure Pattern
import {
AbsoluteFill,
Sequence,
useCurrentFrame,
useVideoConfig,
interpolate,
spring,
Img,
staticFile,
Audio,
} from "remotion";
export const MyVideo = () => {
const frame = useCurrentFrame();
const { fps, durationInFrames } = useVideoConfig();
return (
<AbsoluteFill>
{/* Background music */}
<Audio src={staticFile("audio/bg-music.mp3")} volume={0.35} />
{/* Persistent background layer - OUTSIDE sequences */}
<AnimatedBackground frame={frame} />
{/* Scene sequences */}
<Sequence from={0} durationInFrames={90}>
<IntroScene />
</Sequence>
<Sequence from={90} durationInFrames={120}>
<FeatureScene />
</Sequence>
</AbsoluteFill>
);
};
Motion Graphics Principles
AVOID (Slideshow patterns)
- Fading to black between scenes
- Centered text on solid backgrounds
- Same transition for everything
- Linear/robotic animations
- Static screens
slideLeft,slideRight,crossDissolve,fadeBlurpresets- Emoji icons â NEVER use emoji, always use Lucide React icons
PURSUE (Motion graphics)
- Overlapping transitions (next starts BEFORE current ends)
- Layered compositions (background/midground/foreground)
- Spring physics for organic motion
- Varied timing (2-5s scenes, mixed rhythms)
- Continuous visual elements across scenes
- Custom transitions with clipPath, 3D transforms, morphs
- Lucide React for ALL icons (
npm install lucide-react) â never emoji
Transition Techniques
- Morph/Scale – Element scales up to fill screen, becomes next scene’s background
- Wipe – Colored shape sweeps across, revealing next scene
- Zoom-through – Camera pushes into element, emerges into new scene
- Clip-path reveal – Circle/polygon grows from point to reveal
- Persistent anchor – One element stays while surroundings change
- Directional flow – Scene 1 exits right, Scene 2 enters from right
- Split/unfold – Screen divides, panels slide apart
- Perspective flip – Scene rotates on Y-axis in 3D
Animation Timing Reference
// Timing values (in seconds)
const timing = {
micro: 0.1 - 0.2, // Small shifts, subtle feedback
snappy: 0.2 - 0.4, // Element entrances, position changes
standard: 0.5 - 0.8, // Scene transitions, major reveals
dramatic: 1.0 - 1.5, // Hero moments, cinematic reveals
};
// Spring configs
const springs = {
snappy: { stiffness: 400, damping: 30 },
bouncy: { stiffness: 300, damping: 15 },
smooth: { stiffness: 120, damping: 25 },
};
Visual Style Guidelines
Typography
- One display font + one body font max
- Massive headlines, tight tracking
- Mix weights for hierarchy
- Keep text SHORT (viewers can’t pause)
Colors
- Use brand colors from Firecrawl scrape as the primary palette â match the product’s actual look
- Avoid purple/indigo gradients unless the brand uses them or the user explicitly requests them
- Simple, clean backgrounds are generally best â a single dark tone or subtle gradient beats layered textures
- Intentional accent colors pulled from the brand
Layout
- Use asymmetric layouts, off-center type
- Edge-aligned elements create visual tension
- Generous whitespace as design element
- Use depth sparingly â a subtle backdrop blur or single gradient, not stacked textures
Remotion Essentials
Interpolation
const opacity = interpolate(frame, [0, 30], [0, 1], {
extrapolateLeft: "clamp",
extrapolateRight: "clamp",
});
const scale = spring({
frame,
fps,
from: 0.8,
to: 1,
durationInFrames: 30,
config: { damping: 12 },
});
Sequences with Overlap
<Sequence from={0} durationInFrames={100}>
<Scene1 />
</Sequence>
<Sequence from={80} durationInFrames={100}>
<Scene2 />
</Sequence>
Cross-Scene Continuity
Place persistent elements OUTSIDE Sequence blocks:
const PersistentShape = ({ currentScene }: { currentScene: number }) => {
const positions = {
0: { x: 100, y: 100, scale: 1, opacity: 0.3 },
1: { x: 800, y: 200, scale: 2, opacity: 0.5 },
2: { x: 400, y: 600, scale: 0.5, opacity: 1 },
};
return (
<motion.div
animate={positions[currentScene]}
transition={{ duration: 0.8, ease: "easeInOut" }}
className="absolute w-32 h-32 rounded-full bg-gradient-to-r from-coral to-orange"
/>
);
};
Quality Tests
Before delivering, verify:
- Mute test: Story follows visually without sound?
- Squint test: Hierarchy visible when squinting?
- Timing test: Motion feels natural, not robotic?
- Consistency test: Similar elements behave similarly?
- Slideshow test: Does NOT look like PowerPoint?
- Loop test: Video loops smoothly back to start?
Implementation Steps
- Firecrawl brand scrape â If featuring a product, scrape its site first
- Director’s treatment â Write vibe, camera style, emotional arc
- Visual direction â Colors, fonts, brand feel, animation style
- Scene breakdown â List every scene with description, duration, text, transitions
- Define durations â Vary pacing (2-3s punchy, 4-5s dramatic)
- Create styles.ts â Unified type scale, colors, fonts (prevents sizing inconsistency)
- Build persistent layer â Animated background outside scenes
- Build scenes â Each with enter/exit animations, 3-5 timed moments
- Start Remotion Studio â
npm run dev, expose via tunnel, send URL - Iterate with user â Edit source, hot-reload, repeat
- Generate voiceover â Gemini TTS (see AI Voiceover section)
- Add transcription â Timed captions synced to audio
- Sync audio-visual â Match scene timings to voiceover duration
- Render â Only when user says to export:
npx remotion render CompositionName out/video.mp4
File Structure
my-video/
âââ src/
â âââ Root.tsx # Composition definitions (fps, resolution, duration)
â âââ index.ts # Entry point (export Root)
â âââ styles.ts # Shared colors, fonts, type scale (CRITICAL)
â âââ MyVideo.tsx # Main composition (background + sequences + overlay)
â âââ Transcription.tsx # Timed caption overlay (if voiceover exists)
â âââ scenes/ # One file per scene
â âââ TitleScene.tsx
â âââ HookScene.tsx
â âââ FeatureScene.tsx
â âââ CTAScene.tsx
âââ public/
â âââ images/
â â âââ brand/ # Firecrawl-scraped assets
â âââ audio/
â âââ voiceover.wav # Gemini TTS output
âââ out/ # Rendered output (gitignored)
â âââ video.mp4
âââ remotion.config.ts
âââ package.json
Common Components
See references/components.md for reusable:
- Animated backgrounds
- Terminal windows
- Feature cards
- Stats displays
- CTA buttons
- Text reveal animations
AI Voiceover with Gemini TTS
Generate human-quality voiceovers using Google’s Gemini TTS models.
Available TTS Models
| Model | Quality | Speed | Use Case |
|---|---|---|---|
gemini-2.5-flash-preview-tts |
Good | Fast | Drafts, iteration |
gemini-2.5-pro-preview-tts |
Best | Slower | Final renders |
Note: Standard Gemini models (gemini-2.0-flash, etc.) do NOT support audio output. You must use the dedicated -tts models.
Voice Options
Orus, Kore, Puck, Charon, Fenrir, Leda, Aoede, Zephyr â each with distinct personality. Best for professional narration: Orus (warm, authoritative) or Kore (clear, friendly).
Generation Script
# Set API key
export GEMINI_API_KEY="your-key"
# Generate voiceover (returns raw PCM audio)
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-preview-tts:generateContent?key=$GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"contents": [{"parts": [{"text": "Your narration script here..."}]}],
"generationConfig": {
"response_modalities": ["AUDIO"],
"speech_config": {
"voiceConfig": {
"prebuiltVoiceConfig": { "voiceName": "Orus" }
}
}
}
}' | python3 -c "
import json, sys, base64
r = json.load(sys.stdin)
audio = r['candidates'][0]['content']['parts'][0]['inlineData']['data']
sys.stdout.buffer.write(base64.b64decode(audio))
" > voiceover_raw.pcm
# Convert PCM to WAV (Gemini outputs s16le, 24kHz, mono)
ffmpeg -f s16le -ar 24000 -ac 1 -i voiceover_raw.pcm public/audio/voiceover.wav
rm voiceover_raw.pcm
# Check duration
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 public/audio/voiceover.wav
Audio-Visual Sync
Add a small delay so visuals appear before the voice starts:
const AUDIO_OFFSET = 15; // 0.5s at 30fps
// In the main composition:
<Sequence from={AUDIO_OFFSET}>
<Audio src={staticFile("audio/voiceover.wav")} volume={0.9} />
</Sequence>;
Calculate total frames from audio duration: totalFrames = Math.ceil(duration * fps) + AUDIO_OFFSET
Transcription Overlay
Add timed captions synced to voiceover:
const LINES: { start: number; end: number; text: string }[] = [
{ start: 20, end: 140, text: "First caption line." },
{ start: 155, end: 230, text: "Second caption line." },
// ...
];
export const Transcription: React.FC = () => {
const frame = useCurrentFrame();
const activeLine = LINES.find((l) => frame >= l.start && frame <= l.end);
if (!activeLine) return null;
// CRITICAL: Adaptive fade prevents interpolation errors on short captions
const dur = activeLine.end - activeLine.start;
const fade = Math.min(10, Math.floor(dur / 3));
const opacity = interpolate(
frame,
[
activeLine.start,
activeLine.start + fade,
activeLine.end - fade,
activeLine.end,
],
[0, 1, 1, 0],
{ extrapolateLeft: "clamp", extrapolateRight: "clamp" },
);
return (
<div
style={{
position: "absolute",
bottom: 56,
left: 0,
right: 0,
display: "flex",
justifyContent: "center",
opacity,
pointerEvents: "none",
}}
>
<div
style={{
padding: "14px 36px",
borderRadius: 14,
background: "rgba(0,0,0,0.6)",
backdropFilter: "blur(16px)",
}}
>
<span style={{ fontSize: 26, fontWeight: 500, color: "#fff" }}>
{activeLine.text}
</span>
</div>
</div>
);
};
Interpolation safety: For short captions (< 20 frames), start + 10 > end - 10 breaks Remotion’s strictly-monotonic requirement. The adaptive Math.min(10, Math.floor(dur / 3)) formula prevents this.
Unified Type Scale
ALWAYS define a shared type scale in styles.ts â never set font sizes inline per scene. This prevents the #1 visual issue: inconsistent text sizing across scenes.
// styles.ts
export const colors = {
bg: "#0a0a0f",
textPrimary: "rgba(255,255,255,0.95)",
textSecondary: "rgba(255,255,255,0.55)",
textDim: "rgba(255,255,255,0.3)",
// Brand accent colors from Firecrawl scrape
};
export const fonts = {
sans: "Inter, system-ui, sans-serif",
mono: "'JetBrains Mono', monospace",
};
export const type = {
hero: {
fontSize: 96,
fontWeight: 700,
letterSpacing: "-0.04em",
lineHeight: 1.05,
},
h1: {
fontSize: 68,
fontWeight: 700,
letterSpacing: "-0.035em",
lineHeight: 1.1,
},
h2: {
fontSize: 48,
fontWeight: 600,
letterSpacing: "-0.025em",
lineHeight: 1.2,
},
body: {
fontSize: 28,
fontWeight: 400,
letterSpacing: "-0.01em",
lineHeight: 1.5,
},
bodyLg: {
fontSize: 34,
fontWeight: 400,
letterSpacing: "-0.015em",
lineHeight: 1.4,
},
stat: {
fontSize: 86,
fontWeight: 800,
letterSpacing: "-0.04em",
lineHeight: 1,
},
label: {
fontSize: 18,
fontWeight: 600,
letterSpacing: "0.08em",
textTransform: "uppercase",
},
mono: {
fontSize: 22,
fontWeight: 500,
fontFamily: "'JetBrains Mono', monospace",
},
};
Every scene imports { colors, fonts, type } from ../styles and uses the shared tokens.
Tunnel Management
# Start tunnel (exposes port 3000 publicly)
bash skills/cloudflare-tunnel/scripts/tunnel.sh start 3000
# Check status
bash skills/cloudflare-tunnel/scripts/tunnel.sh status 3000
# List all tunnels
bash skills/cloudflare-tunnel/scripts/tunnel.sh list
# Stop tunnel
bash skills/cloudflare-tunnel/scripts/tunnel.sh stop 3000