video-generator

📁 panaversity/agentfactory 📅 1 day ago

总安装量

周安装量

#50215

全站排名

安装命令

npx skills add https://github.com/panaversity/agentfactory --skill video-generator

Agent 安装分布

opencode 3

gemini-cli 3

codebuddy 3

github-copilot 3

codex 3

kimi-cli 3

Skill 文档

Video Generator (Remotion)

Create professional motion graphics videos programmatically with React and Remotion.

Default Workflow (ALWAYS follow this)

Scrape brand data (if featuring a product) using Firecrawl
Create the project in output/<project-name>/
Build all scenes with proper motion graphics
Install dependencies with npm install

Fix package.json scripts to use npx remotion (not bun):

"scripts": {
  "dev": "npx remotion studio",
  "build": "npx remotion bundle"
}

Start Remotion Studio as a background process:
```
cd output/<project-name> && npm run dev
```
Wait for “Server ready” on port 3000.

Expose via Cloudflare tunnel so user can access it:

bash skills/cloudflare-tunnel/scripts/tunnel.sh start 3000

Send the user the public URL (e.g. https://xxx.trycloudflare.com)

The user will preview in their browser, request changes, and you edit the source files. Remotion hot-reloads automatically.

Rendering (only when user explicitly asks to export):

cd output/<project-name>
npx remotion render CompositionName out/video.mp4

Quick Start

IMPORTANT: create-video@latest has an interactive CLI that blocks in non-TTY environments. Use manual scaffolding instead:

mkdir -p output/my-video/src/scenes output/my-video/public/audio output/my-video/public/images
cd output/my-video

# Create package.json manually
cat > package.json << 'EOF'
{
  "name": "my-video",
  "scripts": {
    "dev": "npx remotion studio",
    "build": "npx remotion bundle",
    "render": "npx remotion render"
  },
  "dependencies": {
    "@remotion/cli": "4.0.293",
    "react": "^19",
    "react-dom": "^19",
    "remotion": "4.0.293",
    "lucide-react": "^0.400"
  },
  "devDependencies": {
    "@types/react": "^19",
    "typescript": "^5"
  }
}
EOF

# Create tsconfig.json, remotion.config.ts, src/index.ts, src/Root.tsx
# (see File Structure section below for templates)

npm install

# Start dev server
npm run dev

# Expose publicly
bash skills/cloudflare-tunnel/scripts/tunnel.sh start 3000

Fetching Brand Data with Firecrawl

MANDATORY: When a video mentions or features any product/company, use Firecrawl to scrape the product’s website for brand data, colors, screenshots, and copy BEFORE designing the video. This ensures visual accuracy and brand consistency.

API Key: Set FIRECRAWL_API_KEY in .env (see TOOLS.md).

Usage

bash scripts/firecrawl.sh "https://example.com"

Returns structured brand data: brandName, tagline, headline, description, features, logoUrl, faviconUrl, primaryColors, ctaText, socialLinks, plus screenshot URL and OG image URL.

Download Assets After Scraping

mkdir -p public/images/brand
curl -s "https://example.com/favicon.svg" -o public/images/brand/logo.svg
curl -s "${OG_IMAGE_URL}" -o public/images/brand/og-image.png
curl -sL "${SCREENSHOT_URL}" -o public/images/brand/screenshot.png

Core Architecture

Scene Management

Use scene-based architecture with proper transitions:

const SCENE_DURATIONS: Record<string, number> = {
  intro: 3000, // 3s hook
  problem: 4000, // 4s dramatic
  solution: 3500, // 3.5s reveal
  features: 5000, // 5s showcase
  cta: 3000, // 3s close
};

Video Structure Pattern

import {
  AbsoluteFill,
  Sequence,
  useCurrentFrame,
  useVideoConfig,
  interpolate,
  spring,
  Img,
  staticFile,
  Audio,
} from "remotion";

export const MyVideo = () => {
  const frame = useCurrentFrame();
  const { fps, durationInFrames } = useVideoConfig();

  return (
    <AbsoluteFill>
      {/* Background music */}
      <Audio src={staticFile("audio/bg-music.mp3")} volume={0.35} />

      {/* Persistent background layer - OUTSIDE sequences */}
      <AnimatedBackground frame={frame} />

      {/* Scene sequences */}
      <Sequence from={0} durationInFrames={90}>
        <IntroScene />
      </Sequence>
      <Sequence from={90} durationInFrames={120}>
        <FeatureScene />
      </Sequence>
    </AbsoluteFill>
  );
};

Motion Graphics Principles

AVOID (Slideshow patterns)

Fading to black between scenes
Centered text on solid backgrounds
Same transition for everything
Linear/robotic animations
Static screens
slideLeft, slideRight, crossDissolve, fadeBlur presets
Emoji icons â NEVER use emoji, always use Lucide React icons

PURSUE (Motion graphics)

Overlapping transitions (next starts BEFORE current ends)
Layered compositions (background/midground/foreground)
Spring physics for organic motion
Varied timing (2-5s scenes, mixed rhythms)
Continuous visual elements across scenes
Custom transitions with clipPath, 3D transforms, morphs
Lucide React for ALL icons (npm install lucide-react) â never emoji

Transition Techniques

Morph/Scale – Element scales up to fill screen, becomes next scene’s background
Wipe – Colored shape sweeps across, revealing next scene
Zoom-through – Camera pushes into element, emerges into new scene
Clip-path reveal – Circle/polygon grows from point to reveal
Persistent anchor – One element stays while surroundings change
Directional flow – Scene 1 exits right, Scene 2 enters from right
Split/unfold – Screen divides, panels slide apart
Perspective flip – Scene rotates on Y-axis in 3D

Animation Timing Reference

// Timing values (in seconds)
const timing = {
  micro: 0.1 - 0.2, // Small shifts, subtle feedback
  snappy: 0.2 - 0.4, // Element entrances, position changes
  standard: 0.5 - 0.8, // Scene transitions, major reveals
  dramatic: 1.0 - 1.5, // Hero moments, cinematic reveals
};

// Spring configs
const springs = {
  snappy: { stiffness: 400, damping: 30 },
  bouncy: { stiffness: 300, damping: 15 },
  smooth: { stiffness: 120, damping: 25 },
};

Visual Style Guidelines

Typography

One display font + one body font max
Massive headlines, tight tracking
Mix weights for hierarchy
Keep text SHORT (viewers can’t pause)

Colors

Use brand colors from Firecrawl scrape as the primary palette â match the product’s actual look
Avoid purple/indigo gradients unless the brand uses them or the user explicitly requests them
Simple, clean backgrounds are generally best â a single dark tone or subtle gradient beats layered textures
Intentional accent colors pulled from the brand

Layout

Use asymmetric layouts, off-center type
Edge-aligned elements create visual tension
Generous whitespace as design element
Use depth sparingly â a subtle backdrop blur or single gradient, not stacked textures

Remotion Essentials

Interpolation

const opacity = interpolate(frame, [0, 30], [0, 1], {
  extrapolateLeft: "clamp",
  extrapolateRight: "clamp",
});

const scale = spring({
  frame,
  fps,
  from: 0.8,
  to: 1,
  durationInFrames: 30,
  config: { damping: 12 },
});

Sequences with Overlap

<Sequence from={0} durationInFrames={100}>
  <Scene1 />
</Sequence>
<Sequence from={80} durationInFrames={100}>
  <Scene2 />
</Sequence>

Cross-Scene Continuity

Place persistent elements OUTSIDE Sequence blocks:

const PersistentShape = ({ currentScene }: { currentScene: number }) => {
  const positions = {
    0: { x: 100, y: 100, scale: 1, opacity: 0.3 },
    1: { x: 800, y: 200, scale: 2, opacity: 0.5 },
    2: { x: 400, y: 600, scale: 0.5, opacity: 1 },
  };

  return (
    <motion.div
      animate={positions[currentScene]}
      transition={{ duration: 0.8, ease: "easeInOut" }}
      className="absolute w-32 h-32 rounded-full bg-gradient-to-r from-coral to-orange"
    />
  );
};

Quality Tests

Before delivering, verify:

Mute test: Story follows visually without sound?
Squint test: Hierarchy visible when squinting?
Timing test: Motion feels natural, not robotic?
Consistency test: Similar elements behave similarly?
Slideshow test: Does NOT look like PowerPoint?
Loop test: Video loops smoothly back to start?

Implementation Steps

Firecrawl brand scrape â If featuring a product, scrape its site first
Director’s treatment â Write vibe, camera style, emotional arc
Visual direction â Colors, fonts, brand feel, animation style
Scene breakdown â List every scene with description, duration, text, transitions
Define durations â Vary pacing (2-3s punchy, 4-5s dramatic)
Create styles.ts â Unified type scale, colors, fonts (prevents sizing inconsistency)
Build persistent layer â Animated background outside scenes
Build scenes â Each with enter/exit animations, 3-5 timed moments
Start Remotion Studio â npm run dev, expose via tunnel, send URL
Iterate with user â Edit source, hot-reload, repeat
Generate voiceover â Gemini TTS (see AI Voiceover section)
Add transcription â Timed captions synced to audio
Sync audio-visual â Match scene timings to voiceover duration
Render â Only when user says to export: npx remotion render CompositionName out/video.mp4

File Structure

my-video/
âââ src/
â   âââ Root.tsx              # Composition definitions (fps, resolution, duration)
â   âââ index.ts              # Entry point (export Root)
â   âââ styles.ts             # Shared colors, fonts, type scale (CRITICAL)
â   âââ MyVideo.tsx           # Main composition (background + sequences + overlay)
â   âââ Transcription.tsx     # Timed caption overlay (if voiceover exists)
â   âââ scenes/               # One file per scene
â       âââ TitleScene.tsx
â       âââ HookScene.tsx
â       âââ FeatureScene.tsx
â       âââ CTAScene.tsx
âââ public/
â   âââ images/
â   â   âââ brand/            # Firecrawl-scraped assets
â   âââ audio/
â       âââ voiceover.wav     # Gemini TTS output
âââ out/                      # Rendered output (gitignored)
â   âââ video.mp4
âââ remotion.config.ts
âââ package.json

Common Components

See references/components.md for reusable:

Animated backgrounds
Terminal windows
Feature cards
Stats displays
CTA buttons
Text reveal animations

AI Voiceover with Gemini TTS

Generate human-quality voiceovers using Google’s Gemini TTS models.

Available TTS Models

Model	Quality	Speed	Use Case
`gemini-2.5-flash-preview-tts`	Good	Fast	Drafts, iteration
`gemini-2.5-pro-preview-tts`	Best	Slower	Final renders

Note: Standard Gemini models (gemini-2.0-flash, etc.) do NOT support audio output. You must use the dedicated -tts models.

Voice Options

Orus, Kore, Puck, Charon, Fenrir, Leda, Aoede, Zephyr â each with distinct personality. Best for professional narration: Orus (warm, authoritative) or Kore (clear, friendly).

Generation Script

# Set API key
export GEMINI_API_KEY="your-key"

# Generate voiceover (returns raw PCM audio)
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-preview-tts:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{"parts": [{"text": "Your narration script here..."}]}],
    "generationConfig": {
      "response_modalities": ["AUDIO"],
      "speech_config": {
        "voiceConfig": {
          "prebuiltVoiceConfig": { "voiceName": "Orus" }
        }
      }
    }
  }' | python3 -c "
import json, sys, base64
r = json.load(sys.stdin)
audio = r['candidates'][0]['content']['parts'][0]['inlineData']['data']
sys.stdout.buffer.write(base64.b64decode(audio))
" > voiceover_raw.pcm

# Convert PCM to WAV (Gemini outputs s16le, 24kHz, mono)
ffmpeg -f s16le -ar 24000 -ac 1 -i voiceover_raw.pcm public/audio/voiceover.wav
rm voiceover_raw.pcm

# Check duration
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 public/audio/voiceover.wav

Audio-Visual Sync

Add a small delay so visuals appear before the voice starts:

const AUDIO_OFFSET = 15; // 0.5s at 30fps

// In the main composition:
<Sequence from={AUDIO_OFFSET}>
  <Audio src={staticFile("audio/voiceover.wav")} volume={0.9} />
</Sequence>;

Calculate total frames from audio duration: totalFrames = Math.ceil(duration * fps) + AUDIO_OFFSET

Transcription Overlay

Add timed captions synced to voiceover:

const LINES: { start: number; end: number; text: string }[] = [
  { start: 20, end: 140, text: "First caption line." },
  { start: 155, end: 230, text: "Second caption line." },
  // ...
];

export const Transcription: React.FC = () => {
  const frame = useCurrentFrame();
  const activeLine = LINES.find((l) => frame >= l.start && frame <= l.end);
  if (!activeLine) return null;

  // CRITICAL: Adaptive fade prevents interpolation errors on short captions
  const dur = activeLine.end - activeLine.start;
  const fade = Math.min(10, Math.floor(dur / 3));

  const opacity = interpolate(
    frame,
    [
      activeLine.start,
      activeLine.start + fade,
      activeLine.end - fade,
      activeLine.end,
    ],
    [0, 1, 1, 0],
    { extrapolateLeft: "clamp", extrapolateRight: "clamp" },
  );

  return (
    <div
      style={{
        position: "absolute",
        bottom: 56,
        left: 0,
        right: 0,
        display: "flex",
        justifyContent: "center",
        opacity,
        pointerEvents: "none",
      }}
    >
      <div
        style={{
          padding: "14px 36px",
          borderRadius: 14,
          background: "rgba(0,0,0,0.6)",
          backdropFilter: "blur(16px)",
        }}
      >
        <span style={{ fontSize: 26, fontWeight: 500, color: "#fff" }}>
          {activeLine.text}
        </span>
      </div>
    </div>
  );
};

Interpolation safety: For short captions (< 20 frames), start + 10 > end - 10 breaks Remotion’s strictly-monotonic requirement. The adaptive Math.min(10, Math.floor(dur / 3)) formula prevents this.

Unified Type Scale

ALWAYS define a shared type scale in styles.ts â never set font sizes inline per scene. This prevents the #1 visual issue: inconsistent text sizing across scenes.

// styles.ts
export const colors = {
  bg: "#0a0a0f",
  textPrimary: "rgba(255,255,255,0.95)",
  textSecondary: "rgba(255,255,255,0.55)",
  textDim: "rgba(255,255,255,0.3)",
  // Brand accent colors from Firecrawl scrape
};

export const fonts = {
  sans: "Inter, system-ui, sans-serif",
  mono: "'JetBrains Mono', monospace",
};

export const type = {
  hero: {
    fontSize: 96,
    fontWeight: 700,
    letterSpacing: "-0.04em",
    lineHeight: 1.05,
  },
  h1: {
    fontSize: 68,
    fontWeight: 700,
    letterSpacing: "-0.035em",
    lineHeight: 1.1,
  },
  h2: {
    fontSize: 48,
    fontWeight: 600,
    letterSpacing: "-0.025em",
    lineHeight: 1.2,
  },
  body: {
    fontSize: 28,
    fontWeight: 400,
    letterSpacing: "-0.01em",
    lineHeight: 1.5,
  },
  bodyLg: {
    fontSize: 34,
    fontWeight: 400,
    letterSpacing: "-0.015em",
    lineHeight: 1.4,
  },
  stat: {
    fontSize: 86,
    fontWeight: 800,
    letterSpacing: "-0.04em",
    lineHeight: 1,
  },
  label: {
    fontSize: 18,
    fontWeight: 600,
    letterSpacing: "0.08em",
    textTransform: "uppercase",
  },
  mono: {
    fontSize: 22,
    fontWeight: 500,
    fontFamily: "'JetBrains Mono', monospace",
  },
};

Every scene imports { colors, fonts, type } from ../styles and uses the shared tokens.

Tunnel Management

# Start tunnel (exposes port 3000 publicly)
bash skills/cloudflare-tunnel/scripts/tunnel.sh start 3000

# Check status
bash skills/cloudflare-tunnel/scripts/tunnel.sh status 3000

# List all tunnels
bash skills/cloudflare-tunnel/scripts/tunnel.sh list

# Stop tunnel
bash skills/cloudflare-tunnel/scripts/tunnel.sh stop 3000

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台