agent-media

📁 agntswrm/agent-media 📅 Jan 20, 2026
19
总安装量
11
周安装量
#18780
全站排名
安装命令
npx skills add https://github.com/agntswrm/agent-media --skill agent-media

Agent 安装分布

claude-code 10
opencode 9
gemini-cli 8
codex 8
windsurf 7
cursor 7

Skill 文档

Agent Media

Agent Media is an agent-first media toolkit that provides CLI-accessible commands for image, video, and audio processing. All commands produce deterministic, machine-readable JSON output.

Available Commands

Image Commands

  • agent-media image resize – Resize an image
  • agent-media image convert – Convert image format
  • agent-media image remove-background – Remove image background
  • agent-media image generate – Generate image from text

Audio Commands

  • agent-media audio extract – Extract audio from video
  • agent-media audio transcribe – Transcribe audio to text

Video Commands

  • agent-media video generate – Generate video from text or image

Output Format

All commands return JSON to stdout:

{
  "ok": true,
  "media_type": "image",
  "action": "resize",
  "provider": "local",
  "output_path": "output_123.webp",
  "mime": "image/webp",
  "bytes": 12345
}

On error:

{
  "ok": false,
  "error": {
    "code": "INVALID_INPUT",
    "message": "input file not found"
  }
}

Providers

  • local – Default provider using Sharp (resize, convert) and Transformers.js (remove-background, transcribe)
  • fal – fal.ai provider (generate, edit, remove-background, transcribe, video)
  • replicate – Replicate API (generate, edit, remove-background, transcribe, video)
  • runpod – Runpod API (generate, edit)
  • ai-gateway – Vercel AI Gateway (generate, edit)

Provider Selection

  1. Explicit: --provider <name>
  2. Auto-detect from environment variables
  3. Fallback to local provider

Environment Variables

  • AGENT_MEDIA_DIR – Custom output directory
  • FAL_API_KEY – Enable fal provider
  • REPLICATE_API_TOKEN – Enable replicate provider
  • RUNPOD_API_KEY – Enable runpod provider
  • AI_GATEWAY_API_KEY – Enable ai-gateway provider