elevenlabs

📁 jakerains/agentskills 📅 7 days ago

总安装量

周安装量

#58809

全站排名

安装命令

npx skills add https://github.com/jakerains/agentskills --skill elevenlabs

Agent 安装分布

opencode 3

gemini-cli 3

claude-code 3

github-copilot 3

codex 3

kimi-cli 3

Skill 文档

ElevenLabs AI Audio Platform

Complete guide to ElevenLabs’ audio AI capabilities: speech synthesis, transcription, voice cloning, sound effects, music generation, dubbing, and conversational voice agents.

Quick Reference

Capability	API/Tool	Use Case
Text-to-Speech	`text_to_speech`	Generate lifelike speech from text
Speech-to-Text	`speech_to_text`	Transcribe audio with Scribe v2
Voice Cloning	`voice_clone`	Clone voices from audio samples
Voice Design	`text_to_voice`	Create voices from text descriptions
Sound Effects	`text_to_sound_effects`	Generate SFX from prompts
Music	`compose_music`	Generate studio-grade music
Dubbing	Dubbing API	Translate video/audio (32 languages)
Voice Changer	`speech_to_speech`	Transform voice while preserving emotion
Voice Isolator	`isolate_audio`	Remove background noise
Voice Agents	Agents CLI/API	Build conversational AI agents

Setup

API Key

# Environment variable
export ELEVENLABS_API_KEY="your-api-key"

# Or in .env file
ELEVENLABS_API_KEY=your-api-key

SDK Installation

# Python
pip install elevenlabs

# TypeScript/Node
npm install elevenlabs

MCP Server (for Claude Code, Cursor, etc.)

{
  "mcpServers": {
    "ElevenLabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "your-api-key"
      }
    }
  }
}

Text-to-Speech (TTS)

Convert text to lifelike speech. See references/tts-models.md for model details.

Python SDK

from elevenlabs.client import ElevenLabs
from elevenlabs import play

client = ElevenLabs(api_key="your-api-key")

audio = client.text_to_speech.convert(
    text="Hello world!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128"
)
play(audio)

MCP Tool

mcp__ElevenLabs__text_to_speech
- text: "Your text here"
- voice_name: "Rachel" (or voice_id)
- model_id: "eleven_multilingual_v2"
- stability: 0.5, similarity_boost: 0.75
- speed: 1.0 (range: 0.7-1.2)

Model Selection

Model	Latency	Languages	Best For
`eleven_multilingual_v2`	~500ms	29	High quality, long-form
`eleven_flash_v2_5`	~75ms	32	Real-time, agents
`eleven_turbo_v2_5`	~250ms	32	Balanced quality/speed
`eleven_v3` (alpha)	Higher	70+	Emotional, dramatic

Speech-to-Text (Scribe)

Transcribe audio with 90+ language support. See references/stt-scribe.md for details.

Python SDK

result = client.speech_to_text.convert(
    file=open("audio.mp3", "rb"),
    model_id="scribe_v2",
    diarize=True  # Speaker detection
)
print(result.text)

MCP Tool

mcp__ElevenLabs__speech_to_text
- input_file_path: "/path/to/audio.mp3"
- diarize: true (speaker detection)
- language_code: "eng" (or auto-detect)

Features

90+ languages with word-level timestamps
Speaker diarization (up to 48 speakers)
Keyterm prompting (bias toward specific words)
Entity detection (names, numbers, dates)
Realtime mode (~150ms latency)

Voice Cloning

Instant Voice Clone (MCP)

mcp__ElevenLabs__voice_clone
- name: "My Voice"
- files: ["/path/to/sample1.mp3", "/path/to/sample2.mp3"]
- description: "Professional male voice"

Requirements

Instant: 30+ seconds of clean audio
Professional: 30+ minutes for hyper-realistic clones
Creator plan or higher required

Voice Design

Create entirely new voices from text descriptions.

MCP Tool

mcp__ElevenLabs__text_to_voice
- voice_description: "A warm, friendly male voice with a slight British accent,
  perfect for audiobook narration"

Creates 3 voice previews to choose from. Use create_voice_from_preview to save.

Sound Effects

Generate cinematic sound effects from text. See references/sound-effects.md.

MCP Tool

mcp__ElevenLabs__text_to_sound_effects
- text: "Heavy wooden door creaking open slowly"
- duration_seconds: 3.0 (0.5-30 seconds)
- loop: false

Prompting Tips

Simple: “Glass shattering on concrete”
Sequences: “Footsteps on gravel, then a metallic door opens”
Musical: “90s hip-hop drum loop, 90 BPM”

Music Generation

Generate studio-grade music. See references/music-generation.md.

MCP Tool

mcp__ElevenLabs__compose_music
- prompt: "Upbeat electronic track with driving synths, 120 BPM"
- music_length_ms: 60000 (10s-5min)

Features

Complete control over genre, style, structure
Vocals or instrumental
Multilingual lyrics
Edit sections individually

Dubbing

Translate audio/video while preserving speaker identity. See references/dubbing.md.

32 languages supported
Preserves emotion, timing, tone
Speaker separation (up to 9 speakers)
Files up to 1GB / 2.5 hours via API

Voice Changer (Speech-to-Speech)

Transform any voice while preserving performance nuances.

MCP Tool

mcp__ElevenLabs__speech_to_speech
- input_file_path: "/path/to/recording.mp3"
- voice_id: "target_voice_id"

Preserves whispers, laughs, emotional cues
29 languages supported
Billed at 1000 chars/minute

Voice Isolator

Remove background noise from recordings.

MCP Tool

mcp__ElevenLabs__isolate_audio
- input_file_path: "/path/to/noisy_audio.mp3"

Supports audio and video files
Files up to 500MB / 1 hour

Conversational Voice Agents

Build and deploy voice-enabled AI agents. See references/voice-agents.md for comprehensive guide.

CLI Quick Start

# Install
npm install -g @elevenlabs/cli

# Initialize and authenticate
elevenlabs agents init
elevenlabs auth login

# Create agent
elevenlabs agents add "Support Bot" --template customer-service

# Deploy
elevenlabs agents push

Templates

Template	Use Case
`customer-service`	Professional support, low temp
`assistant`	General purpose, balanced
`voice-only`	Voice interactions only
`text-only`	Text conversations only
`minimal`	Quick prototyping

Agent Tools

Server Tools: Webhook API calls
Client Tools: Frontend events
MCP Tools: Model Context Protocol servers
System Tools: transfer_to_number, agent_transfer, end_call

Voice Library

Search Voices (MCP)

mcp__ElevenLabs__search_voices
- search: "professional narrator"
- sort: "name" | "created_at_unix"

Search Public Library

mcp__ElevenLabs__search_voice_library
- search: "deep male"
- page_size: 10

Popular Voice IDs

Voice	ID	Style
Rachel	21m00Tcm4TlvDq8ikWAM	Neutral, professional
Adam	pNInz6obpgDQGcFmaJgB	Deep, warm
Bella	EXAVITQu4vr4xnSDxMaL	Soft, gentle

Browse: elevenlabs.io/voice-library

Account & Billing

Check Subscription

mcp__ElevenLabs__check_subscription

List Models

mcp__ElevenLabs__list_models

Reference Documentation

Topic	File
TTS Models & Parameters	references/tts-models.md
Speech-to-Text (Scribe)	references/stt-scribe.md
Sound Effects Prompting	references/sound-effects.md
Music Generation	references/music-generation.md
Voice Agents (CLI/API)	references/voice-agents.md
Agent Prompting Guide	references/agent-prompting.md
Dubbing Guide	references/dubbing.md

Pricing & Limits

TTS: Per character (Flash models 50% cheaper)
STT: Per hour of audio
Sound Effects: 40 credits/second when duration specified
Music: Per generation
See: elevenlabs.io/pricing

Concurrency Limits (by plan)

Plan	Multilingual v2	Flash/Turbo	STT
Free	2	4	8
Starter	3	6	12
Creator	5	10	20
Pro	10	20	40
Scale	15	30	60

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台