elevenlabs
npx skills add https://github.com/jakerains/agentskills --skill elevenlabs
Agent 安装分布
Skill 文档
ElevenLabs AI Audio Platform
Complete guide to ElevenLabs’ audio AI capabilities: speech synthesis, transcription, voice cloning, sound effects, music generation, dubbing, and conversational voice agents.
Quick Reference
| Capability | API/Tool | Use Case |
|---|---|---|
| Text-to-Speech | text_to_speech |
Generate lifelike speech from text |
| Speech-to-Text | speech_to_text |
Transcribe audio with Scribe v2 |
| Voice Cloning | voice_clone |
Clone voices from audio samples |
| Voice Design | text_to_voice |
Create voices from text descriptions |
| Sound Effects | text_to_sound_effects |
Generate SFX from prompts |
| Music | compose_music |
Generate studio-grade music |
| Dubbing | Dubbing API | Translate video/audio (32 languages) |
| Voice Changer | speech_to_speech |
Transform voice while preserving emotion |
| Voice Isolator | isolate_audio |
Remove background noise |
| Voice Agents | Agents CLI/API | Build conversational AI agents |
Setup
API Key
# Environment variable
export ELEVENLABS_API_KEY="your-api-key"
# Or in .env file
ELEVENLABS_API_KEY=your-api-key
SDK Installation
# Python
pip install elevenlabs
# TypeScript/Node
npm install elevenlabs
MCP Server (for Claude Code, Cursor, etc.)
{
"mcpServers": {
"ElevenLabs": {
"command": "uvx",
"args": ["elevenlabs-mcp"],
"env": {
"ELEVENLABS_API_KEY": "your-api-key"
}
}
}
}
Text-to-Speech (TTS)
Convert text to lifelike speech. See references/tts-models.md for model details.
Python SDK
from elevenlabs.client import ElevenLabs
from elevenlabs import play
client = ElevenLabs(api_key="your-api-key")
audio = client.text_to_speech.convert(
text="Hello world!",
voice_id="JBFqnCBsd6RMkjVDRZzb", # George
model_id="eleven_multilingual_v2",
output_format="mp3_44100_128"
)
play(audio)
MCP Tool
mcp__ElevenLabs__text_to_speech
- text: "Your text here"
- voice_name: "Rachel" (or voice_id)
- model_id: "eleven_multilingual_v2"
- stability: 0.5, similarity_boost: 0.75
- speed: 1.0 (range: 0.7-1.2)
Model Selection
| Model | Latency | Languages | Best For |
|---|---|---|---|
eleven_multilingual_v2 |
~500ms | 29 | High quality, long-form |
eleven_flash_v2_5 |
~75ms | 32 | Real-time, agents |
eleven_turbo_v2_5 |
~250ms | 32 | Balanced quality/speed |
eleven_v3 (alpha) |
Higher | 70+ | Emotional, dramatic |
Speech-to-Text (Scribe)
Transcribe audio with 90+ language support. See references/stt-scribe.md for details.
Python SDK
result = client.speech_to_text.convert(
file=open("audio.mp3", "rb"),
model_id="scribe_v2",
diarize=True # Speaker detection
)
print(result.text)
MCP Tool
mcp__ElevenLabs__speech_to_text
- input_file_path: "/path/to/audio.mp3"
- diarize: true (speaker detection)
- language_code: "eng" (or auto-detect)
Features
- 90+ languages with word-level timestamps
- Speaker diarization (up to 48 speakers)
- Keyterm prompting (bias toward specific words)
- Entity detection (names, numbers, dates)
- Realtime mode (~150ms latency)
Voice Cloning
Instant Voice Clone (MCP)
mcp__ElevenLabs__voice_clone
- name: "My Voice"
- files: ["/path/to/sample1.mp3", "/path/to/sample2.mp3"]
- description: "Professional male voice"
Requirements
- Instant: 30+ seconds of clean audio
- Professional: 30+ minutes for hyper-realistic clones
- Creator plan or higher required
Voice Design
Create entirely new voices from text descriptions.
MCP Tool
mcp__ElevenLabs__text_to_voice
- voice_description: "A warm, friendly male voice with a slight British accent,
perfect for audiobook narration"
Creates 3 voice previews to choose from. Use create_voice_from_preview to save.
Sound Effects
Generate cinematic sound effects from text. See references/sound-effects.md.
MCP Tool
mcp__ElevenLabs__text_to_sound_effects
- text: "Heavy wooden door creaking open slowly"
- duration_seconds: 3.0 (0.5-30 seconds)
- loop: false
Prompting Tips
- Simple: “Glass shattering on concrete”
- Sequences: “Footsteps on gravel, then a metallic door opens”
- Musical: “90s hip-hop drum loop, 90 BPM”
Music Generation
Generate studio-grade music. See references/music-generation.md.
MCP Tool
mcp__ElevenLabs__compose_music
- prompt: "Upbeat electronic track with driving synths, 120 BPM"
- music_length_ms: 60000 (10s-5min)
Features
- Complete control over genre, style, structure
- Vocals or instrumental
- Multilingual lyrics
- Edit sections individually
Dubbing
Translate audio/video while preserving speaker identity. See references/dubbing.md.
- 32 languages supported
- Preserves emotion, timing, tone
- Speaker separation (up to 9 speakers)
- Files up to 1GB / 2.5 hours via API
Voice Changer (Speech-to-Speech)
Transform any voice while preserving performance nuances.
MCP Tool
mcp__ElevenLabs__speech_to_speech
- input_file_path: "/path/to/recording.mp3"
- voice_id: "target_voice_id"
- Preserves whispers, laughs, emotional cues
- 29 languages supported
- Billed at 1000 chars/minute
Voice Isolator
Remove background noise from recordings.
MCP Tool
mcp__ElevenLabs__isolate_audio
- input_file_path: "/path/to/noisy_audio.mp3"
- Supports audio and video files
- Files up to 500MB / 1 hour
Conversational Voice Agents
Build and deploy voice-enabled AI agents. See references/voice-agents.md for comprehensive guide.
CLI Quick Start
# Install
npm install -g @elevenlabs/cli
# Initialize and authenticate
elevenlabs agents init
elevenlabs auth login
# Create agent
elevenlabs agents add "Support Bot" --template customer-service
# Deploy
elevenlabs agents push
Templates
| Template | Use Case |
|---|---|
customer-service |
Professional support, low temp |
assistant |
General purpose, balanced |
voice-only |
Voice interactions only |
text-only |
Text conversations only |
minimal |
Quick prototyping |
Agent Tools
- Server Tools: Webhook API calls
- Client Tools: Frontend events
- MCP Tools: Model Context Protocol servers
- System Tools: transfer_to_number, agent_transfer, end_call
Voice Library
Search Voices (MCP)
mcp__ElevenLabs__search_voices
- search: "professional narrator"
- sort: "name" | "created_at_unix"
Search Public Library
mcp__ElevenLabs__search_voice_library
- search: "deep male"
- page_size: 10
Popular Voice IDs
| Voice | ID | Style |
|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Neutral, professional |
| Adam | pNInz6obpgDQGcFmaJgB | Deep, warm |
| Bella | EXAVITQu4vr4xnSDxMaL | Soft, gentle |
Browse: elevenlabs.io/voice-library
Account & Billing
Check Subscription
mcp__ElevenLabs__check_subscription
List Models
mcp__ElevenLabs__list_models
Reference Documentation
| Topic | File |
|---|---|
| TTS Models & Parameters | references/tts-models.md |
| Speech-to-Text (Scribe) | references/stt-scribe.md |
| Sound Effects Prompting | references/sound-effects.md |
| Music Generation | references/music-generation.md |
| Voice Agents (CLI/API) | references/voice-agents.md |
| Agent Prompting Guide | references/agent-prompting.md |
| Dubbing Guide | references/dubbing.md |
Pricing & Limits
- TTS: Per character (Flash models 50% cheaper)
- STT: Per hour of audio
- Sound Effects: 40 credits/second when duration specified
- Music: Per generation
- See: elevenlabs.io/pricing
Concurrency Limits (by plan)
| Plan | Multilingual v2 | Flash/Turbo | STT |
|---|---|---|---|
| Free | 2 | 4 | 8 |
| Starter | 3 | 6 | 12 |
| Creator | 5 | 10 | 20 |
| Pro | 10 | 20 | 40 |
| Scale | 15 | 30 | 60 |