voicemode
npx skills add https://github.com/mbailey/voicemode --skill voicemode
Skill Documentation
First-Time Setup
If VoiceMode isn’t working or MCP fails to connect, run:
/voicemode:install
After install, reconnect MCP: /mcp → select voicemode → “Reconnect” (or restart Claude Code).
VoiceMode
Natural voice conversations with Claude Code using speech-to-text (STT) and text-to-speech (TTS).
Note: The Python package is voice-mode (hyphen), but the CLI command is voicemode (no hyphen).
When to Use MCP vs CLI
| Task | Use | Why |
|---|---|---|
| Voice conversations | MCP voicemode:converse | Faster – server already running |
| Service start/stop | MCP voicemode:service | Works within Claude Code |
| Installation | CLI voice-mode-install | One-time setup |
| Configuration | CLI voicemode config | Edit settings directly |
| Diagnostics | CLI voicemode diag | Administrative tasks |
Usage
Use the converse MCP tool to speak to users and hear their responses:
# Speak and listen for response (most common usage)
voicemode:converse("Hello! What would you like to work on?")
# Speak without waiting (for narration while working)
voicemode:converse("Searching the codebase now...", wait_for_response=False)
For most conversations, just pass your message – defaults handle everything else.
Use default converse tool parameters unless there’s a good reason not to. Timing parameters (listen_duration_max, listen_duration_min) use smart defaults with silence detection – don’t override unless the user requests it or you see a clear need. Defaults are configurable by the user via ~/.voicemode/voicemode.env.
| Parameter | Default | Description |
|---|---|---|
| message | required | Text to speak |
| wait_for_response | true | Listen after speaking |
| voice | auto | TTS voice |
For all parameters, see Converse Parameters.
Best Practices
- Narrate without waiting – Use wait_for_response=False when announcing actions
- One question at a time – Don’t bundle multiple questions in voice mode
- Check status first – Verify services are running before starting conversations
- Let VoiceMode auto-select – Don’t hardcode providers unless user has preference
- First run is slow – Model downloads happen on first start (2-5 min), then instant
Handling Pauses and Wait Requests
When the user asks you to wait or give them time:
Short pauses (up to 60 seconds): If the user says something ending with “wait” (e.g., “hang on”, “give me a sec”, “wait”), VoiceMode automatically pauses for 60 seconds then resumes listening. This is built-in.
Longer pauses (2+ minutes): Use bash sleep N where N is seconds. For example, if the user says “give me 5 minutes”:
sleep 300 # Wait 5 minutes
Then call converse again when the wait is over:
voicemode:converse("Five minutes is up. Ready when you are.")
Configuration: The short pause duration is configurable via VOICEMODE_WAIT_DURATION (default: 60 seconds).
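The two pause paths above can be sketched as a small helper that turns a spoken wait request into a number of seconds. This is illustrative only: `parse_wait_seconds`, its regular expression, and the unit table are hypothetical and not part of VoiceMode. Only the 60-second default and the VOICEMODE_WAIT_DURATION name come from the text above.

```python
import os
import re

# Default short-pause length described above; VOICEMODE_WAIT_DURATION overrides it.
DEFAULT_WAIT_SECONDS = 60

# Hypothetical unit table for this sketch (not a VoiceMode API).
_UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60, "hour": 3600}


def parse_wait_seconds(utterance, env=None):
    """Return how long to pause for a spoken wait request, in seconds.

    An explicit duration such as "give me 5 minutes" is converted to seconds;
    anything else falls back to the configured short-pause duration.
    """
    env = os.environ if env is None else env
    match = re.search(r"(\d+)\s*(second|sec|minute|min|hour)s?\b", utterance.lower())
    if match:
        return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return int(env.get("VOICEMODE_WAIT_DURATION", DEFAULT_WAIT_SECONDS))
```

A result longer than the built-in short pause would correspond to the `sleep N` path, followed by another converse call once the wait is over.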
STT Recovery – Manual Transcription
If Whisper STT fails but the audio was recorded successfully, you can manually transcribe the saved audio file:
# Transcribe the most recent recording
whisper-cli ~/.voicemode/audio/latest-STT.wav
# Or check if file exists first (safe for inclusion in automation)
if [ -f ~/.voicemode/audio/latest-STT.wav ]; then
whisper-cli ~/.voicemode/audio/latest-STT.wav
fi
Requirements:
- Audio saving must be enabled via one of:
  - VOICEMODE_SAVE_AUDIO=true in ~/.voicemode/voicemode.env
  - VOICEMODE_SAVE_ALL=true (saves all audio and transcriptions)
  - VOICEMODE_DEBUG=true (enables debug mode with audio saving)
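One way to confirm that this requirement is met is to scan the settings for any of the three flags. The sketch below assumes the env file contains plain KEY=value lines and that the value `true` is what enables a flag; the exact syntax and accepted values VoiceMode recognizes are not specified here, and both function names are hypothetical.

```python
# Flags named in the requirements above; any one of them enables audio saving.
AUDIO_SAVING_FLAGS = ("VOICEMODE_SAVE_AUDIO", "VOICEMODE_SAVE_ALL", "VOICEMODE_DEBUG")


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def audio_saving_enabled(settings):
    """Return True if any audio-saving flag is set to 'true' (case-insensitive)."""
    return any(settings.get(flag, "").lower() == "true" for flag in AUDIO_SAVING_FLAGS)
```

In practice the text would come from reading ~/.voicemode/voicemode.env, for example with `pathlib.Path.home().joinpath(".voicemode", "voicemode.env").read_text()`.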
How it works:
- VoiceMode saves all STT recordings to ~/.voicemode/audio/ with timestamps
- The latest-STT.wav symlink always points to the most recent recording
- If the STT API fails, the recording is still saved for manual recovery
- This lets you recover the user’s speech without asking them to repeat
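The recovery flow described above could be scripted roughly as follows. This is a sketch rather than VoiceMode's own code: it mirrors the `whisper-cli <file>` invocation shown earlier and assumes that whisper-cli writes the transcript to standard output. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Path taken from the section above; the symlink tracks the newest recording.
LATEST_RECORDING = Path.home() / ".voicemode" / "audio" / "latest-STT.wav"


def recovery_command(recording=LATEST_RECORDING):
    """Return the whisper-cli command for a saved recording, or None if it is missing."""
    recording = Path(recording)
    if not recording.is_file():  # follows symlinks; False for a dangling link
        return None
    # resolve() yields the timestamped file that the symlink points at
    return ["whisper-cli", str(recording.resolve())]


def transcribe_latest(recording=LATEST_RECORDING):
    """Run whisper-cli on the recording and return its stdout, or None if no file exists."""
    command = recovery_command(recording)
    if command is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Returning None when the file is absent keeps the helper safe to call even when audio saving is disabled.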
When to use:
- STT service timeout or connection failure
- Transcription returned empty but user definitely spoke
- Need to verify what was actually said vs. what was transcribed
See also: Troubleshooting – No Speech Detected
Check Status
voicemode service status # All services
voicemode service status whisper # Specific service
Shows service status including running state, ports, and health.
Installation
# Install VoiceMode CLI and configure services
uvx voice-mode-install --yes
# Install local services (Apple Silicon recommended)
voicemode service install whisper
voicemode service install kokoro
See Getting Started for detailed steps.
Service Management
# Start/stop services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")
# View logs for troubleshooting
voicemode:service("whisper", "logs", lines=50)
| Service | Port | Purpose |
|---|---|---|
| whisper | 2022 | Speech-to-text |
| kokoro | 8880 | Text-to-speech |
| voicemode | 8765 | HTTP/SSE server |
Actions: status, start, stop, restart, logs, enable, disable
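As a quick cross-check of the port table above, a script can test whether anything is accepting TCP connections on each port. This is a minimal sketch that assumes the services listen on localhost. An open port only shows that something is listening, not that the service is healthy, so `voicemode service status` remains the authoritative check. The helper names are hypothetical.

```python
import socket

# Ports from the service table above.
SERVICE_PORTS = {"whisper": 2022, "kokoro": 8880, "voicemode": 8765}


def port_is_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(ports=SERVICE_PORTS):
    """Map each service name to whether its port is reachable."""
    return {name: port_is_open(port) for name, port in ports.items()}
```

Calling `check_services()` returns a dictionary keyed by service name, which can be inspected before starting a conversation.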
Configuration
voicemode config list # Show all settings
voicemode config set VOICEMODE_TTS_VOICE nova # Set default voice
voicemode config edit # Edit config file
Config file: ~/.voicemode/voicemode.env
See Configuration Guide for all options.
DJ Mode
Background music during VoiceMode sessions with track-level control.
# Core playback
voicemode dj play /path/to/music.mp3 # Play a file or URL
voicemode dj status # What's playing
voicemode dj pause # Pause playback
voicemode dj resume # Resume playback
voicemode dj stop # Stop playback
# Navigation and volume
voicemode dj next # Skip to next chapter
voicemode dj prev # Go to previous chapter
voicemode dj volume 30 # Set volume to 30%
# Music For Programming
voicemode dj mfp list # List available episodes
voicemode dj mfp play 49 # Play episode 49
voicemode dj mfp sync # Convert CUE files to chapters
# Music library
voicemode dj find "daft punk" # Search library
voicemode dj library scan # Index ~/Audio/music
voicemode dj library stats # Show library info
# Play history and favorites
voicemode dj history # Show recent plays
voicemode dj favorite # Toggle favorite on current track
Configuration: Set VOICEMODE_DJ_VOLUME in ~/.voicemode/voicemode.env to customize startup volume (default: 50%).
CLI Cheat Sheet
# Service management
voicemode service status # All services
voicemode service start whisper # Start a service
voicemode service logs kokoro # View logs
# Diagnostics
voicemode deps # Check dependencies
voicemode diag info # System info
voicemode diag devices # Audio devices
# DJ Mode
voicemode dj play <file|url> # Start playback
voicemode dj status # What's playing
voicemode dj next/prev # Navigate chapters
voicemode dj stop # Stop playback
voicemode dj mfp play 49 # Music For Programming
Voice Handoff Between Agents
Transfer voice conversations between Claude Code agents for multi-agent workflows.
Use cases:
- Personal assistant routing to project-specific foremen
- Foremen delegating to workers for focused tasks
- Returning control when work is complete
Quick Reference
# 1. Announce the transfer
voicemode:converse("Transferring you to a project agent.", wait_for_response=False)
# 2. Spawn with voice instructions (mechanism depends on your setup)
spawn_agent(path="/path", prompt="Load voicemode skill, use converse to greet user")
# 3. Go quiet - let new agent take over
Hand-back:
voicemode:converse("Transferring you back to the assistant.", wait_for_response=False)
# Stop conversing, exit or go idle
Key Principles
- Announce transfers: Always tell the user before transferring
- One speaker: Only one agent should use converse at a time
- Distinct voices: Different voices make handoffs audible
- Provide context: Tell receiving agent why user is being transferred
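The principles above can be modeled as a small state tracker. `VoiceFloor` is a hypothetical illustration and not something VoiceMode provides. Real agents run as separate processes, so this models the rules rather than enforcing them.

```python
class VoiceFloor:
    """Tracks which agent currently holds the voice channel (hypothetical helper)."""

    def __init__(self, holder):
        self.holder = holder
        self.history = [holder]

    def may_speak(self, agent):
        """One speaker: only the current holder should call converse."""
        return agent == self.holder

    def hand_off(self, from_agent, to_agent, context):
        """Transfer the floor; return the announcement and the context for the receiver."""
        if from_agent != self.holder:
            raise PermissionError(f"{from_agent} does not hold the voice channel")
        self.holder = to_agent
        self.history.append(to_agent)
        return {
            # Announce transfers: spoken with wait_for_response=False
            "announce": f"Transferring you to {to_agent}.",
            # Provide context: included in the receiving agent's prompt
            "context": context,
        }
```

After a hand-off, the previous holder goes quiet until the floor is handed back to it.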
Detailed Documentation
See Call Routing for comprehensive guides:
- Handoff Pattern – Complete hand-off and hand-back process
- Voice Proxy – Relay pattern for agents without voice
- Call Routing Overview – All routing patterns
Sharing Voice Services Over Tailscale
Expose local Whisper (STT) and Kokoro (TTS) to other devices on your Tailnet via HTTPS.
Why
- Browsers require HTTPS for microphone access (e.g., VoiceMode Connect web app)
- Tailscale serve provides automatic HTTPS with valid Let’s Encrypt certificates for *.ts.net domains
- Enables using your powerful local machine’s GPU from any device on your Tailnet
Setup
# Expose TTS (Kokoro on port 8880)
tailscale serve --bg --set-path /v1/audio/speech http://localhost:8880/v1/audio/speech
# Expose STT (Whisper on port 2022)
tailscale serve --bg --set-path /v1/audio/transcriptions http://localhost:2022/v1/audio/transcriptions
# Verify configuration
tailscale serve status
# Reset all serve config
tailscale serve reset
Endpoints
After setup, endpoints are available at:
- TTS: https://<hostname>.<tailnet>.ts.net/v1/audio/speech
- STT: https://<hostname>.<tailnet>.ts.net/v1/audio/transcriptions
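To exercise the TTS endpoint from another Tailnet device, a request could be built along these lines. The JSON fields (`model`, `input`, `voice`) follow the OpenAI-style speech API that the /v1/audio/speech path suggests. Those field names and the default values are assumptions, not confirmed by this document, and the hostname in the usage note is a placeholder.

```python
import json
import urllib.request


def build_tts_request(base_url, text, voice="af_sky", model="kokoro"):
    """Build (but do not send) a POST to the shared TTS endpoint.

    The payload shape and the default voice and model are assumptions
    for this sketch; adjust them to whatever the backend actually accepts.
    """
    payload = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a request built from `https://<hostname>.<tailnet>.ts.net`, passing it to `urllib.request.urlopen` and reading the response would return audio bytes if the assumptions above hold.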
Important Notes
- Path mapping: Tailscale strips the incoming path before forwarding, so you MUST include the full path in the target URL
- Same-machine testing: Traffic doesn’t route through Tailscale locally; test from another Tailnet device
- Multiple paths: You can configure different paths to different backends on the same or different machines
- CORS: Kokoro has CORS configured to allow https://app.voicemode.dev origins
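The path-mapping note is easier to see with a small model. The function below is a simplified sketch of the behavior described above (the mount path is dropped and the remainder is appended to the target), not Tailscale's implementation. `forwarded_url` is a hypothetical name.

```python
def forwarded_url(mount_path, target, request_path):
    """Model the mapping: strip the mount prefix, append what is left to the target."""
    mount = mount_path.rstrip("/")
    if request_path != mount and not request_path.startswith(mount + "/"):
        raise ValueError(f"{request_path} is not served under {mount_path}")
    remainder = request_path[len(mount):]
    return target.rstrip("/") + remainder


speech = "/v1/audio/speech"

# Target includes the full path, as in the setup commands above
print(forwarded_url(speech, "http://localhost:8880/v1/audio/speech", speech))
# http://localhost:8880/v1/audio/speech

# Target without the path: the backend would receive "/" instead of the API route
print(forwarded_url(speech, "http://localhost:8880", speech))
# http://localhost:8880
```

The second call shows why omitting the path from the target breaks the endpoint under this model.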
Use with VoiceMode Connect
In the VoiceMode Connect web app settings (app.voicemode.dev/settings), set:
- TTS Endpoint: https://<hostname>.<tailnet>.ts.net
- STT Endpoint: https://<hostname>.<tailnet>.ts.net
Documentation Index
| Topic | Link |
|---|---|
| Converse Parameters | All Parameters |
| Installation | Getting Started |
| Configuration | Configuration Guide |
| Claude Code Plugin | Plugin Guide |
| Whisper STT | Whisper Setup |
| Kokoro TTS | Kokoro Setup |
| Pronunciation | Pronunciation Guide |
| Troubleshooting | Troubleshooting |
| CLI Reference | CLI Docs |
| DJ Mode | Background Music |
Related Skills
- VoiceMode Connect – Remote voice via mobile/web clients (no local STT/TTS needed)