qwen-voice
Total installs: 25
Weekly installs: 25
Site rank: #7969
Install command:
npx skills add https://github.com/ada20204/qwen-voice --skill qwen-voice
Installs by agent:
- openclaw: 14
- gemini-cli: 12
- opencode: 12
- claude-code: 11
- codex: 11
- antigravity: 9
Skill documentation
Qwen Voice (ASR + TTS)
Use the bundled scripts. Configure DASHSCOPE_API_KEY in one of:
- ~/.config/qwen-voice/.env (recommended)
- <repo>/.qwen-voice/.env (dev/testing)
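The skill does not say how the scripts choose between these two files. The following is a minimal sketch of one plausible lookup, assuming that a `DASHSCOPE_API_KEY` already set in the process environment wins, then the user-level file, then the repo-level file. The helper names `parse_env_file` and `resolve_api_key` are hypothetical and not part of the skill.

```python
import os
from pathlib import Path
from typing import Optional


def parse_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes: KEY="value" or KEY='value'
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(repo_root: Path, home: Optional[Path] = None) -> Optional[str]:
    """Return DASHSCOPE_API_KEY from the environment or the first .env that defines it.

    ASSUMED precedence (not confirmed by the skill): process environment,
    then ~/.config/qwen-voice/.env, then <repo>/.qwen-voice/.env.
    """
    if os.environ.get("DASHSCOPE_API_KEY"):
        return os.environ["DASHSCOPE_API_KEY"]
    home = home if home is not None else Path.home()
    candidates = [
        home / ".config" / "qwen-voice" / ".env",  # recommended location
        repo_root / ".qwen-voice" / ".env",         # dev/testing location
    ]
    for path in candidates:
        if path.is_file():
            key = parse_env_file(path).get("DASHSCOPE_API_KEY")
            if key:
                return key
    return None
```

Putting the user-level file first keeps a real key out of the repository tree, which matches the "recommended" label above.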
ASR (speech → text)
Without timestamps (default)
python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg
With timestamps (chunk-based)
python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg --timestamps --chunk-sec 3
Notes:
- Timestamps are generated by fixed-length chunking (not word-level alignment).
- Input audio is converted to mono 16kHz WAV before sending.
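To make the two notes above concrete, here is a sketch of what fixed-length chunking and the mono 16 kHz conversion could look like. It assumes ffmpeg does the conversion; the skill does not say which tool its scripts use, and the helper names `chunk_spans` and `ffmpeg_to_mono_16k` are hypothetical. The flags shown (`-ac 1`, `-ar 16000`, `-c:a pcm_s16le`) are standard ffmpeg options.

```python
import math
from typing import List, Tuple


def chunk_spans(duration_sec: float, chunk_sec: float) -> List[Tuple[float, float]]:
    """Split [0, duration) into fixed-length windows; the last one may be shorter.

    Each chunk would be transcribed separately and its text stamped with
    (start, end), so timestamps are only as precise as chunk_sec.
    """
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    count = math.ceil(duration_sec / chunk_sec)
    return [
        (i * chunk_sec, min((i + 1) * chunk_sec, duration_sec))
        for i in range(count)
    ]


def ffmpeg_to_mono_16k(src: str, dst: str) -> List[str]:
    """ffmpeg argv (ASSUMED tool): downmix to 1 channel, resample to 16 kHz, write 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", dst]
```

For a 7.5 s clip with `--chunk-sec 3`, this yields windows (0, 3), (3, 6) and (6, 7.5). A word that straddles a boundary gets attributed to whichever chunk transcribes it, which is why these are not word-level timestamps.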
TTS (text → speech)
Preset voice (default: Cherry)
python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 Pi。' --voice Cherry --out /tmp/out.ogg
Clone voice (create once, reuse)
- Create a voice profile from a sample audio:
python3 skills/qwen-voice/scripts/qwen_voice_clone.py --in ./voice_sample.ogg --name george --out work/qwen-voice/george.voice.json
- Use the cloned voice to synthesize:
python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 George。' --voice-profile work/qwen-voice/george.voice.json --out /tmp/out.ogg
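"Create once, reuse" can be automated by running the clone step only when the profile JSON is missing. The sketch below builds the argument lists for the two commands above and uses only the flags they show. The wrapper `clone_then_speak` is hypothetical, and it assumes that the existence of the profile file means the clone is still valid.

```python
from pathlib import Path
from typing import List

SCRIPTS = "skills/qwen-voice/scripts"


def clone_then_speak(text: str, sample: str, name: str, profile: str, out: str) -> List[List[str]]:
    """Return the commands to run: clone only if the profile JSON is missing, then TTS."""
    steps = []
    if not Path(profile).is_file():
        # First run: create the voice profile from the sample audio.
        steps.append([
            "python3", f"{SCRIPTS}/qwen_voice_clone.py",
            "--in", sample, "--name", name, "--out", profile,
        ])
    # Every run: synthesize with the (now existing) profile.
    steps.append([
        "python3", f"{SCRIPTS}/qwen_tts.py",
        "--text", text, "--voice-profile", profile, "--out", out,
    ])
    return steps
```

Each returned list can be passed to `subprocess.run(argv, check=True)` in order. Passing arguments as a list avoids shell-quoting problems with text that contains quotes or non-ASCII characters.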
Notes:
- .ogg output is Opus, suitable for Telegram voice messages.
- Voice cloning uses the DashScope customization endpoint plus a Qwen realtime TTS model.
- Scripts use a local venv at work/venv-dashscope (auto-created on first run).
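"Auto-created on first run" typically means checking for the venv's interpreter and building the environment only if it is absent. Below is a minimal sketch using the standard-library `venv` module. The helper names are hypothetical, and the package list (`dashscope`, the DashScope Python SDK) is an assumption about what the scripts install.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a venv (Scripts\\python.exe on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_venv(env_dir: Path, packages=("dashscope",), install: bool = True) -> Path:
    """Create env_dir on first use and optionally pip-install packages; later calls reuse it.

    ASSUMED package list; the skill does not document its dependencies.
    """
    python = venv_python(env_dir)
    if not python.exists():
        venv.create(env_dir, with_pip=install)
        if install and packages:
            subprocess.run([str(python), "-m", "pip", "install", *packages], check=True)
    return python
```

Calling `ensure_venv(Path("work/venv-dashscope"))` once per script start is cheap after the first run, because it returns as soon as the interpreter exists.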
Typical chat workflow
- When the user sends a voice message or audio file: run ASR and reply with the transcribed text.
- When the user explicitly asks for a voice reply: run TTS and send the generated .ogg as a voice note.
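The two rules above amount to a small dispatch per chat turn. The sketch below picks the command with the flags shown earlier and adds a quick pre-send check that the output really is Ogg/Opus. It looks for the `OggS` capture pattern at the start of the file and the `OpusHead` identification header in the first page. `plan_reply` and `looks_like_ogg_opus` are hypothetical helpers. The handling of a turn with no audio and no voice request (a plain text reply, no script) is an assumption.

```python
from typing import List, Optional

SCRIPTS = "skills/qwen-voice/scripts"


def plan_reply(audio_in: Optional[str], wants_voice: bool, reply_text: str,
               out: str = "/tmp/out.ogg", voice: str = "Cherry") -> List[str]:
    """Pick the script for one chat turn: ASR for incoming audio, TTS only on explicit request."""
    if audio_in is not None:
        return ["python3", f"{SCRIPTS}/qwen_asr.py", "--in", audio_in]
    if wants_voice:
        return ["python3", f"{SCRIPTS}/qwen_tts.py", "--text", reply_text,
                "--voice", voice, "--out", out]
    return []  # ASSUMED: plain text reply, no script needed


def looks_like_ogg_opus(path: str) -> bool:
    """Cheap sanity check: Ogg capture pattern plus OpusHead within the first page."""
    with open(path, "rb") as f:
        head = f.read(64)
    return head.startswith(b"OggS") and b"OpusHead" in head
```

The header check does not validate the whole stream. It only catches the common mistake of sending a WAV or MP3 file under an .ogg name, which Telegram would not display as a voice note.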