qwen3-tts-skills
npx skills add https://github.com/mu-zi-lee/qwen3-tts-skill --skill qwen3-tts-skills
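Qwen3-TTS Skill
A complete workflow for converting text into high-quality speech.
Quick Start
Scenario 1: Single-sentence speech generation
Call the script directly to generate speech:
# Chinese speech (default speaker: Vivian, female)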
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py custom-voice \
--language Chinese \
--text "你好,欢迎使用语音合成。" \
--out-dir outputs
# English speech (default speaker: Ryan, male)
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py custom-voice \
--language English \
--text "Hello, welcome to text-to-speech." \
--out-dir outputs
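Scenario 2: Batch dubbing for long manuscripts
Convert an article into a complete audio file:
User manuscript → [AI analyzes and generates the dubbing script] → [user review] → [batch TTS] → complete audio (.wav)
See the "Batch Dubbing for Long Manuscripts" section below.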
Model Selection Guide
Choose the model that fits your needs:
| Mode | Model | Use case | Command |
|---|---|---|---|
| CustomVoice | Qwen3-TTS-12Hz-1.7B-CustomVoice | Built-in voices with emotion control | custom-voice |
| VoiceDesign | Qwen3-TTS-12Hz-1.7B-VoiceDesign | Describe the desired voice in natural language | voice-design |
| VoiceClone | Qwen3-TTS-12Hz-1.7B-Base | Clone the voice from a reference audio clip | voice-clone |
| Tokenizer | Qwen3-TTS-Tokenizer-12Hz | Audio encoding and decoding | tokenizer-roundtrip |
Built-in Speakers (CustomVoice mode)
| Language | Default speaker | Notes |
|---|---|---|
| Chinese | Vivian | Female, natural |
| English | Ryan | Male |
| Japanese | Ono_Anna | Female |
| Korean | Sohee | Female |
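The defaults in this table can be read as a simple lookup from language to speaker. The sketch below only illustrates that behaviour; `DEFAULT_SPEAKERS` and `resolve_speaker` are hypothetical names and are not taken from the skill's scripts.

```python
# Hypothetical helper mirroring the default-speaker table (not the script's actual code).
DEFAULT_SPEAKERS = {
    "Chinese": "Vivian",
    "English": "Ryan",
    "Japanese": "Ono_Anna",
    "Korean": "Sohee",
}


def resolve_speaker(language, speaker=None):
    """Return the explicit speaker if one was given, else the language default."""
    if speaker:
        return speaker
    try:
        return DEFAULT_SPEAKERS[language]
    except KeyError:
        raise ValueError(f"no default speaker for language {language!r}") from None


print(resolve_speaker("Chinese"))
print(resolve_speaker("English", speaker="Vivian"))
```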
Single-Sentence Speech Generation
CustomVoice (recommended for getting started)
Use a built-in voice, with optional emotion control:
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py custom-voice \
--language Chinese \
--text "其实我真的有发现,我是一个特别善于观察别人情绪的人。" \
--speaker Vivian \
--instruct "轻松愉快的语气" \
--out-dir outputs
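Parameters:
- --language: Chinese / English / Japanese / Korean
- --speaker: optional; if omitted, the default speaker for the language is chosen automatically
- --instruct: optional; emotion or tone control, for example "开心地说" (say it happily) or "低沉缓慢" (low and slow)
- --output: optional; output file name (by default a timestamped file name is generated)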
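When calling the script from other code, passing the arguments as a list avoids shell quoting problems with non-ASCII text. The following is a minimal sketch that uses only the flags documented above; `build_custom_voice_cmd` is a hypothetical helper, and the script path assumes the skill is checked out relative to the working directory.

```python
def build_custom_voice_cmd(text, language, out_dir="outputs",
                           speaker=None, instruct=None, output=None):
    """Assemble the argv list for the custom-voice subcommand."""
    cmd = [
        "uv", "run", "qwen3-tts-skills/scripts/run_qwen3_tts.py", "custom-voice",
        "--language", language,
        "--text", text,
        "--out-dir", out_dir,
    ]
    # Optional flags are appended only when a value was supplied.
    for flag, value in (("--speaker", speaker),
                        ("--instruct", instruct),
                        ("--output", output)):
        if value:
            cmd += [flag, value]
    return cmd


cmd = build_custom_voice_cmd("你好,欢迎使用语音合成。", "Chinese",
                             instruct="轻松愉快的语气")
print(cmd)
# To actually run it (requires uv and the skill's scripts on disk):
#   import subprocess; subprocess.run(cmd, check=True)
```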
VoiceDesign (design a unique voice)
Describe the voice you want in natural language:
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py voice-design \
--language Chinese \
--text "哥哥,你回来啦,人家等了你好久好久了,要抱抱!" \
--instruct "体现撒娇稚嫩的萝莉女声,音调偏高且起伏明显。" \
--out-dir outputs
Note: for VoiceDesign, --instruct is required; it describes the characteristics of the voice.
VoiceClone (voice cloning)
Clone the voice from a reference audio clip:
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py voice-clone \
--language English \
--ref-audio "path/to/reference.wav" \
--ref-text "Transcript of the reference audio" \
--text "New text to synthesize" \
--out-dir outputs
Parameters:
- --ref-audio: path or URL of the reference audio file
- --ref-text: transcript of the reference audio (required)
- --x-vector-only-mode: optional; use only the speaker features (quality may be lower)
Note: VoiceClone does not support --instruct emotion control.
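The per-mode rules stated so far (--instruct is required for voice-design and unsupported for voice-clone, while voice-clone needs a reference audio file and its transcript) can be checked before launching a job. This is a sketch under those stated rules only; `check_mode_args` is a hypothetical helper, and it does not model --x-vector-only-mode.

```python
def check_mode_args(mode, instruct=None, ref_audio=None, ref_text=None):
    """Return a list of problems for the given mode; an empty list means OK."""
    problems = []
    if mode == "voice-design" and not instruct:
        problems.append("voice-design requires --instruct")
    if mode == "voice-clone":
        if instruct:
            problems.append("voice-clone does not support --instruct")
        if not ref_audio:
            problems.append("voice-clone requires --ref-audio")
        if not ref_text:
            problems.append("voice-clone requires --ref-text")
    return problems


print(check_mode_args("voice-design"))
print(check_mode_args("voice-clone", ref_audio="ref.wav", ref_text="hello"))
```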
Tokenizer (audio encoding and decoding)
Used to verify audio encoding and decoding with a round trip:
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py tokenizer-roundtrip \
--audio "path/to/audio.wav" \
--out-dir outputs
Batch Dubbing for Long Manuscripts
Convert long articles, scripts, or audiobook content into a complete audio file.
Workflow
+------------------------+     +------------------------+     +------------------------+     +------------------------+
| Step 1                 |     | Step 2                 |     | Step 3                 |     | Output                 |
| AI analyzes manuscript | --> | User reviews and edits | --> | Batch-generates audio  | --> | Complete audio (.wav)  |
| Generates script JSON  |     | Saves a .json file     |     | Merges with FFmpeg     |     |                        |
+------------------------+     +------------------------+     +------------------------+     +------------------------+
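Step 1: Have the AI generate the dubbing script
Tell the AI "Convert the following article to speech" and paste the article content.
Following the rules in dubbing-skills/SKILL.md, the AI will:
- Split the text intelligently (200-300 characters per segment)
- Identify characters (for example 旁白, the narrator, or 小明)
- Analyze emotion and generate an instruct value
- Output the dubbing-script JSON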
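The splitting step can be pictured as packing whole sentences into segments up to a size limit. The sketch below is not the AI's actual procedure, only a greedy illustration with a hypothetical `split_text` helper. It enforces the 300-character upper bound but not the 200-character lower bound, and a single sentence longer than the limit is emitted as one oversized segment.

```python
import re


def split_text(text, max_chars=300):
    """Greedily pack whole sentences into segments of at most max_chars."""
    # Split right after Chinese or Western sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[。!?!?.])", text) if s]
    segments, current = [], ""
    for sentence in sentences:
        # Start a new segment when adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) > max_chars:
            segments.append(current)
            current = ""
        current += sentence
    if current:
        segments.append(current)
    return segments


article = "这是一句话。" * 100
parts = split_text(article)
print([len(p) for p in parts])
```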
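Step 2: Review and edit
Check the JSON and adjust as needed:
- Is the segmentation reasonable?
- Are characters assigned correctly?
- Is each emotion instruct appropriate?
- Does the TTS mode need to change?
Save the result as article.dubbing.json.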
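Part of this review can be assisted by a small script. The real schema is defined in dubbing-skills/references/dubbing_format.md, which is not reproduced here, so the sketch below assumes a purely hypothetical layout: a list of objects with `character`, `text`, and `mode` keys. Adapt the field names to the actual format before use.

```python
import json
from collections import Counter

MODES = {"custom-voice", "voice-design", "voice-clone"}


def review_script(segments, max_chars=300):
    """Return (per-character segment counts, list of warnings)."""
    counts = Counter(seg.get("character", "?") for seg in segments)
    warnings = []
    for i, seg in enumerate(segments, start=1):
        if len(seg.get("text", "")) > max_chars:
            warnings.append(f"segment {i}: longer than {max_chars} characters")
        if seg.get("mode", "custom-voice") not in MODES:
            warnings.append(f"segment {i}: unknown mode {seg['mode']!r}")
    return counts, warnings


# Hypothetical sample data; the second entry contains a deliberate typo.
sample = json.loads('''[
  {"character": "旁白", "text": "很久以前。", "mode": "custom-voice"},
  {"character": "小明", "text": "你好!", "mode": "voice-clown"}
]''')
counts, warnings = review_script(sample)
print(dict(counts))
print(warnings)
```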
Step 3: Batch-generate the audio
uv run qwen3-tts-skills/scripts/batch_dubbing.py \
--input article.dubbing.json \
--out-dir outputs
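Parameters:
| Parameter | Description | Default |
|---|---|---|
| --input | Dubbing-script JSON file | required |
| --out-dir | Output directory | outputs |
| --silence-gap | Silence between ordinary paragraphs (seconds) | 0.3 |
| --character-switch-gap | Silence when the character changes (seconds) | 0.5 |
| --clean-segments | Delete intermediate segments after merging | segments are kept |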
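The two gap parameters imply a simple rule: a longer pause when the speaking character changes, and a shorter one otherwise. The sketch below shows that rule with the documented defaults; `plan_gaps` is a hypothetical name, and how batch_dubbing.py applies the gaps internally is not confirmed by this document.

```python
def plan_gaps(characters, silence_gap=0.3, character_switch_gap=0.5):
    """Return the silence (seconds) to insert after each segment except the last."""
    gaps = []
    for prev, nxt in zip(characters, characters[1:]):
        gaps.append(character_switch_gap if prev != nxt else silence_gap)
    return gaps


print(plan_gaps(["旁白", "旁白", "小明", "旁白"]))  # → [0.3, 0.5, 0.5]
```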
Output structure
outputs/
├── segments/
│   ├── seg_001_旁白.wav
│   ├── seg_002_小明.wav
│   └── ...
├── article.dubbing.json    # backup of the dubbing script
└── article_final.wav       # final complete audio
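The naming pattern in this tree, and one common way to merge WAV segments with FFmpeg, can be sketched as follows. `segment_path` and `concat_list` are hypothetical helpers. The concat-demuxer approach is an assumption, since the batch script may merge the files differently, and this sketch does not insert the silence gaps.

```python
from pathlib import Path


def segment_path(out_dir, index, character):
    """Path of one intermediate segment, e.g. segments/seg_001_旁白.wav."""
    return Path(out_dir) / "segments" / f"seg_{index:03d}_{character}.wav"


def concat_list(paths):
    """Text of an FFmpeg concat-demuxer list file, one `file` line per path."""
    return "".join(f"file '{Path(p).as_posix()}'\n" for p in paths)


paths = [segment_path("outputs", 1, "旁白"), segment_path("outputs", 2, "小明")]
print(concat_list(paths))
# Saved as list.txt, the list could then be merged with, for example:
#   ffmpeg -f concat -safe 0 -i list.txt -c copy outputs/article_final.wav
```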
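Three Supported Modes
| Mode | Description | Use case |
|---|---|---|
| custom-voice | Built-in voice with emotion instructions | Most scenarios (default) |
| voice-design | Voice described in natural language | A specific voice type is needed (e.g. young girl, middle-aged man) |
| voice-clone | Clone a reference audio clip | A real or specific person's voice is needed |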
Environment Setup
Recommended: run directly with uv run
The scripts declare their own dependencies, so no manual installation is needed:
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py -h
Create a persistent virtual environment
uv venv --python 3.12
.\.venv\Scripts\activate
uv pip install -U qwen-tts
Install FlashAttention 2 (optional, reduces GPU memory usage)
uv pip install -U flash-attn --no-build-isolation
# Limit parallel build jobs when RAM is below 96 GB (POSIX shell syntax; in PowerShell, set $env:MAX_JOBS = "4" first)
MAX_JOBS=4 uv pip install -U flash-attn --no-build-isolation
Requirements:
- Hardware compatible with FlashAttention 2
- Model loaded in torch.float16 or torch.bfloat16
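These conditions suggest a small guard when choosing the value for the --attn flag listed under Performance Parameters. The sketch below is an assumption-laden illustration: `pick_attn` is a hypothetical helper, falling back to auto is a guess at sensible behaviour, and hardware compatibility is not checked at all.

```python
import importlib.util

FLASH_DTYPES = {"float16", "bfloat16"}


def pick_attn(dtype, flash_attn_installed):
    """Choose a value for --attn given the dtype and package availability."""
    if flash_attn_installed and dtype in FLASH_DTYPES:
        return "flash_attention_2"
    return "auto"


# Detect whether the flash_attn package can be imported in this environment.
installed = importlib.util.find_spec("flash_attn") is not None
print(pick_attn("bfloat16", installed))
```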
Install FFmpeg (required for batch dubbing)
Windows:
choco install ffmpeg -y
Verify the installation:
ffmpeg -version
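The same check can be done from Python before starting a long batch job. A minimal sketch using the standard library follows; `ffmpeg_available` is a hypothetical helper, and the `which` parameter exists only to make the function easy to test.

```python
import shutil


def ffmpeg_available(which=shutil.which):
    """True if an `ffmpeg` executable is found on PATH."""
    return which("ffmpeg") is not None


if not ffmpeg_available():
    print("FFmpeg not found on PATH; install it before running batch dubbing.")
```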
Offline Model Download
Using ModelScope (recommended in mainland China)
uv pip install -U modelscope
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen3-TTS-12Hz-1.7B-Base
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./Qwen3-TTS-Tokenizer-12Hz
Using Hugging Face
uv pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local-dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./Qwen3-TTS-12Hz-1.7B-Base
huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./Qwen3-TTS-Tokenizer-12Hz
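The two command sets differ only in flag spelling: ModelScope takes --model and --local_dir, while the Hugging Face CLI takes the repository as a positional argument and --local-dir. The sketch below generates either set from one model list; `download_cmd` is a hypothetical helper that only builds the argument lists shown above and does not download anything itself.

```python
MODELS = [
    "Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "Qwen3-TTS-12Hz-1.7B-Base",
    "Qwen3-TTS-Tokenizer-12Hz",
]


def download_cmd(model, tool="modelscope"):
    """Build the download command for one model as an argv list."""
    repo = f"Qwen/{model}"
    target = f"./{model}"
    if tool == "modelscope":
        return ["modelscope", "download", "--model", repo, "--local_dir", target]
    if tool == "huggingface":
        return ["huggingface-cli", "download", repo, "--local-dir", target]
    raise ValueError(f"unknown tool {tool!r}")


for model in MODELS:
    print(" ".join(download_cmd(model, tool="huggingface")))
```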
Local Web UI Demo
# Show help
qwen-tts-demo --help
# Launch CustomVoice
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --ip 0.0.0.0 --port 8000
# Launch VoiceDesign
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --ip 0.0.0.0 --port 8000
HTTPS support (resolves microphone permission issues)
# Generate a self-signed certificate
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes -subj "/CN=localhost"
# Enable HTTPS
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base \
--ip 0.0.0.0 --port 8000 \
--ssl-certfile cert.pem \
--ssl-keyfile key.pem \
--no-ssl-verify
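Reference Documents
| Document | Description |
|---|---|
| dubbing-skills/SKILL.md | Dubbing-script generation rules (for the AI to read) |
| dubbing-skills/references/dubbing_format.md | Detailed specification of the dubbing-script JSON format |
| dubbing-skills/references/examples.md | Example dubbing scripts for various scenarios |
| references/python_api.md | Python API integration guide |
Performance Parameters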
uv run qwen3-tts-skills/scripts/run_qwen3_tts.py custom-voice \
--device-map cuda:0 \
--dtype bfloat16 \
--attn flash_attention_2 \
--language Chinese \
--text "测试文本" \
--out-dir outputs
| Parameter | Description |
|---|---|
| --device-map | Target GPU (e.g. cuda:0) or CPU |
| --dtype | Data type: auto / bfloat16 / float16 / float32 |
| --attn | Attention implementation: auto / flash_attention_2 |
FAQ
Windows path issues
Absolute paths must be wrapped in double quotes:
uv run "C:/Users/lee/.config/alma/skills/qwen3-tts-skills/scripts/run_qwen3_tts.py" -h
SoX warning
If you see SoX could not be found!, install SoX (the warning does not affect functionality; installing only silences it):
choco install sox.portable -y
Slow model downloads
Prefer ModelScope (in mainland China), or download the models to a local directory in advance.