chichi-speech
1
总安装量
1
周安装量
#78626
全站排名
安装命令
npx skills add https://github.com/hudeven/chichi-speech --skill chichi-speech
Agent 安装分布
amp
1
cline
1
openclaw
1
opencode
1
cursor
1
kimi-cli
1
Skill 文档
Chichi Speech Service
This skill provides a FastAPI-based REST service for Qwen3 TTS, specifically configured for reusing a high-quality reference audio prompt for efficient and consistent voice cloning. This service is packaged as an installable CLI.
Installation
Prerequisites: uv and python >= 3.10.
If uv not installed, install it first
command -v uv || brew install uv || (curl -LsSf https://astral.sh/uv/install.sh | sh)
Install chichi-speech
export CHICHI_SPEECH_VENV="/Users/$USER/envs/chichi-speech"
uv venv $CHICHI_SPEECH_VENV --allow-existing
source $CHICHI_SPEECH_VENV/bin/activate
uv pip install -e .
Usage
1. Start the Service
The service runs on port 9090 by default.
# Production: Use the included script to run as a daemon with auto-restart
./scripts/start_server.sh --ref-audio src/assets/coco.wav --ref-text src/assets/coco.txt
# To stop:
# ./scripts/stop_server.sh
# Logs are written to /tmp/chichi_server.log
2. Verify Service is Running
Wait for ~15s for model to be loaded. Then Check the health/docs:
sleep 15 && curl http://localhost:9090/docs
3. Generate Speech
If the “text” is longer than 10 sentences, split it to smaller chunks(10 sentences per chunk) and synthesized separately. Specify the “language” based on the input “text”. Available languages:
- Chinese
- English
- Japanese
- Korean
- German
- French
- Russian
- Portuguese
- Spanish
- Italian
curl -X POST "http://localhost:9090/synthesize" \
-H "Content-Type: application/json" \
-d '{
"text": "Nice to meet you",
"language": "auto",
"format": "ogg"
}' \
--output output/nice_to_meet.ogg
Functionality
- Endpoint:
POST /synthesize - Default Port: 9090
- Voice Cloning: Uses a pre-computed voice prompt from reference files to ensure the cloned voice is consistent and generation is fast.
Requirements
- Python 3.10+
qwen-tts(Qwen3 model library)- Access to a reference audio file for voice cloning.
- By default, it uses public sample audio from Qwen3.
- CRITICAL: You can provide your own reference audio using the
--ref-audioand--ref-textflags.