voice-mode

📁 llblab/skills 📅 Today

Install command

npx skills add https://github.com/llblab/skills --skill voice-mode

Voice Mode (Super-Skill)

Purpose

This skill unifies voice output and voice input in one place:

say: text-to-speech (TTS)
listen: speech-to-text (STT)
duplex: helper wrapper (say → listen) built on the atomic commands

Use say and listen independently, or combine them into continuous duplex dialogue.

Atomic Commands

1) Speak

say "text to announce"

2) Listen

listen

3) Duplex helper (optional)

duplex "Готово. Озвучила результат. Что делаем дальше?"  # "Done. I've read out the result. What do we do next?"

duplex is only a convenience wrapper. The core protocol remains atomic: say, then listen.
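
The relationship between the wrapper and the two atomic commands can be sketched as a small shell function. This is an illustration only, not the duplex script that ships with the skill: the name duplex_sketch is hypothetical, and the sketch assumes say and listen are already on PATH.

```shell
# Hypothetical stand-in for the duplex helper: speak first, then listen.
# Assumes `say` and `listen` from this skill are installed and on PATH.
duplex_sketch() {
  say "$1" || return 1   # skip listening if speaking failed
  listen                 # recognized text is printed to stdout
}
```

A caller could capture the recognized text with reply=$(duplex_sketch "Ready. What next?").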

Operating Modes

Mode A: Selective Voice (default)

Use say only for short, high-value moments (greeting, warning, key conclusion).
Keep code, tables, and long technical details in text.

Mode B: Full Voice Output (screenless)

When explicitly requested by the user:

Use say for every response.
Do not duplicate full spoken content in chat.
For code/tables: describe briefly by voice (language, purpose, size), avoid reading raw code line by line.

Mode C: Voice Input On-Demand

Call listen when the user wants to dictate the next prompt.
listen prints recognized text to stdout.

Mode D: Duplex Continuous Dialogue (say → listen)

When the user enables duplex mode (e.g. “включи дуплекс” ("turn on duplex"), “полный голосовой режим” ("full voice mode")):

Speak response via say.
Immediately call listen (same conversation language).
Treat recognized text as the next user prompt.
Repeat the loop until a stop phrase is heard: “стоп” ("stop"), “выключи прослушивание” ("turn off listening"), “stop listening”.

The result is a hands-free conversational flow.
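
The loop above can be sketched in shell as follows. This is a minimal sketch under stated assumptions: respond is a hypothetical hook that produces the next reply (in practice the agent itself plays that role), duplex_loop and is_stop_phrase are illustrative names, and stop phrases are matched exactly, so real transcripts with different casing or punctuation would need normalization first. Only listen receives -l here, because how say selects its language is not described in this document.

```shell
# Return success when the recognized text is one of the documented stop phrases.
is_stop_phrase() {
  case "$1" in
    "стоп"|"выключи прослушивание"|"stop listening") return 0 ;;
    *) return 1 ;;
  esac
}

# Sketch of the Mode D loop: say -> listen -> treat text as next prompt.
# $1 = short language code, $2 = first reply to speak.
# `respond` is a hypothetical hook, not part of this skill.
duplex_loop() {
  lang="$1"
  reply="$2"
  while :; do
    say "$reply"
    heard=$(listen -l "$lang") || break   # stop if recognition fails
    if is_stop_phrase "$heard"; then
      break
    fi
    reply=$(respond "$heard")
  done
}
```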

Mode E: Autonomous Voice Alerts (optional)

Short proactive announcements are allowed for:

long-running operations,
critical blockers/security issues,
required confirmation to proceed safely.

Keep alerts brief and informative.
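
For the long-running-operation case, one possible pattern is a wrapper that stays silent for quick commands and speaks only when a command ran longer than a threshold. The function name with_voice_alert, the threshold argument, and the alert wording are all hypothetical; only say comes from this skill.

```shell
# Run a command; announce the outcome by voice only if it took at least
# THRESHOLD seconds. Hypothetical helper, not part of this skill.
# Usage: with_voice_alert THRESHOLD command [args...]
with_voice_alert() {
  threshold="$1"
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  elapsed=$(( $(date +%s) - start ))
  if [ "$elapsed" -ge "$threshold" ]; then
    if [ "$status" -eq 0 ]; then
      say "Task finished."
    else
      say "Task failed with exit code $status."
    fi
  fi
  return "$status"   # preserve the wrapped command's exit status
}
```

For example, with_voice_alert 60 make test would speak only when the run took a minute or more.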

Voice Guard + Listen Guard

Before say: ask if silence would hide important information. If not, do not speak.

Before listen: ask if voice input is actually needed right now. Do not invoke speculatively.

Language Memory

Preferred language is stored in ~/.pi_voice_lang.
Use short language codes: ru, en, de, … (not ru_RU, en_US).
In duplex mode, keep say and listen -l <lang> aligned.
say auto-downloads a missing Piper model on first use.
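
A sketch of how a caller might read the stored language and keep it in short form. It assumes ~/.pi_voice_lang holds a single code on its first line, which this document does not confirm; the function name voice_lang and the fallback to en are likewise assumptions.

```shell
# Print the preferred language as a short code (ru, en, de, ...).
# Assumes the file's first line holds the code; falls back to "en"
# when the file is missing or empty (an assumption of this sketch).
voice_lang() {
  file="${1:-$HOME/.pi_voice_lang}"
  lang=""
  if [ -r "$file" ]; then
    IFS= read -r lang < "$file" || true   # tolerate a missing final newline
  fi
  lang="${lang%%[_.-]*}"                  # ru_RU -> ru, en_US.UTF-8 -> en
  printf '%s\n' "${lang:-en}"
}
```

In duplex mode this could be used as listen -l "$(voice_lang)".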

Initialization (Linux & macOS)

Run bootstrap once:

"${SKILL_DIR}/scripts/bootstrap"

Bootstrap installs to ~/.local/bin:

say
listen
listen-server
duplex
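
After bootstrap, a quick check like the one below can confirm that the four commands landed where expected. The function name check_voice_tools and its output format are hypothetical.

```shell
# Report which bootstrapped commands exist in the target bin directory
# and whether that directory is on PATH. Returns 1 if any tool is missing.
check_voice_tools() {
  bindir="${1:-$HOME/.local/bin}"
  missing=0
  for tool in say listen listen-server duplex; do
    if [ -x "$bindir/$tool" ]; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      missing=1
    fi
  done
  case ":$PATH:" in
    *":$bindir:"*) ;;
    *) echo "note: $bindir is not on PATH" ;;
  esac
  return "$missing"
}
```

On macOS the system ships its own say command, so PATH order presumably matters there: ~/.local/bin would need to come first for the skill's say to be the one that runs.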

Platform Support

Linux: piper + aplay, faster-whisper, arecord/pyaudio
macOS: piper + afplay, faster-whisper, sox/pyaudio
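
The platform split for audio playback can be expressed as a small lookup. This is only a sketch of the mapping listed above, not code taken from the skill's scripts; player_for_platform is a hypothetical name.

```shell
# Map a kernel name (as printed by `uname -s`) to the playback tool
# listed for that platform. Fails for unsupported platforms.
player_for_platform() {
  case "${1:-$(uname -s)}" in
    Linux)  echo "aplay" ;;
    Darwin) echo "afplay" ;;
    *)      return 1 ;;
  esac
}
```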
