speechall-cli
Install command
npx skills add https://github.com/speechall/speechall-cli --skill speechall-cli
Skill documentation
speechall-cli
CLI for speech-to-text transcription via the Speechall API. Supports multiple providers (OpenAI, Deepgram, AssemblyAI, Google, Gemini, Groq, ElevenLabs, Cloudflare, and more).
Installation
Homebrew (macOS and Linux)
brew install Speechall/tap/speechall
Without Homebrew: See references/manual-install.md for manual download instructions.
Verify
speechall --version
Authentication
An API key is required. Provide it via environment variable (preferred) or flag:
export SPEECHALL_API_KEY="your-key-here"
# or
speechall --api-key "your-key-here" audio.wav
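If you script around the CLI, failing fast on a missing key saves a wasted upload. A minimal sketch; it checks only the environment variable, not a key passed via `--api-key`:

```shell
# require_key: fail with a message when no API key is available.
# Only inspects SPEECHALL_API_KEY; a --api-key flag would still work.
require_key() {
  if [ -z "${SPEECHALL_API_KEY:-}" ]; then
    echo "SPEECHALL_API_KEY is not set" >&2
    return 1
  fi
}

# usage: require_key && speechall audio.wav > transcript.txt
```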
Commands
transcribe (default)
Transcribe an audio or video file. This is the default subcommand; `speechall audio.wav` is equivalent to `speechall transcribe audio.wav`.
speechall <file> [options]
Options:
| Flag | Description | Default |
|---|---|---|
| `--model <provider.model>` | STT model identifier | `openai.gpt-4o-mini-transcribe` |
| `--language <code>` | Language code (e.g. `en`, `tr`, `de`) | API default (auto-detect) |
| `--output-format <format>` | Output format (`text`, `json`, `verbose_json`, `srt`, `vtt`) | API default |
| `--diarization` | Enable speaker diarization | off |
| `--speakers-expected <n>` | Expected number of speakers (use with `--diarization`) | — |
| `--no-punctuation` | Disable automatic punctuation | — |
| `--temperature <0.0-1.0>` | Model temperature | — |
| `--initial-prompt <text>` | Text prompt to guide model style | — |
| `--custom-vocabulary <term>` | Terms to boost recognition (repeatable) | — |
| `--ruleset-id <uuid>` | Replacement ruleset UUID | — |
| `--api-key <key>` | API key (overrides the SPEECHALL_API_KEY env var) | — |
Examples:
# Basic transcription
speechall interview.mp3
# Specific model and language
speechall call.wav --model deepgram.nova-2 --language en
# Speaker diarization with SRT output
speechall meeting.wav --diarization --speakers-expected 3 --output-format srt
# Custom vocabulary for domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"
# Transcribe a video file (macOS extracts audio automatically)
speechall presentation.mp4
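The single-file calls above extend naturally to a whole directory. A minimal batch sketch, assuming `.mp3` inputs and using only the flags documented above:

```shell
# batch_transcribe DIR: transcribe every .mp3 in DIR to a sibling .txt file.
# Progress goes to stderr so the transcripts written from stdout stay clean.
batch_transcribe() {
  for f in "$1"/*.mp3; do
    [ -e "$f" ] || continue                # no matches: the glob stays literal
    echo "Transcribing $f" >&2
    speechall "$f" --model deepgram.nova-2 --language en > "${f%.mp3}.txt"
  done
}

# usage: batch_transcribe ./recordings
```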
models
List available speech-to-text models. Outputs JSON to stdout. Filters combine with AND logic.
speechall models [options]
Filter flags:
| Flag | Description |
|---|---|
| `--provider <name>` | Filter by provider (e.g. `openai`, `deepgram`) |
| `--language <code>` | Filter by supported language (`tr` matches `tr`, `tr-TR`, `tr-CY`) |
| `--diarization` | Only models supporting speaker diarization |
| `--srt` | Only models supporting SRT output |
| `--vtt` | Only models supporting VTT output |
| `--punctuation` | Only models supporting automatic punctuation |
| `--streamable` | Only models supporting real-time streaming |
| `--vocabulary` | Only models supporting custom vocabulary |
Examples:
# List all available models
speechall models
# Models from a specific provider
speechall models --provider deepgram
# Models that support Turkish and diarization
speechall models --language tr --diarization
# Pipe to jq for specific fields
speechall models --provider openai | jq '.[].identifier'
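The jq pipeline above can be folded into a small helper that picks a model matching the capabilities you need. A sketch assuming `jq` is installed; it relies only on the `identifier` field shown in the example above:

```shell
# first_model FLAGS...: print the identifier of the first model that
# passes the given filter flags. Assumes jq; no fields beyond
# "identifier" are relied on.
first_model() {
  speechall models "$@" | jq -r '.[0].identifier'
}

# usage: speechall meeting.wav --model "$(first_model --diarization --srt)"
```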
Tips
- On macOS, video files (`.mp4`, `.mov`, etc.) are automatically converted to audio before upload.
- On Linux, pass audio files directly (`.wav`, `.mp3`, `.m4a`, `.flac`, etc.).
- Output goes to stdout. Redirect to save: `speechall audio.wav > transcript.txt`
- Errors go to stderr, so piping stdout is safe.
- Run `speechall --help`, `speechall transcribe --help`, or `speechall models --help` to see all valid enum values for model identifiers, language codes, and output formats.