paratran transcription
npx skills add https://github.com/briansunter/paratran --skill Paratran Transcription
Paratran Transcription
Audio transcription for Apple Silicon using parakeet-mlx. Ranked #1 on the Open ASR Leaderboard and ~30x faster than Whisper via MLX.
Three interfaces: CLI, REST API, and MCP server.
Setup
Quick run (no install)
uvx paratran recording.wav
Persistent install
uv tool install paratran
From source
git clone https://github.com/briansunter/paratran.git
cd paratran
uv sync
uv run paratran recording.wav
CLI Transcription
# Transcribe to text (default)
paratran recording.wav
# Multiple files with verbose output
paratran -v file1.wav file2.mp3 file3.m4a
# Output as SRT subtitles
paratran --output-format srt recording.wav
# All formats (txt, json, srt, vtt) to a directory
paratran --output-format all --output-dir ./output recording.wav
# Beam search decoding
paratran --decoding beam recording.wav
# Custom model and cache directory
paratran --model mlx-community/parakeet-tdt-1.1b-v2 --cache-dir /path/to/models recording.wav
CLI Options
| Flag | Default | Description |
|---|---|---|
| --model | mlx-community/parakeet-tdt-0.6b-v3 | Hugging Face model ID or local path |
| --cache-dir | Hugging Face default | Model cache directory |
| --output-dir | . | Output directory |
| --output-format | txt | txt, json, srt, vtt, or all |
| --decoding | greedy | greedy or beam |
| --chunk-duration | 120 | Chunk duration in seconds (0 disables chunking) |
| --overlap-duration | 15 | Overlap between chunks in seconds |
| --beam-size | 5 | Beam size (beam decoding only) |
| --fp32 | | Use FP32 precision instead of BF16 |
| -v | | Verbose output |
Environment variables: PARATRAN_MODEL (model to load) and PARATRAN_MODEL_DIR (model cache directory).
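The CLI accepts several files in one call, so a batch job can be scripted around it. Below is a minimal Python sketch, assuming paratran is on PATH and using only the flags from the CLI options table; the function names and the extension list are hypothetical, chosen for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical filter: the extensions used in the CLI examples.
# paratran may accept other formats too.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def build_command(files, output_format="srt", output_dir="./output", decoding="greedy"):
    """Build a paratran argv list using flags from the CLI options table."""
    return [
        "paratran",
        "--output-format", output_format,
        "--output-dir", str(output_dir),
        "--decoding", decoding,
        *(str(f) for f in files),
    ]


def transcribe_directory(audio_dir, **options):
    """Transcribe every matching file in audio_dir with a single paratran call.

    Returns None when there is nothing to transcribe.
    """
    files = sorted(
        p for p in Path(audio_dir).iterdir() if p.suffix.lower() in AUDIO_EXTENSIONS
    )
    if not files:
        return None
    if shutil.which("paratran") is None:
        raise RuntimeError("paratran not found on PATH (try: uv tool install paratran)")
    # check=True raises CalledProcessError if paratran exits with an error.
    return subprocess.run(build_command(files, **options), check=True)
```

For example, `transcribe_directory("./recordings", output_format="all")` writes every format for each file into ./output. Passing all files to one invocation, rather than looping, avoids starting the CLI once per file.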
REST API Server
# Start server
paratran serve
# Custom host, port, and model cache
paratran serve --host 127.0.0.1 --port 9000 --cache-dir /path/to/models
Endpoints
GET /health: Returns model name, status, and cache directory.
POST /transcribe: Upload an audio file; returns transcription JSON.
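When paratran serve is launched from a script, the client may need to wait until the server answers before uploading audio. Here is a stdlib-only sketch that polls GET /health; the helper name and retry settings are hypothetical, and it deliberately makes no assumption about the exact keys in the health response.

```python
import json
import time
import urllib.request


def wait_until_healthy(base_url="http://localhost:8000", attempts=30, delay=1.0):
    """Poll GET /health until the server answers, then return its JSON body."""
    last_error = None
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except OSError as exc:
            # URLError, HTTPError, refused connections and socket timeouts are
            # all OSError subclasses: treat them as "not ready yet" and retry.
            last_error = exc
            time.sleep(delay)
    raise TimeoutError(f"{base_url}/health did not respond: {last_error}")
```

Raising after a bounded number of attempts keeps a misconfigured host or port from hanging the calling script forever.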
# Basic transcription
curl -X POST http://localhost:8000/transcribe -F "file=@recording.m4a"
# With beam search and sentence splitting
curl -X POST "http://localhost:8000/transcribe?decoding=beam&max_words=20" -F "file=@recording.m4a"
# Extract just text
curl -s -X POST http://localhost:8000/transcribe -F "file=@audio.m4a" | jq -r '.text'
Query parameters: decoding, beam_size, length_penalty, patience, duration_reward, max_words, silence_gap, max_duration, chunk_duration, overlap_duration, fp32.
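The same upload can be done from Python without extra dependencies. This sketch mirrors the curl calls above: the audio goes in a multipart field named file and options become query parameters. It assumes the server is on localhost:8000; the helper names are hypothetical, and booleans such as fp32 are sent as lowercase true/false on the assumption that the server parses them that way.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path


def _query_value(value):
    # Send booleans as "true"/"false" rather than Python's "True"/"False".
    return str(value).lower() if isinstance(value, bool) else value


def build_transcribe_url(base_url="http://localhost:8000", **params):
    """Append query parameters (e.g. decoding="beam", max_words=20) to /transcribe."""
    query = urllib.parse.urlencode(
        {k: _query_value(v) for k, v in params.items() if v is not None}
    )
    return f"{base_url}/transcribe" + (f"?{query}" if query else "")


def encode_multipart(field, filename, data):
    """Encode one file as multipart/form-data, like curl -F "file=@..."."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", **params):
    """POST an audio file to the server and return the parsed JSON response."""
    path = Path(path)
    body, content_type = encode_multipart("file", path.name, path.read_bytes())
    req = urllib.request.Request(
        build_transcribe_url(base_url, **params),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `transcribe("recording.m4a", decoding="beam", max_words=20)["text"]` corresponds to the beam-search curl example piped through jq.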
Response format
{
"text": "Full transcription text.",
"duration": 3.52,
"processing_time": 0.176,
"sentences": [
{
"text": "Full transcription text.",
"start": 0.0,
"end": 3.52,
"tokens": [
{ "text": "Full", "start": 0.0, "end": 0.24 },
{ "text": " transcription", "start": 0.24, "end": 0.8 }
]
}
]
}
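The sentences array already carries start and end times, so a REST client can build its own subtitles. A sketch, assuming the response shape shown above, that writes one SRT cue per sentence and flattens the token timestamps; the CLI's --output-format srt remains the built-in route, and the helper names here are hypothetical. Token texts can be subword pieces with a leading space (as in " transcription"), so they are stripped rather than assumed to be whole words.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def response_to_srt(response):
    """Turn a /transcribe response into SRT text, one cue per sentence."""
    cues = []
    for i, sentence in enumerate(response["sentences"], start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(sentence['start'])} --> {srt_timestamp(sentence['end'])}\n"
            f"{sentence['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


def token_timings(response):
    """Flatten token-level timestamps into (text, start, end) tuples."""
    return [
        (token["text"].strip(), token["start"], token["end"])
        for sentence in response["sentences"]
        for token in sentence["tokens"]
    ]
```

For the sample response above, response_to_srt yields a single cue running from 00:00:00,000 to 00:00:03,520.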
Interactive API docs at http://localhost:8000/docs.
MCP Server
Paratran includes an MCP server so Claude Code, Claude Desktop, or any MCP client can transcribe audio files directly.
Claude Code
Add to .claude/settings.json:
{
"mcpServers": {
"paratran": {
"command": "uvx",
"args": ["--from", "paratran", "paratran-mcp"]
}
}
}
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"paratran": {
"command": "uvx",
"args": ["--from", "paratran", "paratran-mcp"]
}
}
}
Optionally add an env block to the paratran entry that sets PARATRAN_MODEL_DIR to customize the model cache location.
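Editing the file by hand works; a script can also merge the entry without disturbing other configured servers. A sketch, assuming the config is plain JSON with a top-level mcpServers object as in the snippets above; add_paratran_server is a hypothetical helper, and its model_dir argument fills in the optional env block.

```python
import json
from pathlib import Path


def add_paratran_server(config_path, model_dir=None):
    """Merge a paratran entry into an MCP client config, keeping other servers."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    entry = {"command": "uvx", "args": ["--from", "paratran", "paratran-mcp"]}
    if model_dir is not None:
        # Optional env block: PARATRAN_MODEL_DIR sets the model cache location.
        entry["env"] = {"PARATRAN_MODEL_DIR": str(model_dir)}
    config.setdefault("mcpServers", {})["paratran"] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Usage: `add_paratran_server("~/Library/Application Support/Claude/claude_desktop_config.json", model_dir="/path/to/models")`.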
MCP Tool
The transcribe tool accepts:
- file_path (required): absolute path to the audio file
- All transcription options: decoding, beam_size, length_penalty, patience, duration_reward, max_words, silence_gap, max_duration, chunk_duration, overlap_duration, fp32
Returns a JSON string containing the full text, duration, processing time, and sentences with word-level timestamps.
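MCP clients normally issue this call themselves, but when debugging it can help to see the message. A sketch of the JSON-RPC 2.0 tools/call request for the transcribe tool, plus decoding of the result; the helper names are hypothetical, and the parser assumes the JSON string arrives as the first text item in the standard MCP content array.

```python
import json
from pathlib import Path

# Option names listed for the transcribe tool above.
TRANSCRIBE_OPTIONS = {
    "decoding", "beam_size", "length_penalty", "patience", "duration_reward",
    "max_words", "silence_gap", "max_duration", "chunk_duration",
    "overlap_duration", "fp32",
}


def tools_call_request(file_path, request_id=1, **options):
    """Build a JSON-RPC 2.0 tools/call message for the transcribe tool."""
    path = Path(file_path)
    if not path.is_absolute():
        raise ValueError("file_path must be an absolute path")
    unknown = set(options) - TRANSCRIBE_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "transcribe",
            "arguments": {"file_path": str(path), **options},
        },
    }


def parse_tool_result(result):
    """Decode the transcription JSON from a tools/call result."""
    text = next(item["text"] for item in result["content"] if item.get("type") == "text")
    return json.loads(text)
```

Rejecting relative paths up front matches the tool's requirement for an absolute file_path, since the server may not share the client's working directory.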