paratran transcription

📁 briansunter/paratran 📅 Jan 1, 1970

#72572

Site rank

Install command

npx skills add https://github.com/briansunter/paratran --skill "Paratran Transcription"

Skill documentation

Paratran Transcription

Audio transcription for Apple Silicon using parakeet-mlx. The Parakeet model is listed as #1 on the Open ASR Leaderboard and runs roughly 30x faster than Whisper via MLX.

Three interfaces: CLI, REST API, and MCP server.

Setup

Quick run (no install)

uvx paratran recording.wav

Persistent install

uv tool install paratran

From source

git clone https://github.com/briansunter/paratran.git
cd paratran
uv sync
uv run paratran recording.wav

CLI Transcription

# Transcribe to text (default)
paratran recording.wav

# Multiple files with verbose output
paratran -v file1.wav file2.mp3 file3.m4a

# Output as SRT subtitles
paratran --output-format srt recording.wav

# All formats (txt, json, srt, vtt) to a directory
paratran --output-format all --output-dir ./output recording.wav

# Beam search decoding
paratran --decoding beam recording.wav

# Custom model and cache directory
paratran --model mlx-community/parakeet-tdt-1.1b-v2 --cache-dir /path/to/models recording.wav
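For batch jobs driven from a script, the commands above can be wrapped in a small helper. This is a minimal sketch, not part of paratran: the function names `build_paratran_command` and `transcribe_batch` are hypothetical, and it only uses flags that appear in this document.

```python
import shutil
import subprocess


def build_paratran_command(files, output_format="txt", output_dir=".", decoding="greedy"):
    """Assemble a paratran CLI invocation from flags documented in this README."""
    cmd = [
        "paratran",
        "--output-format", output_format,
        "--output-dir", str(output_dir),
        "--decoding", decoding,
    ]
    cmd.extend(str(f) for f in files)  # audio files go last, as in the examples above
    return cmd


def transcribe_batch(files, **options):
    """Run paratran once over several files; raises if the CLI exits non-zero."""
    if shutil.which("paratran") is None:
        raise RuntimeError("paratran not found on PATH (try: uv tool install paratran)")
    return subprocess.run(build_paratran_command(files, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.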

CLI Options

Flag	Default	Description
`--model`	`mlx-community/parakeet-tdt-0.6b-v3`	HF model ID or local path
`--cache-dir`	HuggingFace default	Model cache directory
`--output-dir`	`.`	Output directory
`--output-format`	`txt`	`txt`, `json`, `srt`, `vtt`, or `all`
`--decoding`	`greedy`	`greedy` or `beam`
`--chunk-duration`	`120`	Chunk duration in seconds (0 to disable)
`--overlap-duration`	`15`	Overlap between chunks
`--beam-size`	`5`	Beam size (beam decoding)
`--fp32`		Use FP32 precision instead of BF16
`-v`		Verbose output

Environment variables: PARATRAN_MODEL, PARATRAN_MODEL_DIR.
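To get a feel for the `--chunk-duration` and `--overlap-duration` defaults, the sketch below lays out overlapping windows over a recording. It assumes each chunk starts `chunk - overlap` seconds after the previous one; that layout is an assumption for illustration, and the actual chunking inside parakeet-mlx may differ. `chunk_windows` is a hypothetical helper.

```python
def chunk_windows(total_seconds, chunk_duration=120.0, overlap_duration=15.0):
    """Return (start, end) windows covering the audio, each overlapping the previous one."""
    total_seconds = float(total_seconds)
    if chunk_duration <= 0 or total_seconds <= chunk_duration:
        return [(0.0, total_seconds)]  # chunking disabled (0) or not needed
    if overlap_duration >= chunk_duration:
        raise ValueError("overlap must be shorter than the chunk")
    step = chunk_duration - overlap_duration  # assumed stride between chunk starts
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_duration, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return windows
```

Under this assumption a 300-second file with the defaults is covered by three windows starting at 0, 105, and 210 seconds.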

REST API Server

# Start server
paratran serve

# Custom host, port, and model cache
paratran serve --host 127.0.0.1 --port 9000 --cache-dir /path/to/models

Endpoints

GET /health: Returns model name, status, and cache directory.

POST /transcribe: Upload audio file, returns transcription JSON.

# Basic transcription
curl -X POST http://localhost:8000/transcribe -F "file=@recording.m4a"

# With beam search and sentence splitting
curl -X POST "http://localhost:8000/transcribe?decoding=beam&max_words=20" -F "file=@recording.m4a"

# Extract just text
curl -s -X POST http://localhost:8000/transcribe -F "file=@audio.m4a" | jq -r '.text'

Query parameters: decoding, beam_size, length_penalty, patience, duration_reward, max_words, silence_gap, max_duration, chunk_duration, overlap_duration, fp32.
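The curl calls above translate to Python roughly as follows. This is a standard-library sketch under the assumption that the server accepts a multipart field named `file`, as the `-F "file=@..."` examples suggest; the helper names are hypothetical and not part of paratran.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from urllib.parse import urlencode


def build_transcribe_url(base_url="http://localhost:8000", **params):
    """Append transcription options as query parameters, skipping unset ones."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/transcribe"
    return f"{url}?{query}" if query else url


def encode_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", **params):
    """POST an audio file to a running `paratran serve` instance and parse the JSON reply."""
    path = Path(path)
    body, content_type = encode_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        build_transcribe_url(base_url, **params),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, `transcribe("recording.m4a", decoding="beam", max_words=20)["text"]` would correspond to the second and third curl examples combined.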

Response format

{
  "text": "Full transcription text.",
  "duration": 3.52,
  "processing_time": 0.176,
  "sentences": [
    {
      "text": "Full transcription text.",
      "start": 0.0,
      "end": 3.52,
      "tokens": [
        { "text": "Full", "start": 0.0, "end": 0.24 },
        { "text": " transcription", "start": 0.24, "end": 0.8 }
      ]
    }
  ]
}

Interactive API docs at http://localhost:8000/docs.
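A REST client that wants subtitles can derive them from the `sentences` array in the response shown above. The CLI already writes SRT with `--output-format srt`; this sketch only illustrates how the `start`, `end`, and `text` fields fit together, and both function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def response_to_srt(response):
    """Turn each sentence of a /transcribe response into one numbered SRT cue."""
    cues = []
    for index, sentence in enumerate(response["sentences"], start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(sentence['start'])} --> {srt_timestamp(sentence['end'])}\n"
            f"{sentence['text'].strip()}\n"
        )
    return "\n".join(cues)  # cues are separated by a blank line
```

For the sample response, this yields a single cue running from `00:00:00,000` to `00:00:03,520`.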

MCP Server

Paratran includes an MCP server so Claude Code, Claude Desktop, or any MCP client can transcribe audio files directly.

Claude Code

Add to .claude/settings.json:

{
  "mcpServers": {
    "paratran": {
      "command": "uvx",
      "args": ["--from", "paratran", "paratran-mcp"]
    }
  }
}

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "paratran": {
      "command": "uvx",
      "args": ["--from", "paratran", "paratran-mcp"]
    }
  }
}

Optionally set PARATRAN_MODEL_DIR in the env block to customize the model cache location.
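When a config file already lists other servers, the paratran entry can be merged in rather than pasted by hand. The sketch below is illustrative only: `add_paratran_server` and `update_config_file` are hypothetical helpers, and the `env` block follows the note above about `PARATRAN_MODEL_DIR`.

```python
import json
from pathlib import Path


def add_paratran_server(config, model_dir=None):
    """Return a copy of an MCP config dict with the paratran server entry added."""
    entry = {"command": "uvx", "args": ["--from", "paratran", "paratran-mcp"]}
    if model_dir is not None:
        # Optional: point the server at a custom model cache location.
        entry["env"] = {"PARATRAN_MODEL_DIR": str(model_dir)}
    servers = {**config.get("mcpServers", {}), "paratran": entry}
    return {**config, "mcpServers": servers}


def update_config_file(path, model_dir=None):
    """Read a JSON config (if present), add the paratran entry, and write it back."""
    path = Path(path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_paratran_server(config, model_dir), indent=2) + "\n")
```

Existing entries under `mcpServers` are preserved, and the input dict is not modified.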

MCP Tool

The transcribe tool accepts:

file_path (required): absolute path to audio file
All transcription options: decoding, beam_size, length_penalty, patience, duration_reward, max_words, silence_gap, max_duration, chunk_duration, overlap_duration, fp32

Returns JSON string with full text, duration, processing time, and sentences with word-level timestamps.
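MCP clients normally issue the call for you, but it can help to see what a request for this tool looks like on the wire. The sketch below builds a JSON-RPC `tools/call` message as defined by the MCP specification; it assumes the result arrives as the first text content item, which is an assumption about this server rather than something the document states. The helper names are hypothetical.

```python
import json
from pathlib import PurePosixPath


def build_transcribe_call(file_path, request_id=1, **options):
    """Build a JSON-RPC tools/call request for the transcribe tool."""
    if not PurePosixPath(file_path).is_absolute():
        raise ValueError("file_path must be an absolute path")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "transcribe", "arguments": {"file_path": file_path, **options}},
    }


def parse_transcribe_result(result):
    """Decode the JSON string assumed to sit in the first text content item."""
    return json.loads(result["content"][0]["text"])
```

The parsed result has the same shape as the REST response, so code written against `text` and `sentences` can be reused for both interfaces.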
