transcription expert
Install command
npx skills add https://github.com/willsigmon/sigstack --skill "Transcription Expert"
Skill documentation
Transcription Expert
Choose the right transcription service for your use case.
Pricing Comparison (2026)
| Service | Price (USD/min) | Speed | Diarization | Real-time |
|---|---|---|---|---|
| Whisper API | $0.006 | Slow | No (needs a separate tool) | No |
| Deepgram | $0.0043 | ~20 s per hour of audio | Yes | Yes |
| AssemblyAI | $0.0025 | Fast | Yes (+$0.02/hr) | Yes |
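To make the table concrete, here is a rough sketch that turns the per-minute prices into a cost per job. The prices are copied from the table; the function name and dictionary layout are illustrative assumptions, and the diarization surcharge is applied only where the table lists one.

```python
# Prices copied from the table above (USD); verify against current vendor pricing.
PRICE_PER_MIN = {"whisper_api": 0.006, "deepgram": 0.0043, "assemblyai": 0.0025}
# Extra per-hour charge for diarization, only where the table lists one.
DIARIZATION_PER_HOUR = {"assemblyai": 0.02}


def estimate_cost(service: str, audio_minutes: float, diarize: bool = False) -> float:
    """Return the estimated cost in USD for transcribing `audio_minutes` of audio."""
    cost = PRICE_PER_MIN[service] * audio_minutes
    if diarize:
        # Services without a listed surcharge add nothing here; note that the
        # Whisper API does not diarize at all, so the flag has no effect for it.
        cost += DIARIZATION_PER_HOUR.get(service, 0.0) * audio_minutes / 60
    return round(cost, 4)


if __name__ == "__main__":
    # Ten hours of audio with diarization requested.
    for name in PRICE_PER_MIN:
        print(name, estimate_cost(name, audio_minutes=600, diarize=True))
```

At these prices the gap between services is a few dollars per ten hours of audio, so features and latency will often matter more than the per-minute rate.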
When to Use Each
Whisper
- One-time batch processing
- Self-hosting option (free)
- Privacy-sensitive (local)
- Best: Podcasts, offline processing
Deepgram
- Real-time applications
- Live captioning
- Speaker identification built-in
- Best: Meetings, call centers, voice apps
AssemblyAI
- Cheapest per-minute
- AI features (sentiment, topics)
- PII redaction
- Best: Content analysis, compliance
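The guidance above can be condensed into a small decision helper. This is only a sketch: the function name, the flags, and the order in which the rules are checked are assumptions made for illustration, not part of any service's API.

```python
def choose_service(must_stay_local: bool = False, realtime: bool = False) -> str:
    """Map two common requirements to one of the three services listed above."""
    if must_stay_local:
        # Whisper is the only option listed as self-hostable.
        return "whisper (self-hosted)"
    if realtime:
        # Deepgram is listed for live captioning and voice apps.
        return "deepgram"
    # Otherwise default to the cheapest per-minute option in the table.
    return "assemblyai"


if __name__ == "__main__":
    print(choose_service(realtime=True))         # deepgram
    print(choose_service(must_stay_local=True))  # whisper (self-hosted)
    print(choose_service())                      # assemblyai
```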
Quick Implementations
Whisper (OpenAI)
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=f
    )
print(transcript.text)
Deepgram
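from deepgram import DeepgramClient, PrerecordedOptions
dg = DeepgramClient(api_key="...")
options = PrerecordedOptions(model="nova-3", diarize=True)
with open("audio.mp3", "rb") as f:
    payload = {"buffer": f.read()}  # read the bytes so the file handle is closed
# deepgram-sdk 3.x/4.x style call; other SDK releases may expose a different path
response = dg.listen.rest.v("1").transcribe_file(payload, options)
print(response.results.channels[0].alternatives[0].transcript)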
AssemblyAI
import assemblyai as aai
aai.settings.api_key = "..."
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("audio.mp3")
print(transcript.text)
Speaker Diarization
Deepgram (Built-in)
options = PrerecordedOptions(diarize=True)
# Response includes speaker labels automatically
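With diarization enabled, speaker labels arrive per word, so a common next step is to collapse consecutive words into speaker turns. The sketch below assumes the words have already been converted to plain dicts with `word`, `punctuated_word`, and `speaker` keys (the shape of the word entries in Deepgram's JSON response); `words_to_turns` is an illustrative helper, not part of the SDK.

```python
from itertools import groupby


def words_to_turns(words):
    """Group consecutive words by speaker into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        # Prefer the punctuated form when the response includes it.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        turns.append((speaker, text))
    return turns


# Hand-written sample data standing in for a real response.
sample = [
    {"word": "hi", "punctuated_word": "Hi,", "speaker": 0},
    {"word": "there", "punctuated_word": "there.", "speaker": 0},
    {"word": "hello", "punctuated_word": "Hello!", "speaker": 1},
]
print(words_to_turns(sample))  # [(0, 'Hi, there.'), (1, 'Hello!')]
```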
AssemblyAI
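config = aai.TranscriptionConfig(speaker_labels=True)  # +$0.02/hr additional
transcript = aai.Transcriber().transcribe("audio.mp3", config=config)
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")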
Whisper (Requires Extra)
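# Whisper has no diarization; pair it with a separate tool such as pyannote.
# The pyannote models are gated on Hugging Face, so loading usually needs an access token.
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")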
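Once a separate diarizer has produced speaker turns, they still have to be matched to the transcript. One simple approach, sketched here with plain tuples, is to give each transcript segment the speaker whose turn overlaps it most. The tuple layouts and the `assign_speakers` name are assumptions for illustration; in practice the segments might come from Whisper's `verbose_json` output and the turns from the pyannote pipeline result.

```python
def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    transcript_segments: list of (start, end, text), times in seconds
    speaker_turns:       list of (start, end, speaker), times in seconds
    """
    labeled = []
    for seg_start, seg_end, text in transcript_segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))  # speaker stays None if nothing overlaps
    return labeled


# Hand-written sample data standing in for real model output.
segments = [(0.0, 2.0, "Welcome to the show."), (2.1, 4.0, "Thanks for having me.")]
turns = [(0.0, 2.05, "SPEAKER_00"), (2.05, 4.2, "SPEAKER_01")]
print(assign_speakers(segments, turns))
```

Segments that straddle a speaker change are assigned to a single speaker under this scheme; word-level timestamps would be needed to split them.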
Batch Processing
import asyncio

async def transcribe_batch(files):
    # `transcribe` must be an async function that transcribes a single file
    tasks = [transcribe(f) for f in files]
    return await asyncio.gather(*tasks)
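Launching every file at once can run into provider rate limits, so it may help to cap concurrency. The sketch below uses `asyncio.Semaphore` for that; `fake_transcribe` is a hypothetical stand-in for a real API call, and the default limit of 4 is arbitrary.

```python
import asyncio


async def fake_transcribe(path: str) -> str:
    """Stand-in for a real API call; replace with your service's client."""
    await asyncio.sleep(0.01)
    return f"transcript of {path}"


async def transcribe_batch_limited(files, max_concurrent: int = 4):
    """Transcribe files concurrently, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path):
        async with semaphore:
            return await fake_transcribe(path)

    # return_exceptions=True puts failures in the result list instead of
    # raising on the first one; results keep the order of `files`.
    return await asyncio.gather(*(worker(f) for f in files), return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(transcribe_batch_limited(["a.mp3", "b.mp3", "c.mp3"], 2)))
```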
Output Formats
- Plain text
- SRT/VTT subtitles
- JSON with timestamps
- Word-level timing
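As an illustration of the subtitle formats, timestamped segments can be turned into SRT with a few lines of code. This sketch assumes segments are `(start, end, text)` tuples with times in seconds; both helper names are illustrative.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start, end, text) tuples, times in seconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Welcome to the show."), (2.5, 61.04, "Thanks for having me.")]))
```

WebVTT is close to this: it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.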
Use when: Podcast transcription, meeting notes, video subtitles, voice content indexing