audio to srt converter
1
总安装量
0
周安装量
#48701
全站排名
安装命令
npx skills add https://github.com/dean9703111/ai-agent-skill-for-video-workflow --skill Audio to SRT Converter
Skill 文档
Audio to SRT Converter
This skill provides a Python-based workflow for converting audio files (MP3, WAV, M4A, FLAC, etc.) into SRT subtitle files with automatic speech recognition, customizable text formatting, and timeline optimization.
Purpose
Convert audio files (MP3, WAV, M4A, FLAC, etc.) into properly formatted SRT subtitle files with:
- Automatic speech recognition and transcription
- Support for multiple audio formats (MP3, WAV, M4A, FLAC, and more)
- Customizable character limits per subtitle line (default: 22 characters, minimum: 4 characters)
- Automatic timeline gap filling (gaps < 0.3s are merged)
- Environment and dependency validation
- Output naming convention:
origin.srt
When to Use This Skill
Use this skill when:
- Converting audio files to subtitle format
- Generating transcriptions with timeline information
- Creating SRT files for video editing or accessibility
- Processing Chinese or multilingual audio content
Core Workflow
1. Environment Validation
Before processing, validate:
- Python 3.7+ is installed
- Required packages are available (see Dependencies section)
- Input MP3 file exists and is readable
- Output directory is writable
2. Audio Transcription
Process the audio file using speech recognition:
- Load audio file (supports MP3, WAV, M4A, FLAC, etc.)
- Perform speech-to-text conversion
- Extract timestamps for each segment
- Handle silence detection and word boundaries
3. Text Formatting
Format transcribed text according to parameters:
- Split text into lines based on character limit
- Ensure minimum 4 characters per line
- Respect word boundaries when possible
- Handle Chinese character counting correctly
4. Timeline Optimization
Adjust subtitle timing:
- Identify gaps between subtitle segments
- Merge segments when gap < 0.3 seconds
- Extend previous subtitle end time to next subtitle start time
- Maintain synchronization with audio
5. SRT Generation
Create final SRT file:
- Format according to SRT specification
- Number subtitles sequentially
- Use proper timestamp format (HH:MM:SS,mmm)
- Save as
origin.srt
Using the Conversion Script
The main conversion script is located at scripts/audio_to_srt.py.
Basic Usage
python scripts/audio_to_srt.py <audio_file> [--max-chars MAX_CHARS]
Parameters
audio_file(required): Path to the input audio file (MP3, WAV, M4A, FLAC, etc.)--max-chars(optional): Maximum characters per subtitle line (default: 22, minimum: 4)
Examples
See examples/usage_example.sh for complete usage examples.
Dependencies
The script requires the following Python packages:
openai-whisper– For speech recognitionpydub– For audio processingffmpeg– System dependency for audio handling
Install with:
pip install openai-whisper pydub
brew install ffmpeg # macOS
Output Format
The generated SRT file follows this format:
1
00:00:00,000 --> 00:00:03,500
éæ¯ç¬¬ä¸è¡åå¹
2
00:00:03,500 --> 00:00:07,200
éæ¯ç¬¬äºè¡åå¹
Additional Resources
Scripts
scripts/audio_to_srt.py– Main conversion script with environment validationscripts/check_environment.py– Standalone environment checker
Examples
examples/usage_example.sh– Complete usage examples with different parameters