convert-doc
10
总安装量
7
周安装量
#30811
全站排名
安装命令
npx skills add https://github.com/carlheath/ogmios --skill convert-doc
Agent 安装分布
claude-code
5
gemini-cli
5
antigravity
5
codex
4
opencode
4
Skill 文档
Smart Document Pipeline
Quick Reference
# Convert document (auto-caches, auto-summarizes if >100KB)
python ~/.claude/lib/document-converter.py "/path/to/file.pdf"
# Force regenerate
python ~/.claude/lib/document-converter.py "/path/to/file.pdf" --force
# List cached documents
python ~/.claude/lib/document-converter.py --list
# Cleanup old cache (>1 week)
python ~/.claude/lib/document-converter.py --cleanup
Supported Formats
| Format | Extension | Tool | Notes |
|---|---|---|---|
| PyMuPDF | Text extraction, page-by-page | ||
| Word | .docx, .doc | pandoc/python-docx | Full markdown |
| PowerPoint | .pptx, .ppt | python-pptx | Slide-by-slide with notes |
| Excel | .xlsx, .xls | openpyxl | Tables as markdown |
| RTF | .rtf | pandoc | Rich text |
Output Structure
{
"cache_path": "/path/to/cached/file.md",
"summary_path": "/path/to/cached/file_summary.md", // if >100KB
"from_cache": false,
"original_size": 26744198,
"converted_size": 129844,
"summary_size": 30638,
"savings_percent": 99.5,
"recommendation": "summary" // "summary" or "full"
}
Auto-Summary
Documents >100KB automatically get a summary version:
| Version | Purpose | Size Target |
|---|---|---|
| Full | Complete content | As converted |
| Summary | Quick overview | ~30KB |
The summary preserves:
- All headers and structure
- First portion of each section
- Metadata and source reference
Automatic Integration
The smart-read-interceptor hook automatically triggers when you read:
- PDF, Word, PowerPoint, Excel files
- Any file >200KB
It will suggest:
- Use summary – If summary exists (best for overview)
- Use cache – If full cached version exists
- Convert first – If no cache exists
- Delegate – For very large files, use subagent
Subagent Delegation Pattern
For very large documents, delegate to isolated context:
Task(
subagent_type="Explore",
prompt="Read and summarize key points from: /path/to/large-file.pdf.
Focus on: [specific topics]. Max 500 words summary."
)
This keeps the large content OUT of main context.
Cache Location
~/.claude/cache/documents/
âââ filename_hash.md # Full converted version
âââ filename_hash_summary.md # Summary (if >100KB)
âââ ...
Cache expires after 1 week. Run --cleanup to remove old files.
Real-World Results
| Document | Original | Converted | Summary | Savings |
|---|---|---|---|---|
| Google AI Guide (PDF) | 26.7 MB | 127 KB | 30 KB | 99.9% |
| Debatt (Word) | 206 KB | 5.4 KB | – | 97% |
| Ãvning (PowerPoint) | 7.2 MB | 3.1 KB | – | 99.96% |
Workflow Examples
Reading a PDF for research
1. User asks to analyze a PDF
2. Hook detects: "ð DOCUMENT FILE: .PDF"
3. Convert: python ~/.claude/lib/document-converter.py "file.pdf"
4. Read the summary for overview
5. Read specific sections from full version if needed
Processing multiple documents
1. Convert all documents first (batch):
for f in *.pdf; do python ~/.claude/lib/document-converter.py "$f"; done
2. Read summaries in main context
3. Delegate deep analysis to subagents