convert-doc

📁 carlheath/ogmios 📅 Jan 24, 2026
10
总安装量
7
周安装量
#30811
全站排名
安装命令
npx skills add https://github.com/carlheath/ogmios --skill convert-doc

Agent 安装分布

claude-code 5
gemini-cli 5
antigravity 5
codex 4
opencode 4

Skill 文档

Smart Document Pipeline

Quick Reference

# Convert document (auto-caches, auto-summarizes if >100KB)
python ~/.claude/lib/document-converter.py "/path/to/file.pdf"

# Force regenerate
python ~/.claude/lib/document-converter.py "/path/to/file.pdf" --force

# List cached documents
python ~/.claude/lib/document-converter.py --list

# Cleanup old cache (>1 week)
python ~/.claude/lib/document-converter.py --cleanup

Supported Formats

Format Extension Tool Notes
PDF .pdf PyMuPDF Text extraction, page-by-page
Word .docx, .doc pandoc/python-docx Full markdown
PowerPoint .pptx, .ppt python-pptx Slide-by-slide with notes
Excel .xlsx, .xls openpyxl Tables as markdown
RTF .rtf pandoc Rich text

Output Structure

{
  "cache_path": "/path/to/cached/file.md",
  "summary_path": "/path/to/cached/file_summary.md",  // if >100KB
  "from_cache": false,
  "original_size": 26744198,
  "converted_size": 129844,
  "summary_size": 30638,
  "savings_percent": 99.5,
  "recommendation": "summary"  // "summary" or "full"
}

Auto-Summary

Documents >100KB automatically get a summary version:

Version Purpose Size Target
Full Complete content As converted
Summary Quick overview ~30KB

The summary preserves:

  • All headers and structure
  • First portion of each section
  • Metadata and source reference

Automatic Integration

The smart-read-interceptor hook automatically triggers when you read:

  • PDF, Word, PowerPoint, Excel files
  • Any file >200KB

It will suggest:

  1. Use summary – If summary exists (best for overview)
  2. Use cache – If full cached version exists
  3. Convert first – If no cache exists
  4. Delegate – For very large files, use subagent

Subagent Delegation Pattern

For very large documents, delegate to isolated context:

Task(
  subagent_type="Explore",
  prompt="Read and summarize key points from: /path/to/large-file.pdf.
         Focus on: [specific topics]. Max 500 words summary."
)

This keeps the large content OUT of main context.

Cache Location

~/.claude/cache/documents/
├── filename_hash.md           # Full converted version
├── filename_hash_summary.md   # Summary (if >100KB)
└── ...

Cache expires after 1 week. Run --cleanup to remove old files.

Real-World Results

Document Original Converted Summary Savings
Google AI Guide (PDF) 26.7 MB 127 KB 30 KB 99.9%
Debatt (Word) 206 KB 5.4 KB 97%
Övning (PowerPoint) 7.2 MB 3.1 KB 99.96%

Workflow Examples

Reading a PDF for research

1. User asks to analyze a PDF
2. Hook detects: "📄 DOCUMENT FILE: .PDF"
3. Convert: python ~/.claude/lib/document-converter.py "file.pdf"
4. Read the summary for overview
5. Read specific sections from full version if needed

Processing multiple documents

1. Convert all documents first (batch):
   for f in *.pdf; do python ~/.claude/lib/document-converter.py "$f"; done

2. Read summaries in main context
3. Delegate deep analysis to subagents