pymupdf-pdf
Install command
npx skills add https://github.com/kesslerio/pymupdf-pdf-parser-clawdbot-skill --skill pymupdf-pdf
Skill documentation
PyMuPDF PDF
Overview
Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.
Prereqs / when to read references
If you hit import errors (PyMuPDF not installed) or Nix libstdc++ issues, read:
references/pymupdf-notes.md
Quick start (single PDF)
# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
--format md \
--outroot ./pymupdf-output
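To run the quick start over a whole folder, a small wrapper can build the same command once per PDF. This is a minimal sketch rather than part of the skill: `build_command` and `parse_folder` are hypothetical helper names, and it assumes the script is executable, is run from the skill directory, and exits with a non-zero status on failure.

```python
import subprocess
from pathlib import Path

# Script path taken from the quick start; run from the skill directory.
SCRIPT = Path("./scripts/pymupdf_parse.py")


def build_command(pdf, outroot="./pymupdf-output", fmt="md"):
    """Return the argv list for one PDF, using only the documented flags."""
    return [str(SCRIPT), str(pdf), "--format", fmt, "--outroot", str(outroot)]


def parse_folder(folder, outroot="./pymupdf-output", fmt="md"):
    """Run the parser once per *.pdf in `folder`; return the PDFs that failed.

    Assumption: a non-zero exit status means the parse failed.
    """
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        result = subprocess.run(build_command(pdf, outroot, fmt))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Calling `parse_folder("/path/to/pdfs")` should leave one output directory per document under the output root.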
Options
- --format md|json|both (default: md)
- --images to extract images
- --tables to extract a simple line-based table JSON (quick/rough)
- --outroot DIR to change output root
- --lang adds a language hint into JSON output metadata
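The option set can be pictured as an `argparse` parser. This is only an illustration of how the flags and defaults fit together, not the script's actual source: the `build_parser` name, the help strings, and the assumption that `--lang` takes a value are all hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="pymupdf_parse.py")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--format", choices=["md", "json", "both"], default="md")
    parser.add_argument("--images", action="store_true", help="extract images")
    parser.add_argument("--tables", action="store_true",
                        help="extract a rough line-based table JSON")
    parser.add_argument("--outroot", default="./pymupdf-output", metavar="DIR")
    # Assumption: --lang takes a value such as "en".
    parser.add_argument("--lang", default=None, help="language hint for JSON metadata")
    return parser


args = build_parser().parse_args(["report.pdf", "--format", "both", "--tables"])
print(args.format, args.images, args.tables, args.outroot)
# prints: both False True ./pymupdf-output
```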
Output conventions
- Create ./pymupdf-output/<pdf-basename>/ by default.
- Markdown output: output.md
- JSON output: output.json (includes lang)
- Images: images/ subdir
- Tables: tables.json (rough line-based)
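For downstream scripts, the conventions above can be turned into a small path helper. The sketch below is not part of the skill: `expected_outputs` is a hypothetical name, and it assumes `<pdf-basename>` means the file name without the `.pdf` extension.

```python
from pathlib import Path


def expected_outputs(pdf_path, outroot="./pymupdf-output", fmt="md",
                     images=False, tables=False):
    """Map each requested output to the path the conventions imply."""
    # Assumption: <pdf-basename> is the file name without its extension.
    doc_dir = Path(outroot) / Path(pdf_path).stem
    outputs = {"dir": doc_dir}
    if fmt in ("md", "both"):
        outputs["markdown"] = doc_dir / "output.md"
    if fmt in ("json", "both"):
        outputs["json"] = doc_dir / "output.json"
    if images:
        outputs["images"] = doc_dir / "images"
    if tables:
        outputs["tables"] = doc_dir / "tables.json"
    return outputs
```

A caller can check `expected_outputs(...)["markdown"].exists()` after a run instead of hard-coding paths.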
Notes
- PyMuPDF is fast but less robust on complex PDFs.
- For more robust parsing, use a heavy-duty OCR parser (e.g., MinerU) if installed.
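One rough way to decide when to reach for the heavier parser is to flag pages where PyMuPDF returns almost no text, which often indicates scanned content. This is a heuristic sketch, not something the skill does: `pages_needing_ocr`, `extract_page_texts`, and the 20-character threshold are assumptions chosen for illustration.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is shorter than min_chars."""
    return [i for i, text in enumerate(page_texts, start=1)
            if len(text.strip()) < min_chars]


def extract_page_texts(pdf_path):
    """Extract plain text per page with PyMuPDF (imported lazily)."""
    import fitz  # PyMuPDF; newer releases also allow `import pymupdf`

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]
```

For example, `pages_needing_ocr(extract_page_texts("/path/to/file.pdf"))` lists the pages that may deserve an OCR pass.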