pymupdf-pdf

📁 kesslerio/pymupdf-pdf-parser-clawdbot-skill 📅 1 day ago

Site rank: #66299

Install command

npx skills add https://github.com/kesslerio/pymupdf-pdf-parser-clawdbot-skill --skill pymupdf-pdf

Agent install distribution

amp 1

openclaw 1

opencode 1

codex 1

github-copilot 1

Skill documentation

PyMuPDF PDF

Overview

Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.

Prereqs / when to read references

If you hit import errors (PyMuPDF not installed) or Nix libstdc++ issues, read:

references/pymupdf-notes.md
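
A quick check before running the parser can tell you whether those notes are relevant. The following is a minimal sketch, not part of the skill: the helper name first_importable is hypothetical, and it assumes PyMuPDF is installed under the module name pymupdf (recent releases) or fitz (the long-standing name).

```python
import importlib.util


def first_importable(names):
    """Return the first module name in `names` that can be located, else None."""
    for name in names:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    found = first_importable(["pymupdf", "fitz"])
    if found is None:
        print("PyMuPDF not found; see references/pymupdf-notes.md")
    else:
        print(f"PyMuPDF importable as {found!r}")
```

Locating a module does not load it, so a Nix libstdc++ problem would still only show up when the package is actually imported.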

Quick start (single PDF)

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
  --format md \
  --outroot ./pymupdf-output
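
To run the same command over a folder of PDFs, a small Python wrapper is one option. This is a sketch under stated assumptions: build_command and parse_folder are hypothetical helpers, the wrapper is run from the skill directory, and the script is assumed to exit with a non-zero status on failure, which the source does not confirm.

```python
import subprocess
from pathlib import Path


def build_command(pdf, outroot="./pymupdf-output", fmt="md",
                  images=False, tables=False,
                  script="./scripts/pymupdf_parse.py"):
    """Assemble the argument list for one PDF using the documented flags."""
    if fmt not in ("md", "json", "both"):
        raise ValueError(f"unsupported format: {fmt}")
    cmd = [script, str(pdf), "--format", fmt, "--outroot", str(outroot)]
    if images:
        cmd.append("--images")
    if tables:
        cmd.append("--tables")
    return cmd


def parse_folder(folder, **options):
    """Run the parser once per *.pdf in `folder`; return the paths that failed."""
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Assumption: a non-zero exit status signals a failed parse.
        result = subprocess.run(build_command(pdf, **options))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

For example, parse_folder("./inbox", fmt="both", tables=True) would invoke the script once per PDF in ./inbox and collect any that did not finish cleanly.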

Options

--format md|json|both selects the output format (default: md)
--images extracts images
--tables extracts a simple line-based table JSON (quick and rough)
--outroot DIR changes the output root
--lang adds a language hint to the JSON output metadata
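
For readers who want to see how these flags fit together, here is a hypothetical argparse declaration that mirrors them. It is not the script's actual source. It assumes --lang takes a string value and that the default output root is ./pymupdf-output; build_parser is an illustrative name.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented options (not the real script)."""
    parser = argparse.ArgumentParser(
        description="Parse a PDF with PyMuPDF (illustrative sketch)")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--format", choices=["md", "json", "both"], default="md",
                        help="output format")
    parser.add_argument("--images", action="store_true", help="extract images")
    parser.add_argument("--tables", action="store_true",
                        help="extract rough line-based tables")
    parser.add_argument("--outroot", default="./pymupdf-output", metavar="DIR",
                        help="output root directory")
    # Assumption: the language hint is passed as a plain string such as "en".
    parser.add_argument("--lang", default=None,
                        help="language hint for JSON metadata")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```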

Output conventions

Output directory: ./pymupdf-output/<pdf-basename>/ by default (change the root with --outroot)

Markdown output: output.md

JSON output: output.json (metadata includes the --lang hint)
Images: images/ subdir
Tables: tables.json (rough line-based)
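
The layout above can be expressed as a small path helper, useful when a downstream step needs to locate the results. This is a sketch only: expected_outputs is a hypothetical name, and it assumes <pdf-basename> means the file name without its .pdf extension.

```python
from pathlib import Path


def expected_outputs(pdf_path, outroot="./pymupdf-output"):
    """Map each documented artifact to the path where it should appear."""
    # Assumption: <pdf-basename> is the file name minus its extension.
    doc_dir = Path(outroot) / Path(pdf_path).stem
    return {
        "dir": doc_dir,
        "markdown": doc_dir / "output.md",
        "json": doc_dir / "output.json",
        "images": doc_dir / "images",
        "tables": doc_dir / "tables.json",
    }


def existing_outputs(pdf_path, outroot="./pymupdf-output"):
    """Keep only the artifacts that are actually present on disk."""
    return {name: path
            for name, path in expected_outputs(pdf_path, outroot).items()
            if path.exists()}
```

Which artifacts exist depends on the flags used: images/ and tables.json would only be expected after a run with --images or --tables.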

Notes

PyMuPDF is fast but less robust on complex PDFs.
For more robust parsing, switch to a heavy-duty OCR parser (e.g., MinerU) if one is installed.
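
One rough way to decide when to switch parsers is to look for pages that yield almost no extractable text, which is typical of scanned pages without a text layer. The sketch below is a heuristic of our own, not something the skill implements: sparse_pages and extract_page_texts are hypothetical helpers, and the 20-character threshold is an arbitrary assumption.

```python
def sparse_pages(page_texts, min_chars=20):
    """Return 0-based indices of pages whose stripped text is shorter than min_chars."""
    return [i for i, text in enumerate(page_texts)
            if len(text.strip()) < min_chars]


def extract_page_texts(pdf_path):
    """Plain text per page via PyMuPDF (imported lazily so sparse_pages has no dependency)."""
    import fitz  # PyMuPDF; newer releases can also be imported as `pymupdf`
    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


if __name__ == "__main__":
    import sys
    flagged = sparse_pages(extract_page_texts(sys.argv[1]))
    if flagged:
        print("Pages with little or no text:", flagged)
        print("Consider an OCR-based parser for this document.")
    else:
        print("All pages returned text.")
```

A page flagged here is not necessarily broken (it may simply be blank or contain only a figure), so treat the result as a hint rather than a verdict.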
