x-convert-pdf-to-markdown
Install command:
```bash
npx skills add https://github.com/arda-industries/agent-skills --skill x-convert-pdf-to-markdown
```
Two tools are available depending on your needs:
| Tool | Best For | Speed | Size |
|---|---|---|---|
| pymupdf | Simple text PDFs | Very fast (~12s for 7 files) | ~15MB |
| marker-pdf | Complex PDFs with tables, images, OCR | Slow | ~2GB models |
Setup
Both tools are installed in the agent-instructions poetry environment:
```bash
cd ~/brain/git/personal/agent-instructions
poetry install  # if not already done
```
PyMuPDF (Recommended for text-only PDFs)
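Fast and lightweight. Use this for most PDFs. Note that `gettext` writes plain text (in `layout` mode, spacing approximates the page layout); it does not generate Markdown headings or tables, even when the output file ends in `.md`.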
Single File
```bash
cd ~/brain/git/personal/agent-instructions
poetry run pymupdf gettext -mode layout -output "/path/to/output.md" "/path/to/file.pdf"
```
Batch Conversion
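```bash
cd ~/brain/git/personal/agent-instructions
mkdir -p /path/to/output  # create the output folder first
for pdf in /path/to/pdfs/*.pdf; do
  [ -e "$pdf" ] || continue  # skip the literal pattern if no PDFs match
  name=$(basename "$pdf" .pdf)
  poetry run pymupdf gettext -mode layout -output "/path/to/output/${name}.md" "$pdf"
done
```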
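If you prefer to drive the batch from Python (the tools already live in a Poetry environment), the loop above can be sketched as follows. This is a minimal, hypothetical helper: `pymupdf_command` and `convert_all` are illustrative names, not part of PyMuPDF, and all paths are placeholders. It builds the same `poetry run pymupdf gettext` invocations, creates the output folder, and stops at the first failing file (`check=True`).

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pymupdf_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build the `pymupdf gettext` call for one PDF, naming the output <stem>.md."""
    out_file = out_dir / f"{pdf.stem}.md"  # same naming as ${name}.md in the shell loop
    return ["poetry", "run", "pymupdf", "gettext",
            "-mode", "layout", "-output", str(out_file), str(pdf)]


def convert_all(pdf_dir: str, out_dir: str, project_dir: str, dry_run: bool = False) -> None:
    """Convert every *.pdf in pdf_dir; commands run from the Poetry project directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder up front
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = pymupdf_command(pdf, out)
        if dry_run:
            print(" ".join(cmd))  # preview only, nothing is executed
        else:
            subprocess.run(cmd, cwd=project_dir, check=True)  # stop at the first failure
```

For example, `convert_all("/path/to/pdfs", "/path/to/output", os.path.expanduser("~/brain/git/personal/agent-instructions"), dry_run=True)` prints the commands without running them.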
Options
| Option | Description |
|---|---|
| `-mode` | `simple`, `blocks`, or `layout` (default: `layout`, which preserves formatting) |
| `-output` | Output file path |
| `-pages` | Page range to extract |
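The `-pages` description above is terse. The helper below sketches how a page spec might expand, under assumptions this document does not state: page numbers are 1-based, `N` stands for the last page, and a descending range such as `5-3` is walked in reverse. `expand_pymupdf_pages` is a hypothetical name, not a PyMuPDF function; confirm the real syntax with `poetry run pymupdf gettext -h`.

```python
from __future__ import annotations


def expand_pymupdf_pages(spec: str, page_count: int) -> list[int]:
    """Expand a spec like "1-3,N" into 1-based page numbers (assumed semantics)."""
    # Assumption: "N" is shorthand for the last page.
    spec = spec.replace(" ", "").replace("N", str(page_count))
    pages: list[int] = []
    for item in spec.split(","):
        if "-" in item:
            start_s, end_s = item.split("-")
            start, end = int(start_s), int(end_s)
            step = 1 if start <= end else -1  # assumption: "5-3" means 5, 4, 3
            pages.extend(range(start, end + step, step))
        else:
            pages.append(int(item))
    for p in pages:
        if not 1 <= p <= page_count:
            raise ValueError(f"page {p} outside 1..{page_count}")
    return pages
```

Under these assumptions, `expand_pymupdf_pages("1-3,N", 10)` returns `[1, 2, 3, 10]`.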
marker-pdf (For complex PDFs)
Use when you need OCR, table extraction, or image handling.
Single File
```bash
cd ~/brain/git/personal/agent-instructions
poetry run marker_single "/path/to/file.pdf" --output_dir "/path/to/output"
```
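marker-pdf has no batch example here. A hypothetical Python loop mirroring the PyMuPDF batch conversion could look like this; `marker_command` and `run_marker` are illustrative names, only flags documented in this section are used, and the commands are assumed to run from the Poetry project directory, as in the single-file example.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def marker_command(pdf: Path, out_dir: Path, force_ocr: bool = False) -> list[str]:
    """Build the `marker_single` call for one PDF."""
    cmd = ["poetry", "run", "marker_single", str(pdf), "--output_dir", str(out_dir)]
    if force_ocr:
        cmd.append("--force_ocr")  # force OCR on all text (see the options table)
    return cmd


def run_marker(pdf_dir: str, out_dir: str, project_dir: str, force_ocr: bool = False) -> None:
    """Convert PDFs one at a time from the Poetry project; stop on the first failure."""
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        subprocess.run(marker_command(pdf, Path(out_dir), force_ocr),
                       cwd=project_dir, check=True)
```

Each `marker_single` call is a separate process, so its models are loaded again for every file.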
Options
| Option | Description |
|---|---|
| `--output_dir` | Directory to save output |
| `--output_format` | `markdown`, `json`, `html`, or `chunks` |
| `--page_range` | Process specific pages, e.g. `"0,5-10,20"` |
| `--force_ocr` | Force OCR on all text |
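To make the `--page_range` format concrete, here is a hypothetical parser (`parse_marker_page_range` is not a marker API). It assumes, as the `"0,5-10,20"` example suggests, that page indices are 0-based and that ranges include both ends; it also de-duplicates and sorts the result. Check `marker_single --help` before relying on these semantics.

```python
from __future__ import annotations


def parse_marker_page_range(spec: str) -> list[int]:
    """Expand a spec like "0,5-10,20" into sorted, de-duplicated page indices."""
    pages: set[int] = set()
    for item in spec.replace(" ", "").split(","):
        if "-" in item:
            start_s, end_s = item.split("-")
            pages.update(range(int(start_s), int(end_s) + 1))  # assumed inclusive end
        else:
            pages.add(int(item))
    return sorted(pages)
```

With these assumptions, `parse_marker_page_range("0,5-10,20")` returns `[0, 5, 6, 7, 8, 9, 10, 20]`.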
First Run
On first use, marker downloads ML models (~2GB). This happens once.
Notes
- Fully local: both tools process PDFs entirely on your machine, with no cloud service involved (marker only needs the network for its one-time model download)
- PyMuPDF: Best for clean, text-based PDFs
- marker-pdf: Best for scanned docs, tables, or complex layouts
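The notes above can be turned into a rough rule of thumb for picking a tool. `choose_tool` and both thresholds are illustrative assumptions, not part of either library. It takes the number of embedded-text characters on each page (for example, gathered with PyMuPDF's Python API as `len(page.get_text())` for each page of `pymupdf.open(path)`) and treats pages with almost no text as likely scans.

```python
from __future__ import annotations


def choose_tool(chars_per_page: list[int], has_tables: bool = False,
                min_chars: int = 100, max_sparse_fraction: float = 0.2) -> str:
    """Suggest "pymupdf" or "marker-pdf" (hypothetical rule of thumb).

    min_chars and max_sparse_fraction are arbitrary illustrative thresholds.
    """
    if has_tables:
        return "marker-pdf"  # tables and complex layouts go to marker-pdf
    if not chars_per_page:
        return "pymupdf"  # nothing to judge; default to the fast tool
    # Pages with almost no embedded text are probably scanned images needing OCR.
    sparse = sum(1 for n in chars_per_page if n < min_chars)
    if sparse / len(chars_per_page) > max_sparse_fraction:
        return "marker-pdf"
    return "pymupdf"  # clean, text-based PDF
```

A single blank page in an otherwise text-heavy PDF stays under the sparse-page fraction, so it does not push the choice to the slower tool.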