pdf-to-markdown

📁 duc01226/easyplatform 📅 Jan 24, 2026

127

总安装量

周安装量

#3493

全站排名

安装命令

npx skills add https://github.com/duc01226/easyplatform --skill pdf-to-markdown

Agent 安装分布

opencode 42

gemini-cli 39

codex 37

claude-code 32

github-copilot 31

cursor 26

Skill 文档

pdf-to-markdown

Convert PDF files to Markdown format.

Installation Required

cd .claude/skills/pdf-to-markdown
npm install

Dependencies: pdf-parse

Quick Start

# Basic conversion
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
  --file ./document.pdf

# Custom output path
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
  --file ./doc.pdf \
  --output ./output/doc.md

CLI Options

Option	Required	Description
`--file <path>`	Yes	Input PDF file
`--output <path>`	No	Output Markdown path (default: input name + .md)

Output Format (JSON)

{
  "success": true,
  "input": "/path/to/input.pdf",
  "output": "/path/to/output.md",
  "wordCount": 1523,
  "warnings": ["Tables may not be accurately converted"]
}

Supported Elements

Text extraction from digital PDFs
Headings (detected by font size heuristics)
Paragraphs
Basic lists
Links (when embedded in PDF)

Known Limitations

Tables: Very limited support; may not render correctly
Multi-column layouts: Text may interleave between columns
Scanned PDFs: NOT supported (requires OCR – see alternatives below)
Images: NOT extracted (PDF images are not included in output)
Complex formatting: May be simplified or lost
Password-protected PDFs: NOT supported

Alternatives for Unsupported Cases

For scanned PDFs (OCR needed):

Use scribe.js-ocr library (AGPL license)
Commercial OCR services (Google Cloud Vision, AWS Textract)

For complex tables:

Consider AI-based extraction (LLM post-processing)
Manual review and correction

For image extraction:

Use unpdf library with sharp for image extraction
Process images separately and reference in markdown

Troubleshooting

Dependencies not found: Run npm install in skill directory Empty output: PDF may be scanned/image-based (requires OCR) Garbled text: PDF may use embedded fonts not supported by parser Memory issues: Large PDFs may require --max-old-space-size=4096 flag

IMPORTANT Task Planning Notes

Always plan and break many small todo tasks
Always add a final review todo task to review the works done at the end to find any fix or enhancement needed

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台