pdf-to-markdown
Total installs: 60
Weekly installs: 60
Site rank: #3658
Install command
npx skills add https://github.com/duc01226/easyplatform --skill pdf-to-markdown
Installs by agent
- opencode: 42
- gemini-cli: 39
- codex: 37
- claude-code: 32
- github-copilot: 31
- cursor: 26
Skill documentation
pdf-to-markdown
Convert PDF files to Markdown format.
Installation Required
cd .claude/skills/pdf-to-markdown
npm install
Dependencies: pdf-parse
Quick Start
# Basic conversion
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
--file ./document.pdf
# Custom output path
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
--file ./doc.pdf \
--output ./output/doc.md
CLI Options
| Option | Required | Description |
|---|---|---|
| --file <path> | Yes | Input PDF file |
| --output <path> | No | Output Markdown path (default: input name + .md) |
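When the converter is called from another Node script, the two options above map directly onto an argument array. The following is a minimal sketch and not part of the skill: `buildConvertArgs` and `runConvert` are hypothetical helper names, the script path is taken from Quick Start, and the sketch assumes the script prints its result to stdout.

```javascript
const { execFileSync } = require("child_process");

// Script path as shown in Quick Start; adjust if the skill lives elsewhere.
const CONVERT_SCRIPT = ".claude/skills/pdf-to-markdown/scripts/convert.cjs";

// Hypothetical helper: turn the documented options into an argv array.
function buildConvertArgs({ file, output } = {}) {
  if (!file) {
    throw new Error("--file is required");
  }
  const args = [CONVERT_SCRIPT, "--file", file];
  // --output is optional; omit it to let the script choose its default path.
  if (output) {
    args.push("--output", output);
  }
  return args;
}

// Hypothetical helper: run the converter with the current Node binary and
// return whatever it prints (assumed here to be written to stdout).
function runConvert(options) {
  return execFileSync(process.execPath, buildConvertArgs(options), {
    encoding: "utf8",
  });
}
```

Passing the arguments as an array to `execFileSync` avoids shell quoting problems with paths that contain spaces.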
Output Format (JSON)
{
  "success": true,
  "input": "/path/to/input.pdf",
  "output": "/path/to/output.md",
  "wordCount": 1523,
  "warnings": ["Tables may not be accurately converted"]
}
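A caller will usually want to check `success` and surface any `warnings` before using the output file. The sketch below shows one way to do that. `summarizeResult` is a hypothetical helper, and two points are assumptions rather than documented behavior: that a failed run reports `"success": false`, and that a `wordCount` of zero hints at a scanned PDF with no text layer.

```javascript
// Hypothetical helper: interpret the JSON result shown above.
function summarizeResult(jsonText) {
  const result = JSON.parse(jsonText);
  // Assumption: a failed run sets "success" to something other than true.
  if (result.success !== true) {
    throw new Error(`Conversion failed for ${result.input}`);
  }
  const warnings = Array.isArray(result.warnings) ? result.warnings : [];
  return {
    output: result.output,
    wordCount: result.wordCount,
    warnings,
    // Assumption: zero words may mean the PDF is scanned or image-based.
    likelyScanned: result.wordCount === 0,
  };
}

// Sample input copied from the documented output format.
const sampleResult = JSON.stringify({
  success: true,
  input: "/path/to/input.pdf",
  output: "/path/to/output.md",
  wordCount: 1523,
  warnings: ["Tables may not be accurately converted"],
});

const summary = summarizeResult(sampleResult);
for (const warning of summary.warnings) {
  console.warn(`Warning: ${warning}`);
}
```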
Supported Elements
- Text extraction from digital PDFs
- Headings (detected by font size heuristics)
- Paragraphs
- Basic lists
- Links (when embedded in PDF)
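The list above mentions font-size heuristics for heading detection. As an illustration only, and not the skill's actual implementation, the idea can be sketched as follows: treat the most common font size as body text and map each larger size to a heading level. The `{ text, fontSize }` line shape and the name `linesToMarkdown` are assumptions made for this sketch.

```javascript
// Illustrative heuristic: lines larger than the body font become headings.
// The largest size maps to "#", the next largest to "##", and so on.
function linesToMarkdown(lines) {
  // Count how often each font size occurs.
  const counts = new Map();
  for (const { fontSize } of lines) {
    counts.set(fontSize, (counts.get(fontSize) || 0) + 1);
  }

  // Treat the most frequent size as body text (ties go to the smaller size).
  let bodySize = 0;
  let bestCount = 0;
  for (const [size, count] of counts) {
    if (count > bestCount || (count === bestCount && size < bodySize)) {
      bodySize = size;
      bestCount = count;
    }
  }

  // Sizes above the body size, largest first, capped at six heading levels.
  const headingSizes = [...counts.keys()]
    .filter((size) => size > bodySize)
    .sort((a, b) => b - a)
    .slice(0, 6);

  return lines
    .map(({ text, fontSize }) => {
      const level = headingSizes.indexOf(fontSize) + 1;
      return level > 0 ? `${"#".repeat(level)} ${text}` : text;
    })
    .join("\n\n");
}

// Made-up sample data for the sketch.
const sampleLines = [
  { text: "Title", fontSize: 24 },
  { text: "Intro paragraph", fontSize: 11 },
  { text: "Section", fontSize: 16 },
  { text: "Body text", fontSize: 11 },
  { text: "More body text", fontSize: 11 },
];
console.log(linesToMarkdown(sampleLines));
```

A heuristic like this can misfire on documents that use large type for emphasis rather than structure, which is one reason heading output is worth reviewing.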
Known Limitations
- Tables: Very limited support; may not render correctly
- Multi-column layouts: Text may interleave between columns
- Scanned PDFs: NOT supported (requires OCR – see alternatives below)
- Images: NOT extracted (PDF images are not included in output)
- Complex formatting: May be simplified or lost
- Password-protected PDFs: NOT supported
Alternatives for Unsupported Cases
For scanned PDFs (OCR needed):
- Use the scribe.js-ocr library (AGPL license)
- Commercial OCR services (Google Cloud Vision, AWS Textract)
For complex tables:
- Consider AI-based extraction (LLM post-processing)
- Manual review and correction
For image extraction:
- Use the unpdf library with sharp for image extraction
- Process images separately and reference them in the Markdown
Troubleshooting
- Dependencies not found: Run npm install in the skill directory
- Empty output: The PDF may be scanned or image-based (requires OCR)
- Garbled text: The PDF may use embedded fonts that the parser does not support
- Memory issues: Large PDFs may require running node with the --max-old-space-size=4096 flag
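The heap flag is a Node option, so it has to come before the script path; anything after the path is passed to the script itself. A small sketch of that ordering, where `withHeapLimit` is a hypothetical helper and `./large.pdf` is a placeholder file name:

```javascript
// Hypothetical helper: prepend the V8 heap-size flag to a converter invocation.
function withHeapLimit(scriptArgs, heapMb = 4096) {
  // Node options go first; the script path and its options follow.
  return [`--max-old-space-size=${heapMb}`, ...scriptArgs];
}

const heapArgs = withHeapLimit([
  ".claude/skills/pdf-to-markdown/scripts/convert.cjs",
  "--file",
  "./large.pdf",
]);

// Print the equivalent command line.
console.log(["node", ...heapArgs].join(" "));
```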
IMPORTANT Task Planning Notes
- Always plan the work and break it into many small todo tasks
- Always add a final review todo task at the end to check the work done and identify any fixes or enhancements needed