doc2x-ocr-markdown
Install command
npx skills add https://github.com/jysd-ai/skills --skill doc2x-ocr-markdown
Doc2X OCR Markdown
Overview
Convert a single PDF or image into Markdown and extract image assets with one local script:
scripts/doc2x_ocr.py
Require only one credential:
DOC2X_APIKEY
Quick Start
Set API key:
export DOC2X_APIKEY='sk-...'
Run PDF OCR to Markdown + images:
python scripts/doc2x_ocr.py pdf ./input.pdf --outdir ./output
Run image OCR to Markdown + images:
python scripts/doc2x_ocr.py image ./page.png --outdir ./output
Workflow
- Validate DOC2X_APIKEY.
- Choose conversion mode from input file type.
- Run scripts/doc2x_ocr.py.
- Return output folder and generated Markdown path.
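The workflow above can be sketched as a small wrapper. This is a minimal sketch, not part of the skill: `choose_mode`, `build_command`, and the set of image extensions are hypothetical. Only the `pdf` and `image` subcommands, `--outdir`, and `DOC2X_APIKEY` come from the text above.

```python
import os
import sys
from pathlib import Path

# Assumption: illustrative extension set; the skill text only shows a .png input.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def choose_mode(path: str) -> str:
    """Pick the script subcommand from the input file extension."""
    ext = Path(path).suffix.lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"unsupported input type: {ext or path}")


def build_command(path: str, outdir: str, env=None) -> list[str]:
    """Validate the API key, then build the argv for scripts/doc2x_ocr.py."""
    env = os.environ if env is None else env
    if not env.get("DOC2X_APIKEY"):
        raise RuntimeError("DOC2X_APIKEY is not set")
    mode = choose_mode(path)
    return [sys.executable, "scripts/doc2x_ocr.py", mode, path, "--outdir", outdir]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; failing early on a missing key avoids a wasted upload.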
Modes
PDF Mode
Use the asynchronous Doc2X PDF flow:
- POST /api/v2/parse/preupload
- PUT file bytes to returned upload URL
- Poll GET /api/v2/parse/status
- Trigger export POST /api/v2/convert/parse (to=md)
- Poll GET /api/v2/convert/parse/result
- Download zip, extract files, locate Markdown
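Both polling steps in this flow share the same shape: call a status endpoint until it reports completion or a deadline passes. A minimal sketch of that loop follows. The helper name `poll_until`, its defaults, and the convention that `fetch` returns `None` while the task is still running are assumptions for illustration, not the behavior of scripts/doc2x_ocr.py.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], Optional[T]],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fetch() until it returns a non-None result or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        # Stop if the next wait would carry us past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"no result within {timeout} seconds")
        sleep(interval)
```

In practice `fetch` would wrap the GET request for the status or result endpoint and return the parsed response once the task has finished.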
Useful options:
- --formula-mode dollar|normal (default dollar)
- --merge-cross-page-forms
- --poll-interval
- --timeout
- --keep-zip
Image Mode
Use synchronous image layout OCR:
- POST /api/v2/parse/img/layout with binary image body
- Write page Markdown from response
- If convert_zip exists, decode and extract image resources
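The decode-and-extract step can be sketched as below. This assumes that `convert_zip` arrives as a base64-encoded zip archive, which the text above implies but does not state. The function name `extract_convert_zip` is hypothetical.

```python
import base64
import io
import zipfile
from pathlib import Path


def extract_convert_zip(convert_zip_b64: str, dest: Path) -> list[str]:
    """Decode a base64 zip payload, extract it under dest, return member names."""
    # Assumption: the payload is base64 text wrapping a zip archive.
    raw = base64.b64decode(convert_zip_b64)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        for name in archive.namelist():
            # Refuse members that would resolve outside dest.
            target = (root / name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        archive.extractall(root)
        return archive.namelist()
```

Extracting from an in-memory buffer avoids writing a temporary zip file when the archive is not needed afterwards.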
Output Contract
For input <name>.pdf or <name>.png, the script writes:
- <outdir>/<name>/... extracted files
- <outdir>/<name>/<name>.md if no Markdown file exists in extracted content
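A minimal sketch of this contract, for illustration only: `resolve_markdown` is a hypothetical helper, and picking the first `.md` file in sorted order when several exist is an assumption, since the text does not say how the script chooses.

```python
from pathlib import Path


def resolve_markdown(input_path: str, outdir: str, page_markdown: str = "") -> Path:
    """Return the Markdown path under <outdir>/<name>/, writing a fallback if none exists."""
    name = Path(input_path).stem
    target_dir = Path(outdir) / name
    target_dir.mkdir(parents=True, exist_ok=True)
    # Prefer a Markdown file that is already in the extracted content.
    existing = sorted(target_dir.rglob("*.md"))
    if existing:
        return existing[0]
    # Otherwise write <outdir>/<name>/<name>.md.
    fallback = target_dir / f"{name}.md"
    fallback.write_text(page_markdown, encoding="utf-8")
    return fallback
```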
The script prints a JSON summary with:
- mode
- uid
- output_dir
- markdown
- zip (only when --keep-zip)
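A caller can check the summary before relying on it. The sketch below assumes the JSON summary is the only text on stdout. `parse_summary` is a hypothetical helper, and the values in the usage line are made up.

```python
import json

# Keys listed in the output contract; zip is optional (only with --keep-zip).
REQUIRED_KEYS = ("mode", "uid", "output_dir", "markdown")


def parse_summary(stdout: str) -> dict:
    """Parse the JSON summary and check that the documented keys are present."""
    summary = json.loads(stdout)
    missing = [key for key in REQUIRED_KEYS if key not in summary]
    if missing:
        raise KeyError(f"summary is missing keys: {missing}")
    return summary


# Usage with a made-up summary string:
example = '{"mode": "pdf", "uid": "abc", "output_dir": "out/input", "markdown": "out/input/input.md"}'
print(parse_summary(example)["markdown"])
```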
References
Read these files when you need deeper context:
- references/api-quick-reference.md for endpoint behavior and limits
- references/implementation-notes.md for relation to the copied official doc2x.py
Troubleshooting
- Handle parse_task_limit_exceeded or parse_concurrency_limit by reducing concurrent jobs and retrying later.
- Split huge PDFs if parse timeout or page-limit errors occur.
- Keep poll interval between 1 and 3 seconds for status APIs unless there is a strong reason to change.
- Save outputs promptly because official docs state cloud parse results are temporary.
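The "retry later" advice for limit errors can be sketched as exponential backoff. This is an illustrative helper, not part of the skill. It assumes the error code appears in the message of a `RuntimeError`, which the text does not confirm, and `retry_on_limit` with its default delays is hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error codes named in the troubleshooting list above.
RETRYABLE = ("parse_task_limit_exceeded", "parse_concurrency_limit")


def retry_on_limit(
    job: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run job(); on a limit error, wait with exponential backoff and retry."""
    for attempt in range(attempts):
        try:
            return job()
        except RuntimeError as exc:
            # Assumption: the limit code is visible in the exception message.
            limited = any(code in str(exc) for code in RETRYABLE)
            if not limited or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Errors that are not limit-related are re-raised immediately, so a bad API key or a malformed file does not trigger pointless retries.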