file-to-markdown
npx skills add https://github.com/bendusy/file-to-markdown --skill file-to-markdown
Skill documentation
File to Markdown
Convert any file to structured Markdown, automatically choosing the best of several tools. All scripts use PEP 723 inline metadata, so uv run resolves their dependencies automatically and no manual installation is needed.
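As a rough illustration of what PEP 723 inline metadata looks like, the sketch below embeds a hypothetical script header and extracts its TOML text using the regular expression from the PEP's reference implementation. The dependency list in `EXAMPLE_SCRIPT` is an assumption made for the example; the repository's scripts may declare different packages.

```python
import re

# Regular expression taken from the PEP 723 reference implementation.
METADATA_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical script header; the real scripts' dependency lists may differ.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "markitdown",
# ]
# ///
print("converted")
'''


def read_script_metadata(source: str) -> str:
    """Return the TOML text of the `script` metadata block, or '' if absent."""
    for match in METADATA_BLOCK.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Strip the leading "# " (or bare "#") comment marker from each line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return ""


toml_text = read_script_metadata(EXAMPLE_SCRIPT)
print(toml_text)
```

When a script carrying such a header is started with uv run, uv reads the block and prepares an environment with the listed dependencies before executing the script.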
Prerequisites
Requires the uv package manager:
curl -LsSf https://astral.sh/uv/install.sh | sh
Check the environment status:
uv run scripts/setup.py status
Quick start
Single-file conversion
uv run scripts/convert.py input.pdf -o output.md
Specify a tool:
uv run scripts/convert.py input.pdf --tool docling -o output.md
Force a specific tool (no fallback):
uv run scripts/convert.py input.pdf --force-tool marker -o output.md
Batch conversion
uv run scripts/batch_convert.py /path/to/docs/ -o /path/to/output/
With extension filtering and parallel workers:
uv run scripts/batch_convert.py /path/to/docs/ -o output/ --extensions .pdf .docx --workers 4
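To make the filtering and parallelism concrete, here is a minimal sketch of how a batch run of this kind could be structured. It is not the code in scripts/batch_convert.py: the helper names `find_inputs` and `convert_all`, the recursive directory walk, and the use of threads are all assumptions, and `convert_one` stands in for whatever per-file converter is available.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def find_inputs(root: Path, extensions: Iterable[str]) -> List[Path]:
    """Return files under root whose suffix is in extensions (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


def convert_all(
    root: Path,
    out_dir: Path,
    extensions: Iterable[str],
    convert_one: Callable[[Path], str],
    workers: int = 4,
) -> Dict[Path, bool]:
    """Convert matching files in parallel; map each input to True on success."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def job(src: Path) -> bool:
        try:
            markdown = convert_one(src)
        except Exception:
            return False  # one bad file should not abort the whole batch
        (out_dir / f"{src.stem}.md").write_text(markdown, encoding="utf-8")
        return True

    inputs = find_inputs(root, extensions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each file with its result.
        return dict(zip(inputs, pool.map(job, inputs)))
```

Note that this sketch writes every output as `<stem>.md` in a single directory, so two inputs with the same stem would overwrite each other; a real implementation would need to mirror the source tree or disambiguate names.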
Quality check
uv run scripts/quality_check.py output.md
uv run scripts/quality_check.py /path/to/output/ # batch check
Tool selection strategy
The system automatically selects the best tool chain for each file format:
| File type | Preferred → fallback |
|---|---|
| PDF (simple) | Docling → Marker → PyMuPDF4LLM → MarkItDown |
| DOCX/DOC | MarkItDown → Docling → Pandoc |
| PPTX | MarkItDown → Marker |
| XLSX/CSV | MarkItDown |
| HTML | MarkItDown → Docling |
| Images | MarkItDown(+LLM) → Docling |
| EPUB | MarkItDown → Marker |
| Audio | MarkItDown(+LLM) |
| RST/LaTeX/RTF | Pandoc |
If the preferred tool fails or is not installed, the system automatically falls back to the next available tool.
Detailed tool comparison: see references/tool-comparison.md
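The fallback rule above can be sketched as a loop over an ordered chain. This is an illustrative sketch only, not the logic in scripts/convert.py: `TOOL_CHAINS` copies a few rows of the table, while `convert_with_fallback`, the `converters` mapping, and the `force_tool` parameter (modelled on the `--force-tool` flag) are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

# A few chains copied from the table; keys are lowercase file suffixes.
TOOL_CHAINS: Dict[str, List[str]] = {
    ".pdf": ["docling", "marker", "pymupdf4llm", "markitdown"],
    ".docx": ["markitdown", "docling", "pandoc"],
    ".pptx": ["markitdown", "marker"],
    ".html": ["markitdown", "docling"],
    ".rst": ["pandoc"],
}

Converter = Callable[[str], str]


def convert_with_fallback(
    path: str,
    converters: Dict[str, Converter],
    force_tool: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (tool_name, markdown) from the first tool in the chain that works.

    `converters` holds only the tools that are installed (an assumption of this
    sketch). With `force_tool` set, the chain is that single tool, so no
    fallback happens.
    """
    suffix = Path(path).suffix.lower()
    chain = [force_tool] if force_tool else TOOL_CHAINS.get(suffix, [])
    errors: List[str] = []
    for tool in chain:
        convert = converters.get(tool)
        if convert is None:
            errors.append(f"{tool}: not installed")
            continue
        try:
            return tool, convert(path)
        except Exception as exc:  # any failure moves on to the next tool
            errors.append(f"{tool}: {exc}")
    detail = "; ".join(errors) or "no tool chain for this file type"
    raise RuntimeError(f"no tool could convert {path}: {detail}")
```

Collecting the per-tool errors rather than discarding them makes it easier to see why a conversion ended up on a later tool in the chain, or why it failed outright.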
Python API usage
Call the functions directly from other scripts:
import sys
sys.path.insert(0, 'scripts')
from convert import convert_file
result = convert_file('document.pdf', output_path='output.md')
if result.success:
    print(f"Conversion succeeded, tool used: {result.tool_used}, elapsed: {result.elapsed_seconds:.1f}s")
    print(result.content[:500])
Batch conversion:
from batch_convert import batch_convert
batch_result = batch_convert(
'/path/to/docs',
output_dir='/path/to/output',
extensions={'.pdf', '.docx'},
max_workers=4
)
print(f"Succeeded: {batch_result.succeeded}/{batch_result.total}")
Quality check:
from quality_check import check_quality
report = check_quality(md_content, source_path='original.pdf')
print(f"Quality score: {report.score}/100 ({report.grade})")
Special scenarios guide
PDFs with complex tables
Prefer Docling (IBM's AI table recognition):
uv run scripts/convert.py report.pdf --tool docling
PDFs with mathematical formulas
Prefer Marker (LaTeX formula extraction):
uv run scripts/convert.py paper.pdf --tool marker
When speed matters most
Prefer PyMuPDF4LLM:
uv run scripts/convert.py large.pdf --tool pymupdf4llm
LLM-enhanced image descriptions
MarkItDown can integrate an OpenAI/Azure LLM to describe image content (requires configuring llm_client in convert.py).
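For orientation, this is roughly what passing an LLM client to MarkItDown looks like when the library is used directly. How convert.py wires up llm_client is not shown in this document, so the function name `describe_image`, the default model name, and the dependency list in the header are assumptions; the sketch also needs an `OPENAI_API_KEY` in the environment and covers only the OpenAI client, not Azure.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["markitdown[all]", "openai"]
# ///


def describe_image(image_path: str, model: str = "gpt-4o") -> str:
    """Return Markdown for an image, including an LLM-written description."""
    # Imported lazily so this module can be loaded before dependencies resolve.
    from markitdown import MarkItDown
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    converter = MarkItDown(llm_client=client, llm_model=model)
    return converter.convert(image_path).text_content
```

Saved as a standalone script with a call such as `print(describe_image("photo.jpg"))` appended, it could be run with uv run, which would pick up the dependencies from the header.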
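Notes
- Dependency management: all scripts use PEP 723 inline metadata; uv run resolves dependencies automatically, so no manual pip install is needed
- Licenses: MarkItDown and Docling are MIT (free for commercial use); Marker is GPL-3.0 and PyMuPDF4LLM is AGPL-3.0 (take care with commercial use)
- GPU: Marker and Docling perform better with a GPU; they also run on CPU, but more slowly
- Large files: for files over 100MB, PyMuPDF4LLM is recommended (most memory-efficient)
- Chinese PDFs: Docling and Marker handle Chinese relatively well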