docx
npx skills add https://github.com/kunhai-88/skills --skill docx
Skill documentation
DOCX creation, editing, and analysis
Overview
A .docx file is essentially a ZIP archive containing XML and other resources. Choose a workflow based on the type of task.
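Because a .docx is a ZIP archive, its parts can be listed with any ZIP reader. The following is a minimal sketch using only Python's standard library; `list_parts` and `make_sample_docx` are hypothetical helper names, and the sample archive is a tiny stand-in rather than a valid Word file.

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of all parts stored in a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return sorted(archive.namelist())


def make_sample_docx() -> bytes:
    """Build a tiny stand-in archive (hypothetical content, not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_parts(make_sample_docx()):
        print(name)
```

For a real file, reading its bytes with `Path("file.docx").read_bytes()` and passing them to `list_parts` should show entries such as word/document.xml.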
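Workflow selection
- Read/analyze only → text extraction or raw XML access
- New document → the "Create a new Word document" workflow
- Editing an existing document:
  - Your own document + simple changes → basic OOXML editing
  - Someone else's document, or legal, academic, business, or government documents → redlining workflow (recommended or required)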
Reading and analysis
Text extraction
Convert to markdown with pandoc, which can preserve tracked changes:
pandoc --track-changes=all path-to-file.docx -o output.md
# Options: --track-changes=accept/reject/all
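When the conversion is scripted, it may help to build the pandoc argument list in one place and validate the track-changes mode first. This is a sketch only: `pandoc_command` and `convert` are hypothetical helper names, and `convert` assumes pandoc is on the PATH.

```python
import subprocess

# The three values accepted by pandoc's --track-changes option.
TRACK_CHANGES_MODES = ("accept", "reject", "all")


def pandoc_command(source: str, output: str, mode: str = "all") -> list[str]:
    """Build the argument list for converting a .docx file to markdown."""
    if mode not in TRACK_CHANGES_MODES:
        raise ValueError(f"mode must be one of {TRACK_CHANGES_MODES}, got {mode!r}")
    return ["pandoc", f"--track-changes={mode}", source, "-o", output]


def convert(source: str, output: str, mode: str = "all") -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, output, mode), check=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.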
Raw XML
Comments, complex formatting, document structure, media, and metadata require unpacking the file and reading the XML.
Unpack: python ooxml/scripts/unpack.py <office_file> <output_directory>
Key paths: word/document.xml, word/comments.xml, word/media/. Tracked changes use <w:ins> and <w:del>.
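To illustrate how tracked changes appear in word/document.xml, here is a small sketch that collects the text inside `<w:ins>` and `<w:del>` elements. It uses the standard library's `xml.etree.ElementTree` to stay self-contained; for untrusted files, the defusedxml package named in the dependency list is presumably the safer parser. `SAMPLE` and `tracked_changes` are hypothetical names, and the sample XML is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written fragment: "30" was deleted and "60" inserted.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">The term is </w:t></w:r>'
    '<w:del w:id="1" w:author="A" w:date="2024-01-01T00:00:00Z">'
    '<w:r><w:delText>30</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="A" w:date="2024-01-01T00:00:00Z">'
    '<w:r><w:t>60</w:t></w:r></w:ins>'
    '<w:r><w:t xml:space="preserve"> days.</w:t></w:r>'
    '</w:p></w:body></w:document>'
)


def tracked_changes(xml_text: str) -> list[tuple[str, str]]:
    """Return ("ins" | "del", text) pairs in document order."""
    root = ET.fromstring(xml_text)
    changes = []
    for element in root.iter():
        if element.tag == f"{{{W_NS}}}ins":
            text = "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}t"))
            changes.append(("ins", text))
        elif element.tag == f"{{{W_NS}}}del":
            # Deleted text is stored in w:delText rather than w:t.
            text = "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}delText"))
            changes.append(("del", text))
    return changes


if __name__ == "__main__":
    print(tracked_changes(SAMPLE))
```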
Creating a new document
Use docx-js (JavaScript/TypeScript). Read docx-js.md in full first, then build the document from Document / Paragraph / TextRun and export the .docx with Packer.toBuffer().
Editing an existing document
Use the Document library (Python) to manipulate the OOXML. Workflow:
- Read ooxml.md in full
- Unpack: unpack.py <office_file> <output_directory>
- Write an editing script with the Document library
- Pack: pack.py <input_directory> <office_file>
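The unpack, edit, pack cycle can be pictured with plain `zipfile` calls. This is a stand-in sketch, not the skill's own unpack.py or pack.py, which may do more (formatting or validation, for example). The edit step here is a plain string replacement rather than the Document library, whose API is described in ooxml.md and not assumed here. All function names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack(office_file: Path, output_directory: Path) -> None:
    """Extract every part of the archive into output_directory."""
    with zipfile.ZipFile(office_file) as archive:
        archive.extractall(output_directory)


def pack(input_directory: Path, office_file: Path) -> None:
    """Zip the directory back up, using paths relative to its root."""
    with zipfile.ZipFile(office_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(input_directory.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(input_directory).as_posix())


def round_trip_demo() -> str:
    """Create a stand-in archive, edit one part, repack, and return the edited XML."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        original = tmp_path / "original.docx"
        with zipfile.ZipFile(original, "w") as archive:
            archive.writestr("word/document.xml", "<w:t>old text</w:t>")

        unpacked = tmp_path / "unpacked"
        unpack(original, unpacked)

        # Placeholder edit: a real script would edit the XML tree instead.
        document_xml = unpacked / "word" / "document.xml"
        content = document_xml.read_text(encoding="utf-8")
        document_xml.write_text(content.replace("old", "new"), encoding="utf-8")

        edited = tmp_path / "edited.docx"
        pack(unpacked, edited)
        with zipfile.ZipFile(edited) as archive:
            return archive.read("word/document.xml").decode("utf-8")
```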
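Redlining workflow
- Markdown representation: pandoc --track-changes=all ... -o current.md
- Identify the changes and split them into batches: group by section, type, or difficulty, about 3 to 10 changes per batch.
- Unpack and read ooxml.md; use the RSID as it recommends.
- Implement batch by batch: grep word/document.xml to locate the text, apply the change with get_node and related helpers, then doc.save().
- Pack: pack.py produces the .docx.
- Verify: convert to markdown again with pandoc --track-changes=all and grep to check that every change is present and nothing extra changed.
Principle: mark only the text that actually changed; for unchanged parts, reuse the original <w:r> and RSID.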
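The principle of marking only the changed text can be sketched as a word-level diff that wraps just the differing middle in `<w:del>` and `<w:ins>`. This is an illustration under simplifying assumptions: it emits fresh runs without run properties or RSID attributes, whereas a real edit would keep the original `<w:r>` elements for the unchanged parts. `minimal_redline` is a hypothetical helper, and the author and date values are placeholders.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def _run(text: str, tag: str = "w:t") -> str:
    """Wrap text in a run; xml:space keeps leading and trailing spaces."""
    return f'<w:r><{tag} xml:space="preserve">{escape(text)}</{tag}></w:r>'


def minimal_redline(old: str, new: str, author: str, date: str, first_id: int = 1) -> str:
    """Return run XML where only the differing words are marked as changed."""
    old_tokens = re.findall(r"\s+|\S+", old)
    new_tokens = re.findall(r"\s+|\S+", new)
    limit = min(len(old_tokens), len(new_tokens))

    # Longest common prefix, counted in tokens.
    prefix = 0
    while prefix < limit and old_tokens[prefix] == new_tokens[prefix]:
        prefix += 1

    # Longest common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old_tokens[len(old_tokens) - 1 - suffix]
           == new_tokens[len(new_tokens) - 1 - suffix]):
        suffix += 1

    unchanged_head = "".join(old_tokens[:prefix])
    deleted = "".join(old_tokens[prefix:len(old_tokens) - suffix])
    inserted = "".join(new_tokens[prefix:len(new_tokens) - suffix])
    unchanged_tail = "".join(old_tokens[len(old_tokens) - suffix:])

    attrs = f"w:author={quoteattr(author)} w:date={quoteattr(date)}"
    parts = []
    next_id = first_id
    if unchanged_head:
        parts.append(_run(unchanged_head))
    if deleted:
        # Deleted text lives in w:delText inside w:del.
        parts.append(f'<w:del w:id="{next_id}" {attrs}>{_run(deleted, "w:delText")}</w:del>')
        next_id += 1
    if inserted:
        parts.append(f'<w:ins w:id="{next_id}" {attrs}>{_run(inserted)}</w:ins>')
    if unchanged_tail:
        parts.append(_run(unchanged_tail))
    return "".join(parts)
```

For example, changing "The term is 30 days." to "The term is 60 days." marks only "30" as deleted and "60" as inserted, leaving the surrounding text in ordinary runs.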
Converting to images
soffice --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
# -f / -l can specify a page range
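The two commands can be assembled from a script when the page range varies. The sketch below only builds the argument lists; `image_commands` is a hypothetical helper, and it assumes soffice writes the PDF into the current working directory under the same base name as the input.

```python
from pathlib import Path


def image_commands(docx, prefix="page", dpi=150, first_page=None, last_page=None):
    """Return the soffice and pdftoppm argument lists for a .docx file."""
    # Assumption: the PDF lands in the current directory with the same stem.
    pdf_name = Path(docx).with_suffix(".pdf").name
    to_pdf = ["soffice", "--headless", "--convert-to", "pdf", str(docx)]

    to_jpeg = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first_page is not None:
        to_jpeg += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        to_jpeg += ["-l", str(last_page)]  # last page to convert
    to_jpeg += [pdf_name, prefix]
    return [to_pdf, to_jpeg]
```

Each list can then be passed to `subprocess.run(command, check=True)` in order.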
Dependencies
pandoc, docx (npm), LibreOffice, poppler-utils, defusedxml (pip).
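A quick pre-flight check can report which of these are missing before a workflow starts. This sketch assumes LibreOffice is exposed as the `soffice` command and poppler-utils as `pdftoppm`, as in the commands above; it does not check the npm docx package. `missing_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil

# Executables used by the commands in this document (assumed names on PATH).
COMMANDS = ("pandoc", "soffice", "pdftoppm")
# Python packages from the dependency list.
PYTHON_MODULES = ("defusedxml",)


def missing_dependencies(commands=COMMANDS, modules=PYTHON_MODULES) -> list[str]:
    """Return the names of executables and Python modules that cannot be found."""
    missing = [name for name in commands if shutil.which(name) is None]
    missing += [name for name in modules if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    print(missing_dependencies() or "all dependencies found")
```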