docx
npx skills add https://github.com/yamato-snow/skills --skill docx
DOCX creation, editing, and analysis
Overview
A user may ask you to create, edit, or analyze the contents of a .docx file. A .docx file is essentially a ZIP archive containing XML files and other resources, which can be read and edited. Different tools and workflows are available depending on the task.
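Because a .docx file is a ZIP archive, its parts can be listed with a standard ZIP reader. The following is a minimal sketch using only the Python standard library; the helper name `list_docx_parts` is hypothetical, and the in-memory archive is a stand-in, not a valid Word document.

```python
import io
import zipfile

def list_docx_parts(docx):
    """Return the names of the parts stored inside a .docx (ZIP) archive."""
    with zipfile.ZipFile(docx) as zf:
        return zf.namelist()

# Build a tiny stand-in archive in memory so the sketch runs without a real file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<document/>")
buf.seek(0)
parts = list_docx_parts(buf)
```

With a real document, pass the file path instead of the in-memory buffer.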
Workflow decision tree
Reading/analyzing content
Use the "Text extraction" or "Raw XML access" sections below.
Creating a new document
Use the "Creating a new Word document" workflow.
Editing an existing document
- Your own document + simple changes: use the "Basic OOXML editing" workflow (described under "Editing an existing Word document").
- Someone else's document: use the "Redlining workflow" (recommended default).
- Legal, academic, business, or government documents: use the "Redlining workflow" (required).
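The decision tree above can be read as a small lookup function. This is only an illustrative sketch: the function name, its parameters, and the tie-breaking order (formal documents win over ownership) are assumptions, not part of the skill.

```python
def choose_workflow(task, own_document=False, simple_changes=False, formal=False):
    """Map a request onto one of the workflows named in the decision tree."""
    if task == "read":
        return "Text extraction / Raw XML access"
    if task == "create":
        return "Creating a new Word document"
    if task == "edit":
        if formal:  # legal, academic, business, or government documents
            return "Redlining workflow (required)"
        if own_document and simple_changes:
            return "Basic OOXML editing"
        return "Redlining workflow (recommended default)"
    raise ValueError(f"unknown task: {task!r}")
```

For example, `choose_workflow("edit", formal=True)` selects the redlining workflow even when the document is your own.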
Reading and analyzing content
Text extraction
If you only need to read a document's text content, use pandoc to convert the document to markdown. Pandoc preserves document structure well and can show tracked changes:
# Convert the document to markdown with tracked changes
pandoc --track-changes=all path-to-file.docx -o output.md
# Options: --track-changes=accept/reject/all
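When the conversion is driven from a script, the same pandoc invocation can be assembled as an argument list. A minimal sketch, assuming pandoc is on the PATH; the helper names `pandoc_command` and `docx_to_markdown` are hypothetical.

```python
import subprocess

TRACK_MODES = ("accept", "reject", "all")

def pandoc_command(src, dst, track_changes="all"):
    """Build the pandoc argv for converting a .docx file to markdown."""
    if track_changes not in TRACK_MODES:
        raise ValueError(f"track_changes must be one of {TRACK_MODES}")
    return ["pandoc", f"--track-changes={track_changes}", src, "-o", dst]

def docx_to_markdown(src, dst, track_changes="all"):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(src, dst, track_changes), check=True)
```

Keeping the command builder separate from the runner makes the arguments easy to inspect without invoking pandoc.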
Raw XML access
Raw XML access is required for comments, complex formatting, document structure, embedded media, and metadata. To work with any of these, unpack the document and read its raw XML contents.
Unpacking a file
python ooxml/scripts/unpack.py <office_file> <output_directory>
Key file structure
- word/document.xml: main document content
- word/comments.xml: comments referenced from document.xml
- word/media/: embedded images and media files
- Tracked changes use <w:ins> (insertion) and <w:del> (deletion) tags
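As an illustration of reading the raw XML, the sketch below counts tracked insertions and deletions in word/document.xml. It uses the standard-library parser to stay self-contained; for untrusted files, a hardened parser such as the defusedxml package listed under the dependencies would be the safer choice. The function name and the sample XML are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def count_tracked_changes(docx):
    """Count <w:ins> and <w:del> elements in word/document.xml."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return {
        "insertions": len(root.findall(f".//{{{W}}}ins")),
        "deletions": len(root.findall(f".//{{{W}}}del")),
    }

# Stand-in archive holding one deletion and one insertion.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    "<w:r><w:t>The term is </w:t></w:r>"
    "<w:del><w:r><w:delText>30</w:delText></w:r></w:del>"
    "<w:ins><w:r><w:t>60</w:t></w:r></w:ins>"
    "<w:r><w:t> days.</w:t></w:r>"
    "</w:p></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", SAMPLE)
buf.seek(0)
counts = count_tracked_changes(buf)
```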
Creating a new Word document
To create a new Word document from scratch, use docx-js, which lets you create Word documents with JavaScript/TypeScript.
Workflow
- MANDATORY - READ THE ENTIRE FILE: Read docx-js.md (about 500 lines) completely, from start to finish. Never set a range limit when reading this file. Before creating a document, read the whole file for the detailed syntax, critical formatting rules, and best practices.
- Create a JavaScript/TypeScript file using the Document, Paragraph, and TextRun components. (Assume all dependencies are installed; if they are not, see the Dependencies section below.)
- Export as .docx using Packer.toBuffer().
Editing an existing Word document
To edit an existing Word document, use the Document library (a Python library for OOXML manipulation). The library handles infrastructure setup automatically and provides methods for document manipulation. For complex scenarios, you can access the underlying DOM directly through the library.
Workflow
- MANDATORY - READ THE ENTIRE FILE: Read ooxml.md (about 600 lines) completely, from start to finish. Never set a range limit when reading this file. Read the whole file for the Document library API and the XML patterns used to edit document files directly.
- Unpack the document:
python ooxml/scripts/unpack.py <office_file> <output_directory>
- Create and run a Python script that uses the Document library (see the "Document library" section of ooxml.md).
- Pack the final document:
python ooxml/scripts/pack.py <input_directory> <office_file>
The Document library provides both high-level methods for common operations and direct DOM access for complex scenarios.
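To make the unpack, edit, pack cycle concrete, here is a rough stand-in built on the standard `zipfile` module. It is not the skill's unpack.py or pack.py (whose internals are not shown here) and it does not use the Document library; the function names and the toy XML are assumptions for illustration only.

```python
import tempfile
import zipfile
from pathlib import Path

def unpack(docx_path, out_dir):
    """Extract every part of the archive into out_dir."""
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(out_dir)

def pack(in_dir, docx_path):
    """Zip the directory tree back into a single archive."""
    in_dir = Path(in_dir)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(in_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(in_dir).as_posix())

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "in.docx"
    with zipfile.ZipFile(src, "w") as zf:
        zf.writestr("word/document.xml", "<doc>old</doc>")
    unpack(src, root / "unpacked")
    part = root / "unpacked" / "word" / "document.xml"
    # The "edit" step: a plain text substitution on the unpacked part.
    part.write_text(part.read_text(encoding="utf-8").replace("old", "new"), encoding="utf-8")
    pack(root / "unpacked", root / "out.docx")
    with zipfile.ZipFile(root / "out.docx") as zf:
        result = zf.read("word/document.xml").decode("utf-8")
```

In the real workflow the edit step is a script written against the Document library rather than a string replacement.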
Redlining workflow for document review
This workflow lets you plan comprehensive tracked changes in markdown before implementing them in OOXML. IMPORTANT: to produce complete tracked changes, you must implement every change systematically.
Batching strategy: Group related changes into batches of 3-10 changes. This keeps debugging manageable while maintaining efficiency. Test each batch before moving on to the next.
Principle: minimal, precise edits
When implementing tracked changes, mark only the text that actually changes. Repeating unchanged text makes edits harder to review and looks unprofessional. Split a replacement into: [unchanged text] + [deletion] + [insertion] + [unchanged text]. Preserve the original run's RSID for the unchanged text by extracting the original <w:r> element and reusing it.
Example - changing "30 days" to "60 days" in a sentence:
# BAD - replaces the entire sentence
'<w:del><w:r><w:delText>The term is 30 days.</w:delText></w:r></w:del><w:ins><w:r><w:t>The term is 60 days.</w:t></w:r></w:ins>'
# GOOD - marks only what changed and preserves the original <w:r> for unchanged text
'<w:r w:rsidR="00AB12CD"><w:t>The term is </w:t></w:r><w:del><w:r><w:delText>30</w:delText></w:r></w:del><w:ins><w:r><w:t>60</w:t></w:r></w:ins><w:r w:rsidR="00AB12CD"><w:t> days.</w:t></w:r>'
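The split into [unchanged text] + [deletion] + [insertion] + [unchanged text] can be sketched as a string builder. This is a simplified illustration: it creates fresh runs carrying an RSID instead of reusing the original <w:r> element (which would also keep any run formatting), and it emits only the attributes shown in the example above. The helper names `run` and `minimal_redline` are hypothetical.

```python
from xml.sax.saxutils import escape

def run(text, rsid=None):
    """A <w:r> element holding plain text, optionally tagged with an RSID."""
    attr = f' w:rsidR="{rsid}"' if rsid else ""
    return f"<w:r{attr}><w:t>{escape(text)}</w:t></w:r>"

def minimal_redline(sentence, old, new, rsid=None):
    """Mark only old -> new; text before and after stays in ordinary runs."""
    before, found, after = sentence.partition(old)
    if not found:
        raise ValueError(f"{old!r} not found in sentence")
    deleted = f"<w:del><w:r><w:delText>{escape(old)}</w:delText></w:r></w:del>"
    inserted = f"<w:ins>{run(new)}</w:ins>"
    parts = [run(before, rsid) if before else "", deleted, inserted,
             run(after, rsid) if after else ""]
    return "".join(parts)

xml = minimal_redline("The term is 30 days.", "30", "60", rsid="00AB12CD")
```

For this input the result matches the GOOD string above: only "30" and "60" sit inside the deletion and insertion markers.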
Tracked changes workflow
1. Get a markdown representation: convert the document to markdown with tracked changes preserved:
pandoc --track-changes=all path-to-file.docx -o current.md
2. Identify and group changes: review the document, identify every change needed, and organize the changes into logical batches.
Location methods (for finding changes in the XML):
- Section/heading numbers (e.g., "Section 3.2", "Article IV")
- Paragraph identifiers, if paragraphs are numbered
- Grep patterns using unique surrounding text
- Document structure (e.g., "first paragraph", "signature block")
- Do NOT use markdown line numbers: they do not map to the XML structure
Batch organization (group 3-10 related changes per batch):
- By section: "Batch 1: Section 2 amendments", "Batch 2: Section 5 updates"
- By type: "Batch 1: Date corrections", "Batch 2: Party name changes"
- By complexity: start with simple text replacements, then tackle complex structural changes
- Sequential: "Batch 1: Pages 1-3", "Batch 2: Pages 4-6"
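One way to organize batches by section is sketched below. The data shape (pairs of section name and change description) and the function name are assumptions; the sketch caps batches at 10 changes but leaves merging of very small groups to judgment.

```python
from collections import defaultdict

def batch_by_section(changes, max_size=10):
    """Group (section, description) pairs by section, splitting large groups."""
    by_section = defaultdict(list)
    for section, description in changes:
        by_section[section].append(description)
    batches = []
    for section, items in by_section.items():
        for start in range(0, len(items), max_size):
            batches.append((section, items[start:start + max_size]))
    return batches

changes = [
    ("Section 2", "fix date"),
    ("Section 5", "rename party"),
    ("Section 2", "update term"),
]
batches = batch_by_section(changes)
```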
3. Read the documentation and unpack:
- MANDATORY - READ THE ENTIRE FILE: Read ooxml.md (about 600 lines) completely, from start to finish. Never set a range limit when reading this file. Pay particular attention to the "Document library" and "Tracked change patterns" sections.
- Unpack the document:
python ooxml/scripts/unpack.py <file.docx> <dir>
- Note the suggested RSID: the unpack script suggests an RSID to use for tracked changes. Copy this RSID for use in step 4b.
4. Implement changes in batches: group changes logically (by section, by type, or by proximity) and implement each group together in a single script. This approach:
- Makes debugging easier (smaller batch = easier to isolate errors)
- Allows incremental progress
- Maintains efficiency (a batch size of 3-10 changes works well)
Suggested batch groupings:
- By document section (e.g., "Section 3 changes", "Definitions", "Termination clause")
- By change type (e.g., "Date changes", "Party name updates", "Legal term replacements")
- By proximity (e.g., "Changes on pages 1-3", "Changes in first half of document")
For each batch of related changes:
a. Map text to XML: grep for the text in word/document.xml to check how it is split across <w:r> elements.
b. Create and run a script: use get_node to find nodes, implement the changes, then call doc.save(). See the "Document library" section of ooxml.md for patterns.
Note: always grep word/document.xml immediately before writing a script to get the current line numbers and verify the text content. Line numbers change after every script run.
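Step a matters because a phrase that reads as continuous text may be spread over several runs, in which case a literal grep for the whole phrase finds nothing. The sketch below shows one way to inspect the split with the standard-library XML parser; it is not the Document library's get_node API, and the function names and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def runs_in_paragraphs(document_xml):
    """Yield (paragraph_index, [run texts]) for each <w:p> in the XML."""
    root = ET.fromstring(document_xml)
    for i, para in enumerate(root.iter(f"{{{W}}}p")):
        texts = ["".join(t.text or "" for t in r.iter(f"{{{W}}}t"))
                 for r in para.iter(f"{{{W}}}r")]
        yield i, texts

def find_phrase(document_xml, phrase):
    """Return paragraphs whose joined run text contains the phrase."""
    return [(i, texts) for i, texts in runs_in_paragraphs(document_xml)
            if phrase in "".join(texts)]

# "30 days" is split across two runs, so it never appears verbatim in the XML.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    "<w:r><w:t>The term is </w:t></w:r>"
    "<w:r><w:t>30</w:t></w:r>"
    "<w:r><w:t> days.</w:t></w:r>"
    "</w:p></w:body></w:document>"
)
matches = find_phrase(SAMPLE, "30 days")
```

The returned run texts show where the phrase is divided, which tells you which <w:r> elements a change has to touch.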
5. Pack the document: once all batches are complete, convert the unpacked directory back to .docx:
python ooxml/scripts/pack.py unpacked reviewed-document.docx
6. Final verification: run a comprehensive check of the complete document:
- Convert the final document to markdown:
pandoc --track-changes=all reviewed-document.docx -o verification.md
- Verify that all changes were applied correctly:
grep "original phrase" verification.md # should NOT find it
grep "replacement phrase" verification.md # should find it
- Check that no unintended changes were introduced
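The two grep checks can be combined into one pass over the verification markdown. A minimal sketch with a hypothetical helper name; it does plain substring matching, so it assumes the phrases you check appear contiguously in the markdown (tracked-change markup emitted with --track-changes=all may interrupt a phrase).

```python
def verify_changes(markdown, expected):
    """Return a list of problems; expected maps original -> replacement."""
    problems = []
    for original, replacement in expected.items():
        if original in markdown:
            problems.append(f"still present: {original!r}")
        if replacement not in markdown:
            problems.append(f"missing: {replacement!r}")
    return problems

ok = verify_changes("The term is 60 days.", {"30 days": "60 days"})
bad = verify_changes("The term is 30 days.", {"30 days": "60 days"})
```

An empty list means every original phrase is gone and every replacement is present.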
Converting documents to images
To analyze a Word document visually, convert it to images in a two-step process:
- Convert DOCX to PDF:
soffice --headless --convert-to pdf document.docx
- Convert PDF pages to JPEG images:
pdftoppm -jpeg -r 150 document.pdf page
This creates files such as page-1.jpg, page-2.jpg, and so on.
Options:
- -r 150: sets the resolution to 150 DPI (adjust for the quality/size balance)
- -jpeg: outputs JPEG format (use -png if you prefer PNG)
- -f N: first page to convert (e.g., -f 2 starts at page 2)
- -l N: last page to convert (e.g., -l 5 stops at page 5)
- page: prefix for the output files
Example for a specific range:
pdftoppm -jpeg -r 150 -f 2 -l 5 document.pdf page # converts only pages 2-5
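The two conversion steps can be chained from a script by building the same command lines shown above. A minimal sketch, assuming soffice and pdftoppm are installed; the helper names are hypothetical, and the caller supplies the PDF path that soffice produces.

```python
import subprocess

def pdf_command(docx):
    """argv for the DOCX -> PDF step (LibreOffice in headless mode)."""
    return ["soffice", "--headless", "--convert-to", "pdf", docx]

def image_command(pdf, prefix="page", dpi=150, first=None, last=None, fmt="jpeg"):
    """argv for the PDF -> image step, with an optional page range."""
    cmd = ["pdftoppm", f"-{fmt}", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    return cmd + [pdf, prefix]

def docx_to_images(docx, pdf, **options):
    """Run both steps in order; each raises CalledProcessError on failure."""
    subprocess.run(pdf_command(docx), check=True)
    subprocess.run(image_command(pdf, **options), check=True)
```

For example, `image_command("document.pdf", first=2, last=5)` reproduces the page-range command above.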
Code style guidelines
IMPORTANT: when generating code for DOCX operations:
- Write concise code
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements
Dependencies
Required dependencies (install if not available):
- pandoc: sudo apt-get install pandoc (for text extraction)
- docx: npm install -g docx (for creating new documents)
- LibreOffice: sudo apt-get install libreoffice (for PDF conversion)
- Poppler: sudo apt-get install poppler-utils (for converting PDF to images with pdftoppm)
- defusedxml: pip install defusedxml (for secure XML parsing)
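Before starting a workflow, it can help to check which of these tools are actually available. The sketch below looks up the command-line programs on the PATH and the Python module via the import system; the function name and default lists are assumptions, and the npm docx package is not checked.

```python
import importlib.util
import shutil

EXECUTABLES = ["pandoc", "soffice", "pdftoppm"]
PYTHON_MODULES = ["defusedxml"]

def missing_dependencies(executables=EXECUTABLES, modules=PYTHON_MODULES):
    """Return the names that cannot be found on PATH or imported."""
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    return missing
```

An empty result means everything in the two lists was found; otherwise install the listed items with the commands above.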