docx
npx skills add https://github.com/sherifeldeeb/agentskills --skill docx
Agent 安装分布
Skill 文档
DOCX Skill
Read, modify, and create Microsoft Word documents with support for converting markdown content into professionally formatted documents using templates.
Capabilities
- Read Documents: Extract text, tables, and metadata from DOCX files
- Create Documents: Generate new DOCX files from scratch or templates
- Modify Documents: Edit existing documents, add content, modify styles
- Markdown Conversion: Convert markdown files to DOCX using template styling
- Template-Based Generation: Use DOCX templates to maintain consistent formatting
Quick Start
from docx import Document
# Read a document
doc = Document('report.docx')
for para in doc.paragraphs:
print(para.text)
# Create a document
doc = Document()
doc.add_heading('Report Title', 0)
doc.add_paragraph('Content here.')
doc.save('output.docx')
Usage
Reading Documents
Extract content from existing DOCX files.
Input: Path to a DOCX file
Process:
- Open document with python-docx
- Iterate through paragraphs, tables, or other elements
- Extract text, styles, or metadata
Example:
from docx import Document
doc = Document('input.docx')
# Extract all text
full_text = []
for para in doc.paragraphs:
full_text.append(para.text)
text = '\n'.join(full_text)
# Extract tables
for table in doc.tables:
for row in table.rows:
row_data = [cell.text for cell in row.cells]
print(row_data)
# Get document properties
props = doc.core_properties
print(f"Title: {props.title}")
print(f"Author: {props.author}")
Creating New Documents
Generate DOCX files from scratch.
Input: Content to include (text, tables, images)
Process:
- Create new Document object
- Add headings, paragraphs, tables
- Apply styles as needed
- Save to file
Example:
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
# Add title
title = doc.add_heading('Security Assessment Report', 0)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER
# Add metadata paragraph
doc.add_paragraph('Prepared by: Security Team')
doc.add_paragraph('Date: January 2024')
# Add section heading
doc.add_heading('Executive Summary', level=1)
# Add content paragraph
para = doc.add_paragraph()
para.add_run('This assessment identified ').bold = False
para.add_run('3 critical findings').bold = True
para.add_run(' requiring immediate attention.')
# Add a table
table = doc.add_table(rows=1, cols=3)
table.style = 'Table Grid'
header = table.rows[0].cells
header[0].text = 'Finding'
header[1].text = 'Severity'
header[2].text = 'Status'
# Add data rows
data = [
('SQL Injection', 'Critical', 'Open'),
('XSS Vulnerability', 'High', 'In Progress'),
]
for finding, severity, status in data:
row = table.add_row().cells
row[0].text = finding
row[1].text = severity
row[2].text = status
doc.save('report.docx')
Modifying Existing Documents
Edit and update existing DOCX files.
Input: Path to existing DOCX file
Process:
- Open existing document
- Locate content to modify
- Make changes
- Save (same or new file)
Example:
from docx import Document
doc = Document('template.docx')
# Replace placeholder text
for para in doc.paragraphs:
if '{{CLIENT_NAME}}' in para.text:
para.text = para.text.replace('{{CLIENT_NAME}}', 'Acme Corp')
if '{{DATE}}' in para.text:
para.text = para.text.replace('{{DATE}}', '2024-01-15')
# Add new section at end
doc.add_heading('Additional Findings', level=1)
doc.add_paragraph('New content added during modification.')
doc.save('modified_report.docx')
Markdown to DOCX Conversion
Convert markdown files to Word documents using template styling.
Input:
- Markdown file path
- DOCX template file path (optional)
- Output file path
Process:
- Parse markdown content
- Load template document for styles (if provided)
- Map markdown elements to Word styles
- Generate formatted DOCX
Using the conversion script:
python scripts/md_to_docx.py report.md --template company_template.docx --output final_report.docx
Programmatic usage:
from scripts.md_to_docx import MarkdownToDocx
converter = MarkdownToDocx(template_path='template.docx')
converter.convert('input.md', 'output.docx')
Style Mapping:
| Markdown | Word Style |
|---|---|
# Heading 1 |
Heading 1 |
## Heading 2 |
Heading 2 |
### Heading 3 |
Heading 3 |
**bold** |
Bold run |
*italic* |
Italic run |
`code` |
Code character style |
- item |
List Bullet |
1. item |
List Number |
| Code blocks | Code block style |
> quote |
Quote style |
| Tables | Table Grid |
Template Requirements:
For best results, your template should define these styles:
- Heading 1, Heading 2, Heading 3
- Normal (body text)
- List Bullet, List Number
- Quote (for blockquotes)
- Code (character style for inline code)
Working with Styles
Apply consistent formatting using document styles.
Example:
from docx import Document
from docx.shared import Pt, RGBColor
from docx.enum.style import WD_STYLE_TYPE
doc = Document()
# Create a custom style
styles = doc.styles
style = styles.add_style('Finding Critical', WD_STYLE_TYPE.PARAGRAPH)
style.font.bold = True
style.font.size = Pt(12)
style.font.color.rgb = RGBColor(255, 0, 0)
# Use the style
doc.add_paragraph('CRITICAL: SQL Injection Found', style='Finding Critical')
doc.save('styled_report.docx')
Adding Images
Insert images into documents.
Example:
from docx import Document
from docx.shared import Inches
doc = Document()
doc.add_heading('Network Diagram', level=1)
doc.add_picture('network_diagram.png', width=Inches(6))
doc.add_paragraph('Figure 1: Current network architecture')
doc.save('report_with_images.docx')
Configuration
Environment Variables
| Variable | Description | Required | Default |
|---|---|---|---|
DOCX_TEMPLATE_DIR |
Default template directory | No | ./assets/templates |
Script Options
| Option | Type | Description |
|---|---|---|
--template |
path | DOCX template for styling |
--output |
path | Output file path |
--toc |
flag | Generate table of contents |
--verbose |
flag | Enable verbose logging |
Examples
Example 1: Security Report from Markdown
Scenario: Convert a penetration test report written in markdown to a professional Word document.
Input (report.md):
# Penetration Test Report
## Executive Summary
The assessment identified **3 critical** and **5 high** severity findings.
## Findings
### Finding 1: SQL Injection
**Severity**: Critical
The application is vulnerable to SQL injection in the login form.
| Parameter | Payload | Result |
|-----------|---------|--------|
| username | `' OR 1=1--` | Auth bypass |
### Remediation
1. Use parameterized queries
2. Implement input validation
3. Apply least privilege
Command:
python scripts/md_to_docx.py report.md --template corporate_template.docx --output pentest_report.docx
Output: Professional DOCX with corporate styling applied.
Example 2: Batch Document Generation
Scenario: Generate multiple reports from a template with different data.
from docx import Document
import json
# Load client data
with open('clients.json') as f:
clients = json.load(f)
for client in clients:
doc = Document('assessment_template.docx')
# Replace placeholders
for para in doc.paragraphs:
for key, value in client.items():
placeholder = f'{{{{{key}}}}}'
if placeholder in para.text:
para.text = para.text.replace(placeholder, str(value))
doc.save(f"reports/{client['name']}_assessment.docx")
Limitations
- Complex formatting: Some advanced Word features (SmartArt, embedded objects) may not be fully supported
- Track changes: Limited support for revision tracking
- Macros: VBA macros are not supported for security reasons
- Large tables: Tables with merged cells may have rendering inconsistencies
Troubleshooting
Style Not Applied
Problem: Custom styles from template not appearing in output
Solution: Ensure the style exists in the template and use exact style name:
# Check available styles
for style in doc.styles:
print(style.name)
Encoding Issues
Problem: Special characters not displaying correctly
Solution: Ensure UTF-8 encoding when reading source files:
with open('input.md', 'r', encoding='utf-8') as f:
content = f.read()
Missing Images
Problem: Images not appearing in output
Solution: Use absolute paths or verify relative path from script location:
from pathlib import Path
image_path = Path(__file__).parent / 'assets' / 'logo.png'
doc.add_picture(str(image_path))
Related Skills
- pdf: Convert DOCX to PDF or extract content from PDFs
- xlsx: Work with Excel files for data that feeds into reports
- pptx: Create presentations from document content