nutrient-document-processing
npx skills add https://github.com/pspdfkit-labs/nutrient-agent-skill --skill nutrient-document-processing
Nutrient Document Processing
Process, convert, extract, redact, sign, and manipulate documents using the Nutrient DWS Processor API.
Setup
You need a Nutrient DWS API key. Get one free at https://dashboard.nutrient.io/sign_up/?product=processor.
Option 1: MCP Server (Recommended)
If your agent supports MCP (Model Context Protocol), use the Nutrient DWS MCP Server. It provides all operations as native tools.
Configure your MCP client (e.g., claude_desktop_config.json or .mcp.json):
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
Then use the MCP tools directly (e.g., convert_to_pdf, extract_text, redact).
Option 2: Direct API (curl)
For agents without MCP support, call the API directly:
export NUTRIENT_API_KEY="your_api_key_here"
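All document-processing requests go to https://api.nutrient.io/build as a multipart POST that carries the input files plus an instructions JSON field. The only exception in this document is the credit check in section 10, which is a plain GET to https://api.nutrient.io/credits.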
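Since every processing call has the same shape, a small wrapper can cut the repetition. The following is a minimal sketch, not part of the Nutrient tooling: the function name `nutrient_build` and its argument order are hypothetical. It assumes, as in this document's examples, that the multipart field name matches the file name referenced in the instructions JSON.

```shell
# Hypothetical helper (not part of the Nutrient tooling):
#   nutrient_build <input-file> <instructions-json> <output-file>
nutrient_build() {
  local input="$1" instructions="$2" output="$3" name

  if [ -z "${NUTRIENT_API_KEY:-}" ]; then
    echo "NUTRIENT_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$input" ]; then
    echo "input file not found: $input" >&2
    return 1
  fi

  # Use the file's base name as the multipart field name, mirroring the
  # "document.pdf=@document.pdf" pattern used throughout this document.
  name=$(basename "$input")

  # --fail makes curl exit non-zero on HTTP errors such as 401 or 413.
  # --form-string sends the JSON literally, without curl's @ / < / ; parsing.
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$name=@$input" \
    --form-string "instructions=$instructions" \
    -o "$output"
}
```

With this in place, a DOCX to PDF conversion would be `nutrient_build document.docx '{"parts":[{"file":"document.docx"}]}' output.pdf`. The sketch handles a single input file only; merging several files still needs one `-F` per file.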
Operations
1. Convert Documents
Convert between PDF, DOCX, XLSX, PPTX, HTML, and image formats.
HTML to PDF:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "index.html=@index.html" \
-F 'instructions={"parts":[{"html":"index.html"}]}' \
-o output.pdf
DOCX to PDF:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.docx=@document.docx" \
-F 'instructions={"parts":[{"file":"document.docx"}]}' \
-o output.pdf
PDF to DOCX/XLSX/PPTX:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
-o output.docx
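The example above shows only the docx case. Assuming XLSX and PPTX exports differ only in the output type value (the heading suggests this, but the source shows only docx and xlsx explicitly), the instructions can be generated by a small function. The name `office_export_instructions` is hypothetical.

```shell
# Hypothetical helper: office_export_instructions <uploaded-name> <docx|xlsx|pptx>
# Prints an instructions JSON string for a PDF-to-Office conversion.
office_export_instructions() {
  local file_name="$1" target="$2"

  case "$target" in
    docx|xlsx|pptx) ;;
    *) echo "unsupported target type: $target" >&2; return 1 ;;
  esac

  # Assumes file_name contains no double quotes or backslashes,
  # because no JSON escaping is applied here.
  printf '{"parts":[{"file":"%s"}],"output":{"type":"%s"}}\n' "$file_name" "$target"
}
```

The printed string can be passed as the instructions field, for example `-F "instructions=$(office_export_instructions document.pdf pptx)"`.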
Image to PDF:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "image.jpg=@image.jpg" \
-F 'instructions={"parts":[{"file":"image.jpg"}]}' \
-o output.pdf
2. Extract Text and Data
Extract plain text:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
-o output.txt
Extract tables (as JSON, CSV, or Excel):
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
-o tables.xlsx
Extract key-value pairs:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"extraction","strategy":"key-values"}]}' \
-o result.json
3. OCR Scanned Documents
Apply OCR to scanned PDFs or images, producing searchable PDFs with selectable text.
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "scanned.pdf=@scanned.pdf" \
-F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
-o searchable.pdf
Supported languages: english, german, french, spanish, italian, portuguese, dutch, swedish, danish, norwegian, finnish, polish, czech, turkish, japanese, korean, chinese-simplified, chinese-traditional, arabic, hebrew, thai, hindi, russian, and more.
4. Redact Sensitive Information
Pattern-based redaction (preset patterns):
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","preset":"social-security-number"}]}' \
-o redacted.pdf
Available presets: social-security-number, credit-card-number, email-address, north-american-phone-number, international-phone-number, date, url, ipv4, ipv6, mac-address, us-zip-code, vin, time.
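To apply several presets to one document, one option is to emit one redaction action per preset in a single instructions object. This is a sketch under an assumption: the source states that multiple actions can be chained in one call, but it does not show several redaction actions together. The function name `preset_redaction_instructions` is hypothetical; the preset list is copied from the line above.

```shell
# Presets as listed in this document.
NUTRIENT_PRESETS="social-security-number credit-card-number email-address north-american-phone-number international-phone-number date url ipv4 ipv6 mac-address us-zip-code vin time"

# Hypothetical helper: preset_redaction_instructions <uploaded-name> <preset>...
preset_redaction_instructions() {
  local file_name="$1" actions="" preset action
  shift

  if [ "$#" -lt 1 ]; then
    echo "at least one preset is required" >&2
    return 1
  fi

  for preset in "$@"; do
    # Reject names that are not in the documented preset list.
    case " $NUTRIENT_PRESETS " in
      *" $preset "*) ;;
      *) echo "unknown preset: $preset" >&2; return 1 ;;
    esac
    action=$(printf '{"type":"redaction","strategy":"preset","preset":"%s"}' "$preset")
    # Join actions with commas, without a leading comma on the first one.
    actions="${actions:+$actions,}$action"
  done

  printf '{"parts":[{"file":"%s"}],"actions":[%s]}\n' "$file_name" "$actions"
}
```

For example, `preset_redaction_instructions document.pdf email-address ipv4` prints an instructions object with two redaction actions.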
Regex-based redaction:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","regex":"\\b[A-Z]{2}\\d{6}\\b"}]}' \
-o redacted.pdf
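Note the doubled backslashes in the regex above: inside a JSON string, `\b` must be written as `\\b`. When the pattern comes from a variable, the escaping can be automated. The sketch below uses hypothetical function names (`json_escape`, `regex_redaction_instructions`) and escapes only backslashes and double quotes, which covers typical regexes but not control characters.

```shell
# Escape backslashes and double quotes so a string can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: regex_redaction_instructions <uploaded-name> <regex>
regex_redaction_instructions() {
  local file_name="$1" pattern
  pattern=$(json_escape "$2")
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"redaction","strategy":"regex","regex":"%s"}]}\n' \
    "$file_name" "$pattern"
}
```

Calling `regex_redaction_instructions document.pdf '\b[A-Z]{2}\d{6}\b'` reproduces the instructions string used in the curl example above.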
AI-powered PII redaction:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"ai_redaction","criteria":"All personally identifiable information"}]}' \
-o redacted.pdf
The criteria field accepts natural language (e.g., “Names and phone numbers”, “Protected health information”, “Financial account numbers”).
5. Add Watermarks
Text watermark:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":48,"fontColor":"#FF0000","opacity":0.5,"rotation":45,"width":"50%","height":"50%"}]}' \
-o watermarked.pdf
Image watermark:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F "logo.png=@logo.png" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","imagePath":"logo.png","width":"30%","height":"30%","opacity":0.3}]}' \
-o watermarked.pdf
6. Digital Signatures
Sign a PDF with CMS signature:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms","signerName":"John Doe","reason":"Approval","location":"New York"}]}' \
-o signed.pdf
Sign with CAdES-B-LT (long-term validation):
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cades","cadesLevel":"b-lt","signerName":"Jane Smith"}]}' \
-o signed.pdf
7. Form Filling (Instant JSON)
Fill PDF form fields using Instant JSON format:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "form.pdf=@form.pdf" \
-F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","fields":[{"name":"firstName","value":"John"},{"name":"lastName","value":"Doe"},{"name":"email","value":"john@example.com"}]}]}' \
-o filled.pdf
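Writing the fields array by hand gets error-prone as forms grow. As a sketch, the array can be built from `name=value` arguments. The function name `fill_form_instructions` is hypothetical, and the action shape is taken unchanged from the example above. Names and values are assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper: fill_form_instructions <uploaded-name> <name=value>...
fill_form_instructions() {
  local file_name="$1" fields="" pair name value
  shift

  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "expected name=value, got: $pair" >&2; return 1 ;;
    esac
    name=${pair%%=*}   # text before the first "="
    value=${pair#*=}   # text after the first "=" (may itself contain "=")
    fields="${fields:+$fields,}$(printf '{"name":"%s","value":"%s"}' "$name" "$value")"
  done

  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"fillForm","fields":[%s]}]}\n' \
    "$file_name" "$fields"
}
```

For example, `fill_form_instructions form.pdf firstName=John lastName=Doe` prints the instructions for a two-field form.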
8. Merge and Split PDFs
Merge multiple PDFs:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "doc1.pdf=@doc1.pdf" \
-F "doc2.pdf=@doc2.pdf" \
-F 'instructions={"parts":[{"file":"doc1.pdf"},{"file":"doc2.pdf"}]}' \
-o merged.pdf
Extract specific pages:
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":4}}]}' \
-o pages1-5.pdf
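The example requests indexes 0 through 4 and saves the result as pages 1 to 5, which suggests that page indexes are zero-based and inclusive. Assuming that reading is correct, a small converter from human page numbers avoids off-by-one mistakes. The function name `page_range_instructions` is hypothetical.

```shell
# Hypothetical helper: page_range_instructions <uploaded-name> <first-page> <last-page>
# Takes 1-based, inclusive page numbers and prints 0-based instructions JSON.
page_range_instructions() {
  local file_name="$1" first="$2" last="$3"

  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "invalid page range: $first-$last" >&2
    return 1
  fi

  printf '{"parts":[{"file":"%s","pages":{"start":%d,"end":%d}}]}\n' \
    "$file_name" "$((first - 1))" "$((last - 1))"
}
```

`page_range_instructions document.pdf 1 5` yields the same instructions as the curl example above.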
9. Render PDF Pages as Images
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":0}}],"output":{"type":"png","dpi":300}}' \
-o page1.png
10. Check Credits
curl -X GET https://api.nutrient.io/credits \
-H "Authorization: Bearer $NUTRIENT_API_KEY"
Best Practices
- Use the MCP server when your agent supports it; it handles file I/O, error handling, and sandboxing automatically.
- Set SANDBOX_PATH to restrict file access to a specific directory.
- Check credit balance before batch operations to avoid interruptions.
- Use AI redaction for complex PII detection; use preset/regex redaction for known patterns (faster, cheaper).
- Chain operations: the API supports multiple actions in a single call (e.g., OCR then redact).
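The chaining mentioned in the last bullet can be sketched by placing the OCR action and a preset redaction action, both taken from the earlier sections, in one actions array. That actions run in array order is an assumption, and the function name `ocr_then_redact_instructions` is hypothetical.

```shell
# Hypothetical helper:
#   ocr_then_redact_instructions <uploaded-name> <ocr-language> <redaction-preset>
# Combines the OCR and preset-redaction actions shown earlier into one request.
ocr_then_redact_instructions() {
  local file_name="$1" language="$2" preset="$3"
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ocr","language":"%s"},{"type":"redaction","strategy":"preset","preset":"%s"}]}\n' \
    "$file_name" "$language" "$preset"
}

# Usage (requires NUTRIENT_API_KEY and a local scanned.pdf):
#   curl -X POST https://api.nutrient.io/build \
#     -H "Authorization: Bearer $NUTRIENT_API_KEY" \
#     -F "scanned.pdf=@scanned.pdf" \
#     -F "instructions=$(ocr_then_redact_instructions scanned.pdf english email-address)" \
#     -o redacted.pdf
```

Sending both actions in one call avoids a second upload of the intermediate searchable PDF.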
Troubleshooting
| Issue | Solution |
|---|---|
| 401 Unauthorized | Check that your API key is valid and has credits |
| 413 Payload Too Large | Files must be under 100 MB |
| Slow AI redaction | AI analysis takes 60 to 120 seconds; this is normal |
| OCR quality poor | Try a different language parameter or improve scan quality |
| Missing text in extraction | Run OCR first on scanned documents |
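The 413 row can be caught before uploading with a local size check. The sketch below assumes "100 MB" means 100 × 1024 × 1024 bytes, which the source does not specify, and the function name `check_upload_size` is hypothetical. The optional second argument overrides the limit.

```shell
# Hypothetical helper: check_upload_size <file> [limit-bytes]
# Returns non-zero if the file is missing or is not under the limit.
check_upload_size() {
  # Default limit: 100 * 1024 * 1024 bytes (assumed meaning of "100 MB").
  local file="$1" limit="${2:-104857600}" size

  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi

  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(($(wc -c < "$file")))

  if [ "$size" -ge "$limit" ]; then
    echo "$file is $size bytes; limit is $limit bytes" >&2
    return 1
  fi
}
```

A typical use is `check_upload_size document.pdf && curl ...`, so oversized files fail fast without spending time on the upload.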
More Information
- Full API reference: Detailed endpoints, parameters, and error codes
- API Playground: Interactive API testing
- API Documentation: Official guides
- MCP Server repo: Source code and issues