form-filling
npx skills add https://github.com/sitammeur/datalab-skills --skill form-filling
Agent 安装分布
Skill 文档
Datalab Form Filling
Fill PDF and image forms using the Datalab Python SDK (datalab-python-sdk).
Prerequisites
pip install datalab-python-sdk python-dotenv
API Key Setup: The SDK requires DATALAB_API_KEY. Either:
- Set as environment variable:
export DATALAB_API_KEY=your_key - Or use a
.envfile in your project directory (recommended)
Workflow
- Gather field data from the user (field names, values, descriptions)
- Determine form source (local file, URL, or image)
- Configure options (context, confidence threshold, page range)
- Fill the form using the SDK
- Check results and handle unmatched fields
When NOT to Use This Skill
- Form creation – This fills existing forms, doesn’t create new ones
- OCR/text extraction – Use Datalab’s OCR endpoints instead
- Non-form documents – Regular PDFs without fillable fields or clear form structure
Quick Start
Use this in a script file (.py). In a notebook or REPL, __file__ is undefinedâuse explicit paths for the form and output instead.
import os
from pathlib import Path
from dotenv import load_dotenv
from datalab_sdk import DatalabClient, FormFillingOptions
# In a .py file: script_dir = Path(__file__).parent. In notebook/REPL: script_dir = Path(".")
script_dir = Path(__file__).parent
load_dotenv(script_dir / ".env")
client = DatalabClient(api_key=os.getenv("DATALAB_API_KEY"))
options = FormFillingOptions(
field_data={
"full_name": {"value": "John Doe", "description": "Full legal name"},
"date_of_birth": {"value": "1990-01-15", "description": "Date of birth"},
},
context="Employment application form",
confidence_threshold=0.5,
)
form_path = script_dir / "form.pdf"
result = client.fill(str(form_path), options=options)
result.save_output(str(script_dir / "filled_form.pdf"))
print(f"Filled: {result.fields_filled}")
print(f"Not found: {result.fields_not_found}")
Using the Fill Form Script
For quick command-line filling, use the bundled script. Run from the skill directory or use the full path:
# From skill directory (form.pdf and field_data.json in current dir)
python scripts/fill_form.py form.pdf field_data.json -o filled.pdf
# From another directory: use full paths for script, form, and field data
python /path/to/form-filling/scripts/fill_form.py /path/to/form.pdf /path/to/field_data.json -o filled.pdf
Options: -o output.pdf, -c "context string", -t 0.7 (threshold), -p "0-2" (pages 1-3, 0-indexed), --async
See scripts/sample_field_data.json for a template. The field_data.json format:
{
"name": { "value": "Jane Smith", "description": "Full name" },
"ssn": { "value": "123-45-6789", "description": "Social Security Number" }
}
Key Guidance
Field Data Design
- Always include
descriptionfor each field to improve matching accuracy - Use
contextto describe the form type (e.g., “IRS W-4 Employee’s Withholding Certificate”) - Field values are always strings, even for numbers and dates
Supported Field Types
Text, date, numeric, checkbox ("Yes"/"No"), and signature (rendered as text).
Handling Unmatched Fields
If result.fields_not_found is non-empty:
- Improve field descriptions to better match the form’s labels
- Add or refine the
contextparameter - Lower
confidence_thresholdto catch more matches
URL Source
result = client.fill(file_url="https://example.com/form.pdf", options=options)
Image Forms (Scanned PDFs, PNG, JPG)
The SDK handles image-based forms automatically:
# Scanned form or image file
result = client.fill("scanned_form.png", options=options)
result.save_output("filled_form.png") # Output matches input format
Async Processing
For batch operations or non-blocking calls. Paths are relative to the current working directory.
from datalab_sdk import AsyncDatalabClient, FormFillingOptions
async with AsyncDatalabClient(api_key=os.getenv("DATALAB_API_KEY")) as client:
result = await client.fill("form.pdf", options=options)
result.save_output("filled.pdf")
Common Pitfalls
API Key Not Found
Problem: DatalabAPIError: You must pass in an api_key or set DATALAB_API_KEY
Solution: The .env file isn’t auto-loaded. Always:
- Use
load_dotenv()with explicit path:load_dotenv(Path(__file__).parent / ".env") - Pass API key explicitly:
DatalabClient(api_key=os.getenv("DATALAB_API_KEY"))
File Not Found When Running Script
Problem: Relative paths like "form.pdf" fail when script runs from a different directory.
Solution: Use absolute paths based on script location:
script_dir = Path(__file__).parent
form_path = script_dir / "form.pdf"
result = client.fill(str(form_path), options=options)
Module Not Found
Problem: ModuleNotFoundError: No module named 'datalab_sdk'
Solution: Install the SDK first:
pip install datalab-python-sdk python-dotenv
References
- Full API details: See references/api-reference.md for installation/prerequisites, FormFillingOptions, confidence threshold tuning, image form handling, batch async patterns, result fields, error handling, and client configuration