form-filling

📁 sitammeur/datalab-skills 📅 9 days ago
Install command
npx skills add https://github.com/sitammeur/datalab-skills --skill form-filling

Skill documentation

Datalab Form Filling

Fill PDF and image forms using the Datalab Python SDK (datalab-python-sdk).

Prerequisites

pip install datalab-python-sdk python-dotenv

API Key Setup: The SDK requires DATALAB_API_KEY. Either:

  • Set as environment variable: export DATALAB_API_KEY=your_key
  • Or use a .env file in your project directory (recommended)
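A minimal sketch of a fail-fast check for the key, using only the standard library. The helper name `require_api_key` is hypothetical and not part of the SDK; it assumes the key is already in the environment (for example after calling `load_dotenv`).

```python
import os


def require_api_key(env=None, name="DATALAB_API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of datalab-python-sdk.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or add it to a .env file "
            "and call load_dotenv() before creating the client"
        )
    return key
```

Calling this once at startup surfaces a missing key before any form is submitted, rather than at the first API call.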

Workflow

  1. Gather field data from the user (field names, values, descriptions)
  2. Determine form source (local file, URL, or image)
  3. Configure options (context, confidence threshold, page range)
  4. Fill the form using the SDK
  5. Check results and handle unmatched fields

When NOT to Use This Skill

  • Form creation – This skill fills existing forms; it does not create new ones
  • OCR/text extraction – Use Datalab’s OCR endpoints instead
  • Non-form documents – Regular PDFs without fillable fields or clear form structure

Quick Start

Use this in a script file (.py). In a notebook or REPL, __file__ is undefined, so use explicit paths for the form and output instead.

import os
from pathlib import Path
from dotenv import load_dotenv
from datalab_sdk import DatalabClient, FormFillingOptions

# In a .py file: script_dir = Path(__file__).parent. In notebook/REPL: script_dir = Path(".")
script_dir = Path(__file__).parent
load_dotenv(script_dir / ".env")

client = DatalabClient(api_key=os.getenv("DATALAB_API_KEY"))

options = FormFillingOptions(
    field_data={
        "full_name": {"value": "John Doe", "description": "Full legal name"},
        "date_of_birth": {"value": "1990-01-15", "description": "Date of birth"},
    },
    context="Employment application form",
    confidence_threshold=0.5,
)

form_path = script_dir / "form.pdf"
result = client.fill(str(form_path), options=options)
result.save_output(str(script_dir / "filled_form.pdf"))

print(f"Filled: {result.fields_filled}")
print(f"Not found: {result.fields_not_found}")

Using the Fill Form Script

For quick command-line filling, use the bundled script. Run from the skill directory or use the full path:

# From skill directory (form.pdf and field_data.json in current dir)
python scripts/fill_form.py form.pdf field_data.json -o filled.pdf

# From another directory: use full paths for script, form, and field data
python /path/to/form-filling/scripts/fill_form.py /path/to/form.pdf /path/to/field_data.json -o filled.pdf

Options: -o output.pdf (output path), -c "context string" (form context), -t 0.7 (confidence threshold), -p "0-2" (0-indexed page range; "0-2" covers pages 1-3), --async
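To make the 0-indexed page option concrete, here is a small sketch that converts a range string into the 1-based page numbers a reader would see in a PDF viewer. `parse_page_range` is a hypothetical helper, and it assumes an inclusive "start-end" syntax; the bundled script may accept other forms.

```python
def parse_page_range(spec):
    """Convert a 0-indexed inclusive range like "0-2" into 1-based page numbers."""
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    # A bare index such as "4" is treated as a single-page range.
    end = int(end_text) if end_text else start
    if start < 0 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    return [page + 1 for page in range(start, end + 1)]


print(parse_page_range("0-2"))  # prints [1, 2, 3]
```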

See scripts/sample_field_data.json for a template. The field_data.json format:

{
  "name": { "value": "Jane Smith", "description": "Full name" },
  "ssn": { "value": "123-45-6789", "description": "Social Security Number" }
}
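Before sending a file like this to the script, it can help to check its shape locally. The sketch below is an assumption-laden example: `validate_field_data` is a hypothetical helper, and it treats a non-string value or a missing description as a problem, following this skill's guidance rather than any documented SDK rule.

```python
import json


def validate_field_data(data):
    """Return a list of problems found in a field_data mapping (empty if none)."""
    if not isinstance(data, dict):
        return ["top level must be a JSON object mapping field names to entries"]
    problems = []
    for name, entry in data.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(entry.get("value"), str):
            problems.append(f"{name}: 'value' must be a string")
        if not entry.get("description"):
            problems.append(f"{name}: missing 'description'")
    return problems


sample = json.loads(
    '{"name": {"value": "Jane Smith", "description": "Full name"},'
    ' "ssn": {"value": "123-45-6789", "description": "Social Security Number"}}'
)
print(validate_field_data(sample))  # prints []
```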

Key Guidance

Field Data Design

  • Always include description for each field to improve matching accuracy
  • Use context to describe the form type (e.g., “IRS W-4 Employee’s Withholding Certificate”)
  • Field values are always strings, even for numbers and dates

Supported Field Types

Text, date, numeric, checkbox ("Yes"/"No"), and signature (rendered as text).
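Since every value must be a string, one option is to convert native Python values when assembling field_data. This is a sketch under stated assumptions: `to_field_value` and `build_field_data` are hypothetical helpers, dates are written in ISO format as in the Quick Start example, and booleans map to the "Yes"/"No" checkbox strings. A given form may expect a different date format.

```python
from datetime import date
from decimal import Decimal


def to_field_value(value):
    """Convert a Python value to the string form used in field_data."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "Yes" if value else "No"
    if isinstance(value, date):  # ISO 8601, e.g. 1990-01-15
        return value.isoformat()
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, str):
        return value
    raise TypeError(f"unsupported field value type: {type(value).__name__}")


def build_field_data(fields):
    """Build a field_data dict from {name: (value, description)} pairs."""
    return {
        name: {"value": to_field_value(value), "description": description}
        for name, (value, description) in fields.items()
    }
```

For example, `build_field_data({"us_citizen": (True, "US citizen checkbox")})` produces an entry whose value is the string "Yes".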

Handling Unmatched Fields

If result.fields_not_found is non-empty:

  1. Improve field descriptions to better match the form’s labels
  2. Add or refine the context parameter
  3. Lower confidence_threshold to accept lower-confidence matches, then review the output, since weaker matches are more likely to be wrong
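The third step can be automated as a retry loop that lowers the threshold gradually. The sketch below is not an SDK feature: `fill_with_fallback` is a hypothetical helper, the threshold values are arbitrary, and `fill_at` is a caller-supplied function that performs one fill attempt and returns an object with a `fields_not_found` attribute.

```python
def fill_with_fallback(fill_at, thresholds=(0.7, 0.5, 0.3)):
    """Call `fill_at(threshold)` with decreasing thresholds until all fields match.

    Returns (threshold, result) for the first attempt with no unmatched fields,
    or for the last attempt if every threshold leaves something unmatched.
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    for threshold in thresholds:
        result = fill_at(threshold)
        if not result.fields_not_found:
            break
    return threshold, result
```

With the SDK, `fill_at` could wrap a call such as `client.fill(path, options=FormFillingOptions(field_data=..., context=..., confidence_threshold=t))`. Each attempt is a separate API call, so a short threshold list is usually preferable.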

URL Source

result = client.fill(file_url="https://example.com/form.pdf", options=options)

Image Forms (Scanned PDFs, PNG, JPG)

The SDK handles image-based forms automatically:

# Scanned form or image file
result = client.fill("scanned_form.png", options=options)
result.save_output("filled_form.png")  # Output matches input format
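Because the output format follows the input, the output filename can be derived from the input path so the extension always matches. `filled_output_path` below is a hypothetical convenience helper, shown only as a sketch.

```python
from pathlib import Path


def filled_output_path(input_path, prefix="filled_"):
    """Return a sibling path with the same extension, e.g. form.png -> filled_form.png."""
    path = Path(input_path)
    return path.with_name(prefix + path.name)
```

The result can be passed to `result.save_output(str(...))` in place of a hard-coded filename.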

Async Processing

Use the async client for batch operations or non-blocking calls. Paths are relative to the current working directory, and options is a FormFillingOptions instance built as in Quick Start.

import asyncio
import os
from datalab_sdk import AsyncDatalabClient, FormFillingOptions

async def main():
    async with AsyncDatalabClient(api_key=os.getenv("DATALAB_API_KEY")) as client:
        result = await client.fill("form.pdf", options=options)
        result.save_output("filled.pdf")

asyncio.run(main())
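For batches, a common pattern is to cap the number of concurrent requests with a semaphore. The sketch below is generic asyncio code, not an SDK facility: `fill_many` is a hypothetical helper, `fill_one` is a caller-supplied coroutine function, and the limit of 3 is an arbitrary placeholder rather than a documented rate limit.

```python
import asyncio


async def fill_many(fill_one, form_paths, max_concurrent=3):
    """Run `fill_one(path)` for every path, at most `max_concurrent` at a time.

    Returns a dict mapping each path to its result, or to the exception raised.
    """
    form_paths = list(form_paths)
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        async with semaphore:
            return await fill_one(path)

    # return_exceptions=True keeps one failed form from cancelling the rest.
    results = await asyncio.gather(
        *(guarded(path) for path in form_paths), return_exceptions=True
    )
    return dict(zip(form_paths, results))
```

Inside an `async with AsyncDatalabClient(...) as client:` block, `fill_one` could be `lambda path: client.fill(path, options=options)`.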

Common Pitfalls

API Key Not Found

Problem: DatalabAPIError: You must pass in an api_key or set DATALAB_API_KEY

Solution: The .env file isn’t auto-loaded. Always:

  1. Use load_dotenv() with explicit path: load_dotenv(Path(__file__).parent / ".env")
  2. Pass API key explicitly: DatalabClient(api_key=os.getenv("DATALAB_API_KEY"))

File Not Found When Running Script

Problem: Relative paths like "form.pdf" fail when the script runs from a different directory.

Solution: Use absolute paths based on script location:

script_dir = Path(__file__).parent
form_path = script_dir / "form.pdf"
result = client.fill(str(form_path), options=options)
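A small guard can turn a confusing failure into a clear one by checking the resolved path before calling the API. `resolve_input` is a hypothetical helper shown as a sketch; in a script, `base_dir` would typically be `Path(__file__).parent`.

```python
from pathlib import Path


def resolve_input(base_dir, name):
    """Return an absolute path to `name` under `base_dir`, or raise if it is missing."""
    path = (Path(base_dir) / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"form not found: {path} (current directory is {Path.cwd()})"
        )
    return path
```

The error message includes both the resolved path and the current working directory, which makes a wrong-directory launch easy to spot.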

Module Not Found

Problem: ModuleNotFoundError: No module named 'datalab_sdk'

Solution: Install the SDK first:

pip install datalab-python-sdk python-dotenv

References

  • Full API details: See references/api-reference.md for installation/prerequisites, FormFillingOptions, confidence threshold tuning, image form handling, batch async patterns, result fields, error handling, and client configuration