data-import-parsers

📁 janjaszczak/cursor 📅 6 days ago
Install command
npx skills add https://github.com/janjaszczak/cursor --skill data-import-parsers

Skill documentation

Data Import & Validation (Streaming + Error CSV + Metrics + Idempotency)

When to use this skill

Use when working on:

  • ETL / import pipelines (CSV/XLSX/JSON) into a DB
  • Parsers that currently load whole files into memory
  • Data validation/coercion, error isolation, auditability, and reproducibility
  • Any import job that must be safe to re-run (idempotent)

Non-negotiables (contract)

  1. Process input files sequentially (no “load everything then insert”) to control memory.
  2. Validate and coerce types explicitly (define allowed coercions; reject ambiguous cases).
  3. Irreparable records must be skipped and logged to an error CSV containing:
    • all original columns exactly as seen in input
    • extra columns: timestamp, file, line, error
  4. Emit metrics at minimum: rows_ok, rows_skipped, parse_errors.
  5. DB writes must be idempotent: re-running the import must not duplicate or corrupt data.
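A rough sketch of how these five rules fit together in one loop (Python; `parse_row`, `upsert_row`, the error-CSV writer, and the column handling are illustrative assumptions, not prescribed by this skill):

```python
import csv
from datetime import datetime, timezone

metrics = {"rows_ok": 0, "rows_skipped": 0, "parse_errors": 0}

def import_file(path, db, error_writer, parse_row, upsert_row):
    with open(path, newline="", encoding="utf-8") as f:
        for line_no, raw in enumerate(csv.DictReader(f), start=2):  # streams rows
            ok, result = parse_row(raw)        # (True, typed) or (False, reason)
            if not ok:
                metrics["parse_errors"] += 1
                metrics["rows_skipped"] += 1
                error_writer.writerow({**raw,  # original columns, unmodified
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "file": path, "line": line_no, "error": result})
                continue
            upsert_row(db, result)             # idempotent write (see step 6)
            metrics["rows_ok"] += 1
```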

Standard workflow

1) Plan-first (before coding)

  • Identify input formats, volume, and “row identity” rules (keys/dedup strategy).
  • Locate current import entrypoints + DB write layer.
  • Define:
    • schema mapping (source -> target columns)
    • validation rules per field
    • coercion rules per field (what is allowed, what is not)
    • error taxonomy (what counts as “irreparable”)
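These definitions are easiest to review when captured declaratively before any code is written; the field names and rules below are hypothetical examples:

```python
# Hypothetical schema mapping + per-field rules (source column -> target column).
FIELD_SPEC = {
    "OrderID":   {"target": "order_id",   "coerce": "int",      "required": True},
    "Amount":    {"target": "amount_eur", "coerce": "decimal",  "required": True},
    "OrderDate": {"target": "ordered_at", "coerce": "iso_date", "required": True},
    "Comment":   {"target": "comment",    "coerce": "str",      "required": False},
}

# Row identity / dedup strategy: the natural key used for idempotent writes.
BUSINESS_KEY = ("order_id",)

# Error taxonomy: which failure classes make a row "irreparable" (skip + log).
IRREPARABLE = {"missing_required", "bad_type", "unknown_enum_value"}
```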

2) Streaming architecture

  • Read one file at a time, row by row (or chunked) and write in bounded batches.
  • Never accumulate full datasets in memory.
  • Ensure progress logging is monotonic (e.g., file + row counters).
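A streaming sketch using the standard csv module; the batch size and the `flush_batch` callback are assumptions:

```python
import csv

BATCH_SIZE = 1_000  # memory is bounded by the batch, not by the file size

def stream_rows(path):
    """Yield (line_no, raw_row) one at a time; never materialise the whole file."""
    with open(path, newline="", encoding="utf-8") as f:
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            yield line_no, row

def import_in_batches(paths, flush_batch):
    for file_no, path in enumerate(paths, start=1):   # one file at a time
        batch = []
        for line_no, row in stream_rows(path):
            batch.append((path, line_no, row))
            if len(batch) >= BATCH_SIZE:
                flush_batch(batch)                    # bounded DB write
                batch.clear()
        if batch:
            flush_batch(batch)
        print(f"progress: file {file_no}/{len(paths)} done ({path})")  # monotonic
```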

3) Validation & coercion

  • Treat raw row values as immutable “source of truth”.
  • Perform coercions in a controlled layer:
    • return (ok, parsed_record) or (error, reason) per row
  • Separate concerns:
    • parsing (raw -> typed)
    • business validation (typed -> acceptable)
    • persistence (acceptable -> DB)
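A sketch of that per-row contract with parsing and business validation kept separate (field names and the allowed comma-to-dot coercion are illustrative):

```python
def parse_row(raw: dict):
    """Parsing: raw (untouched) -> (True, typed_record) or (False, reason)."""
    try:
        typed = {
            "order_id": int(raw["OrderID"]),
            "amount": float(raw["Amount"].replace(",", ".")),  # an explicitly allowed coercion
        }
    except (KeyError, ValueError) as exc:
        return False, f"bad_type: {exc}"
    return validate(typed)

def validate(typed: dict):
    """Business validation: typed -> (True, acceptable_record) or (False, reason)."""
    if typed["amount"] < 0:
        return False, "negative_amount"
    return True, typed
```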

4) Error CSV logging

  • On any irreparable row:
    • write original row values (unmodified)
    • add: timestamp, file, line, error
  • Avoid partial writes that drop context; every skipped row must be explainable from the CSV alone.
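A sketch of an error-CSV writer that keeps the original columns untouched and appends only the four context columns (file paths and column order are assumptions):

```python
import csv
from datetime import datetime, timezone

EXTRA_COLS = ["timestamp", "file", "line", "error"]

def open_error_csv(path, source_columns):
    f = open(path, "w", newline="", encoding="utf-8")
    writer = csv.DictWriter(f, fieldnames=list(source_columns) + EXTRA_COLS)
    writer.writeheader()
    return f, writer

def log_error(writer, raw_row, source_file, line_no, error):
    # Original values are written unmodified; only the extra columns are added.
    writer.writerow({**raw_row,
                     "timestamp": datetime.now(timezone.utc).isoformat(),
                     "file": source_file,
                     "line": line_no,
                     "error": error})
```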

5) Metrics

Maintain counters:

  • rows_ok: successfully persisted rows
  • rows_skipped: irreparable rows skipped
  • parse_errors: count of parse/validation errors (can equal rows_skipped or be a superset if you track recoverable warnings separately)

Emit metrics:

  • end-of-file summary
  • end-of-run summary (aggregate over files)

Optionally persist metrics to a JSON file or a DB table for observability.
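A minimal counter sketch; the merge helper aggregates per-file metrics into the run summary, and the JSON path is an assumption:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ImportMetrics:
    rows_ok: int = 0
    rows_skipped: int = 0
    parse_errors: int = 0

    def merge(self, other: "ImportMetrics") -> "ImportMetrics":
        return ImportMetrics(self.rows_ok + other.rows_ok,
                             self.rows_skipped + other.rows_skipped,
                             self.parse_errors + other.parse_errors)

def emit_summary(scope: str, metrics: ImportMetrics, json_path="import_metrics.json"):
    print(f"[{scope}] {asdict(metrics)}")              # end-of-file / end-of-run log line
    with open(json_path, "w", encoding="utf-8") as f:  # optional persistence
        json.dump({scope: asdict(metrics)}, f, indent=2)
```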

6) Idempotent persistence

Pick and implement one clear strategy:

  • Upsert by natural key / business key
  • Insert with unique constraint + conflict handling
  • Staging table + merge
  • Per-file “ingestion ledger” (hash/checkpoint) + skip already-ingested files

Idempotency must hold across:

  • reruns after crashes
  • partial completion
  • duplicate input files
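For the first two strategies, a sketch of an upsert keyed on the business key, assuming PostgreSQL via a DB-API driver such as psycopg2 and a hypothetical orders table with a unique order_id:

```python
UPSERT_SQL = """
    INSERT INTO orders (order_id, amount_eur, ordered_at)
    VALUES (%(order_id)s, %(amount_eur)s, %(ordered_at)s)
    ON CONFLICT (order_id) DO UPDATE
       SET amount_eur = EXCLUDED.amount_eur,
           ordered_at = EXCLUDED.ordered_at
"""  # requires a UNIQUE constraint on order_id

def upsert_batch(conn, records):
    with conn.cursor() as cur:
        cur.executemany(UPSERT_SQL, records)  # re-running the same input never duplicates rows
    conn.commit()
```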

Review checklist (definition of done)

  • Demonstrably streaming: memory does not scale with total rows.
  • Every skipped record is present in error CSV with full context.
  • Metrics are emitted and consistent with observed behavior.
  • Re-running import on same inputs does not create duplicates (verified).
  • Failures are isolated: one bad row/file does not poison the full run.