ml-data-preprocessing

📁 kentoshimizu/sw-agent-skills 📅 1 day ago
Install command
npx skills add https://github.com/kentoshimizu/sw-agent-skills --skill ml-data-preprocessing

Skill documentation

ML Data Preprocessing

Overview

Use this skill to define preprocessing that improves model quality without introducing leakage or unreproducible transforms.

Scope Boundaries

  • Use this skill when the task matches the trigger condition given in the skill description.
  • Do not use this skill when the primary task falls outside this skill’s domain.

Shared References

  • Leakage prevention rules:
    • references/leakage-prevention-rules.md

Templates And Assets

  • Preprocessing spec template:
    • assets/preprocessing-spec-template.md

Inputs To Gather

  • Source datasets, schema quality, and time boundaries.
  • Missing/outlier characteristics and domain constraints.
  • Train/validation/test split policy.
  • Reproducibility and compliance requirements.

Deliverables

  • Preprocessing specification with transformation rationale.
  • Leakage and data-quality validation plan.
  • Reproducibility notes and versioning requirements.

Workflow

  1. Draft transform plan with assets/preprocessing-spec-template.md.
  2. Validate temporal and label safety via references/leakage-prevention-rules.md.
  3. Define split-safe transformations and quality checks.
  4. Verify transform repeatability across runs.
  5. Publish preprocessing contract and residual risks.
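
To make step 3 concrete, here is a minimal sketch of a split-safe transformation, assuming a single numeric feature and a simple standardization. The function names `fit_standardizer` and `apply_standardizer` are hypothetical and not part of this skill's assets. The point it illustrates is that scaling parameters are learned from the training split only and then reused unchanged on the other splits.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn scaling parameters from the training split only (hypothetical helper)."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # fall back to 1.0 if the feature is constant
    return {"mean": mu, "std": sigma}


def apply_standardizer(params, values):
    """Apply previously fitted parameters; never refit on validation or test data."""
    return [(v - params["mean"]) / params["std"] for v in values]


# Illustrative data only.
train = [1.0, 2.0, 3.0]
test = [4.0]

params = fit_standardizer(train)                 # fitted on train only
train_scaled = apply_standardizer(params, train)
test_scaled = apply_standardizer(params, test)   # reuses the train statistics
```

Fitting on the combined data instead would let test-set statistics influence the training features, which is one common form of leakage.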

Quality Standard

  • Transformations are deterministic and versioned.
  • Leakage risk is explicitly checked and mitigated.
  • Data loss/quality trade-offs are documented.
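
One possible way to check the "deterministic and versioned" standard is to fingerprint both the preprocessing spec and its output. The sketch below assumes the spec and the transformed rows are JSON-serializable; `spec_fingerprint`, `output_fingerprint`, and the example spec keys are hypothetical names chosen for illustration.

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Hash a canonical JSON form of the spec so key order does not change the version."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def output_fingerprint(rows) -> str:
    """Hash transformed rows in order; two runs match only if the output is identical."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")  # row separator so adjacent rows cannot blur together
    return digest.hexdigest()


# Illustrative spec and output only.
spec = {"impute": "median", "scale": "standard", "version": 1}
run_a = [{"x": 0.5, "y": 1}, {"x": -0.5, "y": 0}]
run_b = [{"y": 1, "x": 0.5}, {"y": 0, "x": -0.5}]  # same data, different key order

same_output = output_fingerprint(run_a) == output_fingerprint(run_b)
```

Recording the spec fingerprint alongside the output fingerprint gives a simple repeatability check: the same spec applied to the same input should reproduce the same output hash across runs.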

Failure Conditions

  • Stop when preprocessing introduces label or temporal leakage.
  • Stop when transforms are not reproducible.
  • Escalate when data quality blocks decision-grade training.
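
The first stop condition can be enforced with an automated guard. The following is a rough sketch, assuming each record carries an event timestamp and that the split policy requires all training data to precede all test data. `LeakageError` and `assert_no_temporal_overlap` are hypothetical names, and the actual rules belong in references/leakage-prevention-rules.md.

```python
from datetime import datetime


class LeakageError(ValueError):
    """Raised when a split violates the assumed temporal ordering (hypothetical)."""


def assert_no_temporal_overlap(train_times, test_times):
    """Stop the pipeline if any training record is at or after the earliest test record."""
    latest_train = max(train_times)
    earliest_test = min(test_times)
    if latest_train >= earliest_test:
        raise LeakageError(
            f"temporal leakage: train extends to {latest_train.isoformat()}, "
            f"test starts at {earliest_test.isoformat()}"
        )


# Illustrative timestamps only.
train_times = [datetime(2024, 1, 1), datetime(2024, 1, 31)]
test_times = [datetime(2024, 2, 1), datetime(2024, 2, 15)]

assert_no_temporal_overlap(train_times, test_times)  # passes: train ends before test begins
```

Raising an exception rather than logging a warning matches the "stop" wording above: a leaking split should halt the run instead of producing a model.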