ml-model-evaluation

📁 kentoshimizu/sw-agent-skills 📅 1 day ago
Install command
npx skills add https://github.com/kentoshimizu/sw-agent-skills --skill ml-model-evaluation


Skill Documentation

ML Model Evaluation

Overview

Use this skill to evaluate models with decision-grade evidence across aggregate and high-risk segments.

Scope Boundaries

  • Use this skill when the task matches the trigger condition in the skill description.
  • Do not use this skill when the primary task falls outside this skill's domain.

Shared References

  • Threshold and segmentation rules:
    • references/threshold-and-segmentation-rules.md

Templates And Assets

  • Evaluation report template:
    • assets/evaluation-report-template.md

Inputs To Gather

  • Dataset splits and baseline/candidate definitions.
  • Business cost trade-offs for false positives/negatives.
  • Segment definitions for fairness/risk-critical cohorts.
  • Acceptance thresholds and calibration requirements.
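The inputs above can be captured up front in a single configuration object so that nothing is left implicit when the evaluation runs. This is a minimal sketch; all field names, model identifiers, and default values are hypothetical placeholders, not part of the skill itself.

```python
from dataclasses import dataclass, field

@dataclass
class EvalConfig:
    # Dataset split and baseline/candidate definitions (hypothetical names).
    test_split: str = "test"
    baseline_model: str = "baseline-v1"
    candidate_model: str = "candidate-v2"
    # Business cost trade-offs: relative cost of each error type.
    cost_false_positive: float = 1.0
    cost_false_negative: float = 5.0
    # Fairness/risk-critical cohorts to evaluate separately (illustrative).
    segments: list = field(default_factory=lambda: ["new_users", "high_value"])
    # Acceptance gates and calibration requirement.
    min_recall: float = 0.80
    max_calibration_error: float = 0.05

cfg = EvalConfig()
```

Writing these down before evaluation starts makes the later acceptance/rejection rationale traceable: every gate in the report points back to a field in the config.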

Deliverables

  • Evaluation report with thresholds and decision.
  • Segment-level failure analysis.
  • Acceptance/rejection rationale and follow-ups.

Workflow

  1. Build the evaluation report from assets/evaluation-report-template.md.
  2. Apply the threshold and segmentation policy in references/threshold-and-segmentation-rules.md.
  3. Validate calibration and check for error concentration in specific segments.
  4. Compare the baseline and candidate models under identical data and conditions.
  5. Publish the release recommendation along with any unresolved risks.
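Steps 3 and 4 hinge on per-segment metrics: aggregate numbers can hide errors that concentrate in one cohort. The sketch below, using only the standard library, shows one way to surface that; the function names and data layout are assumptions for illustration.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) counts for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def segment_report(y_true, y_pred, segment_ids):
    """Per-segment precision/recall so error concentration is visible.

    Run this for both baseline and candidate on the same split so the
    comparison happens under identical conditions.
    """
    report = {}
    for seg in sorted(set(segment_ids)):
        idx = [i for i, s in enumerate(segment_ids) if s == seg]
        tp, fp, fn, tn = confusion(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        report[seg] = {
            "precision": tp / (tp + fp) if tp + fp else None,
            "recall": tp / (tp + fn) if tp + fn else None,
            "n": len(idx),
        }
    return report
```

A segment whose recall is far below the aggregate is exactly the kind of evidence the report's failure analysis should cite.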

Quality Standard

  • Thresholds are tied to business risk trade-offs.
  • Critical segments are explicitly evaluated.
  • Decision rationale is traceable to evidence.
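Tying thresholds to business risk trade-offs can be made concrete by choosing the score cutoff that minimizes total expected cost, given the relative costs of false positives and false negatives gathered earlier. This is a sketch under that framing; the cost values and function name are illustrative, not prescribed by the skill.

```python
def pick_threshold(y_true, scores, cost_fp, cost_fn):
    """Choose the score cutoff that minimizes total expected business cost.

    cost_fp / cost_fn encode the business trade-off: e.g. if a missed
    positive is five times as costly as a false alarm, pass cost_fn=5.0.
    """
    best_t, best_cost = 0.5, float("inf")
    for t in sorted(set(scores)):
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

Because the chosen cutoff is derived from explicit costs, the decision rationale in the report traces directly back to the stated trade-off rather than to an arbitrary 0.5 default.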

Failure Conditions

  • Stop when the evaluation omits high-risk segments.
  • Stop when acceptance thresholds are undefined.
  • Escalate when model risk is unacceptable for rollout.
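The first two stop conditions are mechanically checkable before any recommendation is published. A minimal gating sketch, assuming a per-segment report dict like the one built earlier and a hypothetical `release_gate` helper:

```python
def release_gate(segment_report, required_segments, thresholds):
    """Return blocking issues per the skill's failure conditions (sketch).

    An empty list means the mechanical checks pass; unacceptable model
    risk still requires human escalation and is out of scope here.
    """
    issues = []
    missing = set(required_segments) - set(segment_report)
    if missing:
        issues.append(f"missing high-risk segments: {sorted(missing)}")
    if not thresholds:
        issues.append("acceptance thresholds undefined")
    return issues
```

Running this gate before publishing makes "stop" a default outcome rather than one that depends on a reviewer noticing the omission.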