quality-detect-regressions
npx skills add https://github.com/dawiddutoit/custom-claude --skill quality-detect-regressions
Skill Documentation
detect-quality-regressions
Purpose
Compare current quality metrics against a stored baseline to detect regressions in tests, coverage, type errors, linting, and dead code. Enforces quality standards by blocking task completion when metrics degrade beyond tolerance thresholds.
When to Use
MANDATORY invocation scenarios:
- After completing a task (before marking complete)
- Before creating a commit or pull request
- Before merging to main branch
- When user asks “is this done?” or “check quality”
User trigger phrases:
- “detect regressions”
- “check against baseline”
- “validate quality hasn’t degraded”
- “compare to baseline”
Quick Start
Basic usage after completing a task:
1. Complete your code changes
2. Run tests manually to verify they pass
3. Invoke this skill: "Detect regressions against baseline_feature_2025-10-16"
4. If PASS: Mark task complete
5. If FAIL: Fix regressions, re-run detection
This skill automatically:
- Loads baseline from memory
- Runs ./scripts/check_all.sh
- Compares 5 metrics (tests, coverage, type errors, linting, dead code)
- Detects regressions with tolerance rules
- Returns PASS/FAIL with actionable delta report
Table of Contents
Core Sections
- Instructions – Complete workflow for regression detection
- Step 1: Load Baseline from Memory – Retrieve and validate baseline metrics
- Step 2: Run Current Quality Checks – Execute check_all.sh and capture metrics
- Step 3: Compare Metrics (Regression Detection) – Apply comparison rules
- Step 4: Generate Delta Report – Create metric comparison report
- Step 5: Return Result – PASS/FAIL decision logic
- When to Invoke – Triggering conditions (after tasks, before commits, status checks)
- Examples – Real-world scenarios
- Example 1: No Regressions (PASS) – Successful validation scenario
- Example 2: Regression Detected (FAIL) – Handling quality degradation
- Edge Cases – Special situations (missing baseline, check failures, pre-existing issues)
Advanced Topics
- Integration Points – Coordination with other skills and agents
- Anti-Patterns to Avoid – Common mistakes and correct approaches
- Success Criteria – Validation checklist
- Supporting Files – References and examples
- Requirements – Environment, memory schema, tools
Instructions
Step 1: Load Baseline from Memory
Query memory for baseline:
Use mcp__memory__find_memories_by_name to retrieve the baseline:
baseline_names = ["baseline_<feature>_<date>"]
# Example: ["baseline_auth_2025-10-16"]
Validate baseline exists:
- If not found: Return ⚠️ WARNING - No baseline found; suggest capturing a baseline first
- If found: Parse baseline metrics
Extract baseline metrics:
Parse the baseline entity’s observations to extract:
- Tests: X passed, Y failed, Z skipped
- Coverage: X%
- Type errors: X errors
- Linting errors: X errors
- Dead code: X%
Example baseline observations:
- Tests: 145 passed, 0 failed, 3 skipped
- Coverage: 87%
- Type errors: 0
- Linting errors: 0
- Dead code: 1.2%
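For illustration, the observation parsing can be sketched in Python. This is a minimal sketch, assuming the observations follow the exact wording above; `parse_baseline` and the metric keys are hypothetical names, not part of the skill's actual implementation.

```python
import re

# Hypothetical helper: turn baseline observation strings (as stored by
# capture-quality-baseline) into a numeric metrics dict. The regexes
# assume the observation wording shown in the example above.
def parse_baseline(observations: list[str]) -> dict[str, float]:
    patterns = {
        "tests_passed": r"Tests: (\d+) passed",
        "coverage": r"Coverage: ([\d.]+)%",
        "type_errors": r"Type errors: (\d+)",
        "linting_errors": r"Linting errors: (\d+)",
        "dead_code": r"Dead code: ([\d.]+)%",
    }
    metrics: dict[str, float] = {}
    for obs in observations:
        for name, pattern in patterns.items():
            if match := re.match(pattern, obs):
                metrics[name] = float(match.group(1))
    return metrics
```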
Step 2: Run Current Quality Checks
Execute quality checks:
cd /Users/dawiddutoit/projects/play/project-watch-mcp
./scripts/check_all.sh
Capture output:
- Save stdout and stderr
- Parse same 5 metrics as baseline
- Handle script failures (return FAIL if checks can’t run)
Parse current metrics:
Extract from check_all.sh output:
- Tests: Look for “X passed” in pytest output
- Coverage: Look for “TOTAL” line with percentage
- Type errors: Count errors in pyright output
- Linting errors: Count violations in ruff output
- Dead code: Parse vulture output for percentage
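The capture-and-parse step might look like the following sketch. The regex patterns are assumptions based on typical pytest/coverage/pyright/ruff output; your check_all.sh format may differ, so treat them as starting points rather than the skill's actual parser.

```python
import re
import subprocess

# Minimal sketch: run the quality script, then extract the five metrics
# from its combined output. Returns None when the script itself fails
# (edge case 2: unable to validate -> caller returns FAIL).
def run_current_checks(project_root: str) -> dict[str, float] | None:
    result = subprocess.run(
        ["./scripts/check_all.sh"], cwd=project_root,
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    out = result.stdout + result.stderr
    patterns = {
        "tests_passed": r"(\d+) passed",                 # pytest summary
        "coverage": r"TOTAL.*?(\d+)%",                   # coverage TOTAL line
        "type_errors": r"(\d+) errors?, \d+ warnings?",  # pyright summary (assumed)
        "linting_errors": r"Found (\d+) errors?",        # ruff summary (assumed)
        "dead_code": r"([\d.]+)% dead code",             # vulture wrapper (assumed)
    }
    metrics: dict[str, float] = {}
    for name, pattern in patterns.items():
        if match := re.search(pattern, out):
            metrics[name] = float(match.group(1))
    return metrics
```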
Step 3: Compare Metrics (Regression Detection)
Apply comparison rules:
| Metric | Rule | Tolerance | Regression If |
|---|---|---|---|
| Tests passed | Must be >= baseline | None | current < baseline |
| Coverage | Must be >= baseline - 1% | 1% | current < baseline - 1% |
| Type errors | Must be <= baseline | None | current > baseline |
| Linting | Must be <= baseline | None | current > baseline |
| Dead code | Must be <= baseline + 2% | 2% | current > baseline + 2% |
For each metric:
- Calculate change: current - baseline
- Check if regression: Apply rule from table
- Mark status: improved, stable, or regressed
- Calculate severity: critical, high, medium, or low
Regression severity:
- Critical: Type errors increased (breaks type safety)
- High: Tests decreased or coverage dropped >2%
- Medium: Linting errors increased
- Low: Dead code increased slightly (within tolerance)
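The comparison rules table maps directly to code. Here is a sketch of that logic, assuming the metrics dicts produced in Steps 1 and 2; `compare` and the rule tuples are illustrative, not the skill's definitive implementation.

```python
# Encodes the rules table: coverage may drop up to 1 point and dead code
# may grow up to 2 points; every other metric must hold steady or improve.
def compare(baseline: dict[str, float],
            current: dict[str, float]) -> dict[str, dict]:
    rules = {
        # metric: (higher_is_better, tolerance)
        "tests_passed": (True, 0.0),
        "coverage": (True, 1.0),
        "type_errors": (False, 0.0),
        "linting_errors": (False, 0.0),
        "dead_code": (False, 2.0),
    }
    report: dict[str, dict] = {}
    for metric, (higher_is_better, tol) in rules.items():
        base, cur = baseline[metric], current[metric]
        if higher_is_better:
            regressed = cur < base - tol
            improved = cur > base
        else:
            regressed = cur > base + tol
            improved = cur < base
        status = "regressed" if regressed else "improved" if improved else "stable"
        report[metric] = {"baseline": base, "current": cur,
                          "change": cur - base, "status": status}
    return report
```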
Step 4: Generate Delta Report
Create comparison for each metric:
metric_name:
baseline: <value>
current: <value>
change: <+/- difference>
status: improved | stable | regressed
severity: critical | high | medium | low (if regressed)
Identify regressions:
Filter metrics where status == regressed and create regression list:
regressions:
- metric: tests
baseline: 152
current: 150
change: -2
severity: high
action: "Investigate test_user_service.py, test_auth_service.py"
Identify improvements:
Filter metrics where status == improved for positive feedback.
Step 5: Return Result
Decision logic:
IF any metric has status == regressed:
RETURN FAIL with regression list
ELSE:
RETURN PASS with improvements
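In Python, the Step 4 filtering plus the Step 5 decision might look like this sketch. The flat severity mapping is a simplification of the severity rules above (e.g. it does not distinguish coverage drops within vs. beyond 2%); `decide` is a hypothetical name.

```python
# Simplified severity mapping, per the severity list in Step 3 (assumed).
SEVERITY = {
    "type_errors": "critical",
    "tests_passed": "high",
    "coverage": "high",
    "linting_errors": "medium",
    "dead_code": "low",
}

# Filter regressed metrics from the delta report produced by compare(),
# attach severity, and return the PASS/FAIL verdict.
def decide(report: dict[str, dict]) -> tuple[str, list[dict]]:
    regressions = [
        {"metric": name, "severity": SEVERITY[name], **entry}
        for name, entry in report.items()
        if entry["status"] == "regressed"
    ]
    return ("FAIL" if regressions else "PASS", regressions)
```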
PASS result format:
✅ PASS - No regressions detected
Delta Report:
- Tests: +3 passed (148 total) 📈
- Coverage: +2% (89% total) 📈
- Type errors: No change (0) ✅
- Linting: No change (0) ✅
- Dead code: -0.1% (1.1% total) 📉
All metrics maintained or improved. Safe to mark task complete.
FAIL result format:
🔴 FAIL - 4 regressions detected
Regressions:
1. Tests: -2 passed (150 vs 152)
   → 2 tests removed or now failing
   → Action: Investigate test_user_service.py, test_auth_service.py
2. Coverage: -4% (85% vs 89%)
   → Coverage dropped below tolerance (88%)
   → Action: Add tests for newly refactored code
3. Type errors: +2 new errors (5 vs 3)
   → New type errors introduced
   → Action: Run pyright --verbose, fix errors
4. Linting: +2 new errors
   → New linting violations
   → Action: Run ruff check --fix, review changes
⛔ BLOCKED - Do not mark the task complete until regressions are fixed.
Fix order:
1. Fix linting (ruff check --fix)
2. Fix type errors (pyright)
3. Re-run tests (investigate failures)
4. Add coverage for new code
5. Re-run regression detection
When to Invoke
After completing a task (@implementer, @unit-tester, @integration-tester):
- Code changes complete
- Tests written and pass locally
- → Invoke detect-quality-regressions before marking task complete
- If PASS: Mark complete, move to next task
- If FAIL: Fix regressions, re-run detection
Before committing/merging:
- User requests commit
- → Invoke detect-quality-regressions to validate
- If PASS: Proceed with commit
- If FAIL: Block commit, report regressions
When checking status (@statuser):
- User asks “What’s the status?”
- → Invoke detect-quality-regressions to get current quality state
- Report status with delta
Examples
Example 1: No Regressions (PASS)
Context: Completed Task 2.1 (add user authentication)
Execution:
1. Load baseline: baseline_auth_2025-10-16
   → Tests: 145 passed, 0 failed
   → Coverage: 87%
   → Type errors: 0
   → Linting: 0
   → Dead code: 1.2%
2. Run checks: ./scripts/check_all.sh
   → Tests: 148 passed (+3), 0 failed
   → Coverage: 89% (+2%)
   → Type errors: 0 (no change)
   → Linting: 0 (no change)
   → Dead code: 1.1% (-0.1%)
3. Compare:
   ✅ Tests: 148 >= 145 (PASS)
   ✅ Coverage: 89% >= 86% (87% - 1%) (PASS)
   ✅ Type errors: 0 <= 0 (PASS)
   ✅ Linting: 0 <= 0 (PASS)
   ✅ Dead code: 1.1% <= 3.2% (1.2% + 2%) (PASS)
4. Result: ✅ PASS
Example 2: Regression Detected (FAIL)
Context: Completed Task 3.2 (refactor service layer)
Execution:
1. Load baseline: baseline_service_result_2025-10-16
   → Tests: 152 passed
   → Coverage: 89%
   → Type errors: 3
   → Linting: 0
2. Run checks:
   → Tests: 150 passed (-2) ❌
   → Coverage: 85% (-4%) ❌
   → Type errors: 5 (+2) ❌
   → Linting: 2 (+2) ❌
3. Compare:
   ❌ Tests: 150 < 152 (REGRESSION)
   ❌ Coverage: 85% < 88% (89% - 1%) (REGRESSION)
   ❌ Type errors: 5 > 3 (REGRESSION)
   ❌ Linting: 2 > 0 (REGRESSION)
4. Result: 🔴 FAIL - 4 regressions detected
See references/regression-fixes.md for detailed fix strategies
Edge Cases
1. Baseline Not Found
- Search memory, no baseline exists
- Return: ⚠️ WARNING - No baseline found
- Action: Suggest running capture-quality-baseline first
2. Quality Checks Fail to Run
- Script error, tool missing, etc.
- Return: 🔴 FAIL - Unable to validate quality
- Block: Don’t allow work to proceed without validation
3. Pre-Existing Issues in Baseline
- Baseline has 3 documented type errors
- Current also has 3 type errors (same errors)
- Result: ✅ PASS (no NEW errors)
- Note: “3 pre-existing errors maintained”
4. Tests Pass But Coverage Drops
- All tests pass (no failures)
- Coverage dropped 5% (regression)
- Result: 🔴 FAIL (coverage regression)
- Action: Add tests for uncovered code
Integration Points
With capture-quality-baseline skill:
- This skill loads the baseline that capture-quality-baseline created
- Use same baseline naming convention:
baseline_<feature>_<date>
With run-quality-gates skill:
- run-quality-gates ensures quality checks pass (Definition of Done)
- detect-quality-regressions compares against baseline (regression detection)
- Both skills complement each other
With @implementer:
- Primary integration – runs after every task
- Blocks task completion if regressions detected
With @unit-tester / @integration-tester:
- Runs after writing tests to validate quality didn’t degrade
With manage-todo skill:
- Task state depends on regression detection result
- Can’t mark complete with regressions
Anti-Patterns to Avoid
❌ DON'T: Skip regression detection (always run before marking a task complete)
❌ DON'T: Ignore regressions ("I'll fix later")
❌ DON'T: Commit without running detection
❌ DON'T: Mark a task complete with regressions
❌ DON'T: Use the wrong baseline (old or different feature)

✅ DO: Run detection after every task
✅ DO: Block on regressions (fail fast)
✅ DO: Fix regressions immediately
✅ DO: Use the correct baseline for the feature
✅ DO: Document pre-existing issues
Success Criteria
- ✅ Baseline loaded successfully
- ✅ Quality checks execute
- ✅ All 5 metrics compared correctly
- ✅ Regressions detected accurately
- ✅ Clear PASS/FAIL result
- ✅ Actionable delta report
Supporting Files
- references/comparison-rules.md – Detailed metric comparison rules and tolerances
- references/regression-fixes.md – Fix strategies for each metric type
Requirements
Environment:
- Project must be using ./scripts/check_all.sh
- Memory baseline must exist (captured via capture-quality-baseline skill)
- Quality tools must be installed: pyright, ruff, pytest, vulture
Memory Schema:
- Entity type: “quality_baseline”
- Entity name: "baseline_<feature>_<date>"
- Observations: List of metric values (see Step 1)
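For illustration, a stored baseline entity might look like the following. The field names assume the standard memory MCP server entity shape (name, entityType, observations); the exact schema depends on your memory server.

```yaml
name: baseline_auth_2025-10-16
entityType: quality_baseline
observations:
  - "Tests: 145 passed, 0 failed, 3 skipped"
  - "Coverage: 87%"
  - "Type errors: 0"
  - "Linting errors: 0"
  - "Dead code: 1.2%"
```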