quality-capture-baseline
npx skills add https://github.com/dawiddutoit/custom-claude --skill quality-capture-baseline
Capture Quality Baseline
Table of Contents
Core Sections
- Purpose – Baseline establishment for regression detection
- Quick Start – When to invoke and basic usage
- Instructions – Complete implementation guide
- Step 1: Prepare Context – Feature name, git commit, timestamp
- Step 2: Run Quality Checks – Primary and fallback methods
- Step 3: Parse Metrics – Extract tests, coverage, type errors, linting, dead code
- Step 4: Create Memory Entity – Store baseline in memory system
- Step 5: Return Baseline Reference – Success output format
- Agent Integration – @planner, @implementer, @statuser workflows
- Edge Cases – Quality check failures, no git, missing scripts, memory unavailable
- Anti-Patterns – What to avoid and what to do instead
- Integration with Other Skills – run-quality-gates, validate-refactor-adr, manage-refactor-markers, manage-todo
- Examples – Comprehensive scenarios reference
- Reference – Parsing, lifecycle, quality gates documentation
- Success Criteria – Validation checklist
- Troubleshooting – Common problems and solutions
Purpose
Establish a quality metrics baseline at the start of a feature or refactor so that subsequent quality gate runs can detect regressions. This skill is a critical component of the project’s regression detection strategy.
When to Use
Use this skill when:
- Starting any new feature (before writing code)
- Beginning refactor work (to document pre-state)
- After major changes (dependency upgrades, architecture shifts)
- When @planner creates a new todo.md
- When @implementer starts task 0 of a feature
- User asks to “capture baseline” or “establish baseline metrics”
- Before major architectural changes to track impact
Quick Start
When to invoke this skill:
- ✅ At the START of any new feature (before writing code)
- ✅ Before beginning refactor work (to document pre-state)
- ✅ After major changes (dependency upgrades, architecture shifts)
- ❌ NOT after you’ve already started coding (too late)
Basic usage:
1. Run quality checks: ./scripts/check_all.sh
2. Parse metrics (tests, coverage, type errors, linting, dead code)
3. Create memory entity: baseline_{feature}_{date}
4. Return baseline reference for future comparison
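The baseline name in step 3 can be derived with a couple of shell variables. A minimal sketch ("auth" is a placeholder feature name, not one from this project):

```shell
# Sketch: derive the baseline entity name used in step 3
FEATURE="auth"                          # placeholder feature name
DATE=$(date +"%Y-%m-%d")
BASELINE_NAME="baseline_${FEATURE}_${DATE}"
echo "$BASELINE_NAME"                   # e.g. baseline_auth_2025-10-16
```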
Instructions
Step 1: Prepare Context
Before running quality checks, gather:
- Feature name or refactor identifier (e.g., “auth”, “service-result-migration”)
- Current git commit hash: git rev-parse HEAD
- Current timestamp: date +"%Y-%m-%d %H:%M:%S"
Example:
FEATURE="user_profile"
GIT_COMMIT=$(git rev-parse HEAD)
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
DATE=$(date +"%Y-%m-%d")
Step 2: Run Quality Checks
Primary method (recommended):
./scripts/check_all.sh
Fallback (if check_all.sh fails):
# Run individual checks
uv run pytest tests/ -v --cov=src --cov-report=term-missing
uv run pyright src/
uv run ruff check src/
uv run vulture src/
Capture output:
- Store stdout and stderr
- Measure execution time
- Note any failures (document them, don’t skip)
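Capturing output, timing, and the exit status in one pass might look like the sketch below (it assumes check_all.sh exists at that path; the exit status is recorded rather than aborting, so failures can be documented in the baseline):

```shell
# Sketch: run checks, capture combined output, record timing and exit status
START_TIME=$(date +%s)
if ./scripts/check_all.sh > output.txt 2>&1; then
  CHECK_STATUS=0
else
  CHECK_STATUS=$?                       # non-zero: at least one check failed
fi
END_TIME=$(date +%s)
EXEC_TIME=$((END_TIME - START_TIME))    # whole seconds; good enough for a baseline
if [ "$CHECK_STATUS" -ne 0 ]; then
  echo "Checks reported failures (exit $CHECK_STATUS); document them in the baseline"
fi
```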
Step 3: Parse Metrics
Use the parsing patterns from references/parsing.md. Quick reference:
Tests:
# Pattern: "X passed" or "X failed" or "X skipped"
PASSED=$(grep -oE "[0-9]+ passed" output.txt | grep -oE "[0-9]+")
FAILED=$(grep -oE "[0-9]+ failed" output.txt | grep -oE "[0-9]+")
SKIPPED=$(grep -oE "[0-9]+ skipped" output.txt | grep -oE "[0-9]+")
Coverage:
# Pattern: "TOTAL ... X%"
COVERAGE=$(grep "TOTAL" output.txt | grep -oE "[0-9]+%" | tail -1)
Type Errors:
# Pattern: "X errors, Y warnings" or "0 errors"
TYPE_ERRORS=$(grep -oE "[0-9]+ error" output.txt | grep -oE "[0-9]+" | head -1)
Linting:
# Pattern: "Found X errors"
LINTING_ERRORS=$(grep -oE "Found [0-9]+ error" output.txt | grep -oE "[0-9]+")
Dead Code:
# Pattern: Count vulture output lines or "X% unused"
DEAD_CODE=$(vulture src/ 2>&1 | grep -v "^$" | wc -l | xargs)
Execution Time:
# Measure with time command or calculate from timestamps
EXEC_TIME=$(echo "scale=1; $END_TIME - $START_TIME" | bc)
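To illustrate, the patterns above applied to a hypothetical fragment of pytest/coverage output (the sample text is fabricated solely to demonstrate the grep patterns):

```shell
# Hypothetical sample output fragment, only to demonstrate the grep patterns
cat > output.txt <<'EOF'
145 passed, 2 skipped in 8.31s
TOTAL                             1234    160    87%
EOF
PASSED=$(grep -oE "[0-9]+ passed" output.txt | grep -oE "[0-9]+")
SKIPPED=$(grep -oE "[0-9]+ skipped" output.txt | grep -oE "[0-9]+")
COVERAGE=$(grep "TOTAL" output.txt | grep -oE "[0-9]+%" | tail -1)
echo "Tests: $PASSED passed, $SKIPPED skipped; coverage $COVERAGE"
# → Tests: 145 passed, 2 skipped; coverage 87%
```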
Step 4: Create Memory Entity
Entity structure:
name: baseline_{feature}_{date}
type: quality_baseline
observations:
- "Tests: {passed} passed, {failed} failed, {skipped} skipped"
- "Coverage: {coverage}%"
- "Type errors: {type_errors}"
- "Linting errors: {linting_errors}"
- "Dead code: {dead_code} items"
- "Execution time: {exec_time}s"
- "Git commit: {git_hash}"
- "Timestamp: {timestamp}"
- "Feature: {feature_name}"
Example:
mcp__memory__create_entities({
"entities": [{
"name": f"baseline_{feature}_{date}",
"type": "quality_baseline",
"observations": [
f"Tests: {passed} passed, {failed} failed, {skipped} skipped",
f"Coverage: {coverage}%",
f"Type errors: {type_errors}",
f"Linting errors: {linting_errors}",
f"Dead code: {dead_code} items",
f"Execution time: {exec_time}s",
f"Git commit: {git_commit}",
f"Timestamp: {timestamp}",
f"Feature: {feature}"
]
}]
})
If pre-existing issues exist: Add additional observation:
"Pre-existing issues: 3 type errors in legacy code (documented, will not be fixed)"
Step 5: Return Baseline Reference
Success output format:
✅ Baseline captured: baseline_{feature}_{date}
Metrics:
- Tests: {passed} passed ({coverage}% coverage)
- Type safety: {status} ({type_errors} errors)
- Code quality: {status} ({linting_errors} linting, {dead_code} dead code items)
- Execution: {exec_time}s
- Git: {git_hash}
Use this baseline for regression detection.
With pre-existing issues:
✅ Baseline captured: baseline_{feature}_{date}
Metrics:
- Tests: 152 passed (89% coverage)
- Type safety: 3 errors (pre-existing, documented)
- Code quality: Clean
- Git: abc123f
⚠️ Note: 3 pre-existing type errors documented, will not cause regression failures.
Use this baseline for regression detection.
Agent Integration
@planner Usage
When: At plan creation, BEFORE creating todo.md
Workflow:
- Receive feature request from user
- Analyze requirements
- ✅ Invoke capture-quality-baseline
- Receive baseline reference
- Create todo.md with baseline in header
- Create memory entities linking to baseline
Example:
# Todo: User Profile Feature
Baseline: baseline_user_profile_2025-10-16
Created: 2025-10-16
## Tasks
- [ ] Task 1 (references baseline for quality gates)
...
@implementer Usage
When: At feature start (task 0) or before refactor work
Workflow:
- Validate ADR (if refactor) using validate-refactor-adr
- ✅ Invoke capture-quality-baseline
- Receive baseline reference
- Reference baseline in quality gate runs
- Compare final metrics against baseline
Example:
User: "Start implementing authentication"
@implementer:
1. Invoke capture-quality-baseline
2. Receives: baseline_auth_2025-10-16
3. Begin task 1 implementation
4. After each task: Run quality gates, compare to baseline
5. Report: "All metrics maintained or improved from baseline"
@statuser Usage
When: User requests baseline recapture or progress check
Workflow:
- Receive progress check request
- If major changes detected: ✅ Invoke capture-quality-baseline
- Compare new baseline to original
- Report delta and recommend re-baseline if appropriate
Example:
User: "Check progress after dependency upgrade"
@statuser:
1. Detects major change (dependency upgrade)
2. Invokes capture-quality-baseline
3. Receives: baseline_post_upgrade_2025-10-16
4. Compares to original baseline
5. Reports: "2 tests removed (deprecated APIs), coverage maintained, type errors resolved"
Edge Cases
Quality Checks Fail
Scenario: Some checks fail during baseline capture
Handling:
- Still capture the baseline (document failures)
- Add observation: “Baseline captured WITH failures”
- Flag in output: “⚠️ Baseline has failures – must fix before feature work”
- Return baseline reference (still usable for comparison)
Example:
✅ Baseline captured: baseline_auth_2025-10-16
⚠️ WARNING: Baseline has failures
Metrics:
- Tests: 140 passed, 5 FAILED, 2 skipped
- Type safety: 2 errors
- Code quality: 3 linting errors
👉 FIX these failures before starting feature work.
No Git Repository
Scenario: Not in a git repository (or git not available)
Handling:
- Skip git commit capture
- Continue with other metrics
- Add observation: “No git commit (not in repository)”
- Note in output: “Git: Not available”
Script Not Found
Scenario: ./scripts/check_all.sh doesn’t exist
Handling:
- Try individual checks (pytest, pyright, ruff, vulture)
- If any work: Use those results
- If none work: Report error and abort
Error message:
❌ Cannot capture baseline: Quality check scripts not found
Tried:
- ./scripts/check_all.sh (not found)
- uv run pytest (not found)
- uv run pyright (not found)
Cannot establish baseline without quality checks.
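The fallback chain could be sketched as a shell function (the script path and uv invocations are taken from the examples earlier in this document; treat them as assumptions about the project layout):

```shell
# Sketch: prefer check_all.sh, fall back to individual tools, else abort
run_quality_checks() {
  if [ -x ./scripts/check_all.sh ]; then
    ./scripts/check_all.sh > output.txt 2>&1
  elif command -v uv >/dev/null 2>&1; then
    # Individual checks; append so all results land in one file
    uv run pytest tests/ --cov=src > output.txt 2>&1
    uv run pyright src/ >> output.txt 2>&1
    uv run ruff check src/ >> output.txt 2>&1
  else
    echo "Cannot capture baseline: quality check scripts not found" >&2
    return 1
  fi
}
```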
Memory Service Unavailable
Scenario: Cannot create memory entity (MCP server down)
Handling:
- Store baseline in local file: .quality-baseline-{feature}-{date}.json
- Return file path as reference
- Note in output: “Baseline stored locally (memory unavailable)”
Fallback file format:
{
"baseline_name": "baseline_auth_2025-10-16",
"metrics": {
"tests": {"passed": 145, "failed": 0, "skipped": 2},
"coverage": 87,
"type_errors": 0,
"linting_errors": 0,
"dead_code": 3,
"execution_time": 8.3,
"git_commit": "abc123f",
"timestamp": "2025-10-16 10:30:00"
}
}
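Writing that fallback file could look like the following sketch (feature name and date are placeholders; metric values would come from the Step 3 variables, with zero defaults shown here):

```shell
# Sketch: persist the baseline locally when the memory MCP server is unreachable
FEATURE="auth"; DATE="2025-10-16"       # placeholders for illustration
FALLBACK_FILE=".quality-baseline-${FEATURE}-${DATE}.json"
cat > "$FALLBACK_FILE" <<EOF
{
  "baseline_name": "baseline_${FEATURE}_${DATE}",
  "metrics": {
    "tests": {"passed": ${PASSED:-0}, "failed": ${FAILED:-0}, "skipped": ${SKIPPED:-0}},
    "coverage": ${COVERAGE_PCT:-0}
  }
}
EOF
echo "Baseline stored locally: $FALLBACK_FILE"
```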
Anti-Patterns
❌ DON’T:
- Skip baseline capture (regression detection requires it)
- Reuse old baselines for new features (each feature needs its own)
- Capture baseline after starting work (too late)
- Ignore failing checks in baseline (document them)
- Forget to reference baseline in todo.md
✅ DO:
- Capture baseline BEFORE any feature work
- Create unique baseline per feature
- Document pre-existing issues in baseline
- Reference baseline in todo.md and memory
- Re-baseline after major changes (upgrades, migrations)
Integration with Other Skills
Skill: run-quality-gates
- capture-quality-baseline: Establishes the reference point
- run-quality-gates: Compares current metrics against baseline
Skill: validate-refactor-adr
- validate-refactor-adr: Validates ADR completeness
- capture-quality-baseline: Documents pre-refactor state
Skill: manage-refactor-markers
- capture-quality-baseline: Establishes pre-refactor metrics
- manage-refactor-markers: Tracks refactor progress
Skill: manage-todo
- capture-quality-baseline: Provides baseline reference
- manage-todo: Stores baseline name in todo.md header
Examples
See examples.md for comprehensive scenarios:
- Feature start baseline (clean state)
- Refactor start baseline (with pre-existing issues)
- Re-baseline after major change
- Baseline with failures
- Agent auto-invocation patterns
- Cross-agent baseline sharing
- Baseline comparison workflows
- Recovery from failed baseline capture
Reference
- Metrics parsing: See references/parsing.md for detailed parsing logic
- Baseline lifecycle: See references/lifecycle.md for management best practices
- Quality gates: See project’s ../run-quality-gates/references/shared-quality-gates.md for gate definitions
Success Criteria
- Quality checks execute successfully (or failures documented)
- All 5 metrics extracted (tests, coverage, type errors, linting, dead code)
- Memory entity created with baseline data
- Baseline name returned for reference
- Execution time < 30 seconds (typically ~8s)
- Git commit captured (if available)
- Timestamp recorded
- Pre-existing issues documented (if any)
Troubleshooting
Problem: Metrics parsing fails (empty values)
Solution:
- Check quality check output format (may have changed)
- Review parsing patterns in references/parsing.md
- Update regex patterns if needed
- Fallback: Manually parse critical metrics
Problem: Baseline creation succeeds but can’t be found later
Solution:
- Verify memory entity name format: baseline_{feature}_{date}
- Search memory: mcp__memory__search_memories(query="baseline")
- Check fallback file: .quality-baseline-*.json
Problem: Quality checks take too long (> 30s)
Solution:
- Check if Neo4j is running (connection timeout)
- Skip slow tests: pytest -m "not slow"
- Run checks in parallel (check_all.sh does this)
- Report timing issue to user
Last Updated: 2025-10-16
Priority: High (critical for regression detection)
Dependencies: Quality gate scripts, memory MCP server