test-debug-failures
npx skills add https://github.com/dawiddutoit/custom-claude --skill test-debug-failures
Debug Test Failures
When to Use This Skill
Use this skill when users mention:
- “tests are failing”
- “pytest errors”
- “test suite not passing”
- “debug test failures”
- “fix broken tests”
- “tests failing after changes”
- “mock not called”
- “assertion error in tests”
- Any test-related debugging task
What This Skill Does
Provides systematic evidence-based test debugging through 6 mandatory phases:
- Evidence Collection – Run tests with verbose output, capture actual errors
- Root Cause Analysis – Categorize problems (test setup vs business logic vs integration)
- Specific Issue Location – Identify exact file:line:column for ALL occurrences
- Systematic Fix – Plan and implement fixes for all related errors
- Validation – Re-run tests and verify no regressions
- Quality Gates – Run project-specific checks (type, lint, dead code)
Prevents assumption-driven fixes by enforcing proper diagnostic sequence.
Quick Start
When tests fail, use this skill to systematically identify and fix root causes:
# This skill will guide you through:
# 1. Running tests with verbose output
# 2. Parsing actual error messages
# 3. Identifying root cause (test setup vs business logic vs integration)
# 4. Locating specific issues (file:line:column)
# 5. Fixing ALL occurrences systematically
# 6. Validating the fix
# 7. Running quality gates
Table of Contents
Core Sections
- Purpose – Evidence-based systematic debugging philosophy
- Quick Start – Immediate workflow overview
- Mandatory Debugging Sequence – 6-phase systematic approach
- Phase 1: Evidence Collection – Run tests, capture output, parse errors
- Phase 2: Root Cause Analysis – Categorize problems, check test vs code
- Phase 3: Specific Issue Location – Exact locations, find all occurrences
- Phase 4: Systematic Fix – Plan, implement, address all related errors
- Phase 5: Validation – Re-run tests, check results, full test suite
- Phase 6: Quality Gates – Project checks, verify all gates pass
- Anti-Patterns – Common mistakes to avoid
- Common Test Failure Patterns – Diagnostic reference
- Pattern 1: Mock Not Called – Execution path issues
- Pattern 2: Attribute Error on Mock – Mock configuration issues
- Pattern 3: Assertion Mismatch – Business logic errors
- Pattern 4: Import/Fixture Errors – Dependency issues
Framework-Specific Guides
- Framework-Specific Quick Reference – Test runner commands and flags
- Python (pytest) – Run tests, common flags, debugging options
- JavaScript/TypeScript (Jest/Vitest) – Jest flags, Vitest reporter options
- Go – Test commands, race detection, specific package testing
- Rust – Cargo test, output control, sequential execution
Advanced Topics
- Success Criteria – Task completion checklist
- Examples – Detailed walkthroughs (see examples.md)
- Related Documentation – Project CLAUDE.md, quality gates, testing strategy
- Philosophy – Evidence-based debugging principles
Instructions
YOU MUST follow this sequence. No shortcuts.
Phase 1: Evidence Collection
1.1 Run Tests First – See Actual Errors
DO NOT make any changes before running tests. Execute with maximum verbosity:
Python/pytest:
uv run pytest <failing_test_path> -v --tb=short
# Or for full stack traces:
uv run pytest <failing_test_path> -vv --tb=long
# Or for specific test function:
uv run pytest <file>::<test_function> -vv
JavaScript/TypeScript:
# Jest
npm test -- --verbose --no-coverage <test_path>
# Vitest
npm run test -- --reporter=verbose <test_path>
# Mocha
npm test -- --reporter spec <test_path>
Other Languages:
# Go
go test -v ./...
# Rust
cargo test -- --nocapture --test-threads=1
# Ruby (RSpec)
bundle exec rspec <spec_path> --format documentation
1.2 Capture Output
Save the COMPLETE output. Do not summarize. Do not assume.
1.3 Read The Error – Parse Actual Messages
Identify:
- Error type (AssertionError, AttributeError, ImportError, etc.)
- Error message (exact wording)
- Stack trace (which functions called which)
- Line numbers (where error originated)
CRITICAL: Is this a “mock not called” error or an actual assertion failure?
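The distinction matters because the two error shapes point at different root causes. A minimal sketch using `unittest.mock` (the `process` function and `repo` object are hypothetical stand-ins, not part of any real project):

```python
from unittest.mock import Mock

def process(repo, data):
    """Hypothetical service: an early return skips repo.save()."""
    if not data:
        return None
    repo.save(data)
    return data

repo = Mock()
process(repo, {})  # {} is falsy, so save() is never reached

mock_not_called = False
try:
    repo.save.assert_called_once()
except AssertionError:
    mock_not_called = True  # shape 1: execution never reached the mock

# Shape 2 is different: the mock WAS called, and the question becomes
# whether the values passed to it were correct.
process(repo, {"key": "value"})
repo.save.assert_called_once_with({"key": "value"})
```

A "mock not called" failure is an execution-path problem (Pattern 1 below); a failed value assertion usually points at business logic.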
Phase 2: Root Cause Analysis
2.1 Categorize The Problem
Test Setup Issues:
- Mock configuration incorrect (`mock_X` not called, mock has no attribute `Y`)
- Fixture problems (missing, incorrectly scoped, teardown issues)
- Test data issues (invalid inputs, wrong test doubles)
- Import/dependency injection errors
Business Logic Bugs:
- Assertion failures on expected values
- Logic errors in implementation
- Missing validation
- Incorrect algorithm
Integration Issues:
- Database connection failures
- External service unavailable
- File system access problems
- Environment configuration missing
2.2 Check Test vs Code
Use Read tool to examine:
- The failing test file
- The code being tested
- Related fixtures/mocks/setup
CRITICAL QUESTION: Is the problem in how the test is written, or what the code does?
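To make the question concrete, here is a hypothetical sketch: the same failing assertion can mean a code bug or a test bug, and the fix lands in a different file in each case. The `apply_discount` function is illustrative only.

```python
def apply_discount(price, percent):
    """Hypothetical implementation under test."""
    return price - price * percent / 100

# Case A: the test asserts apply_discount(100, 10) == 90.
# The expectation is right; if this failed, fix src/.
code_is_right = apply_discount(100, 10) == 90

# Case B: the test asserts == 85 for a 10% discount.
# The implementation is right; the *test* encodes a wrong
# expectation, so the fix belongs in tests/, not src/.
test_expectation_wrong = apply_discount(100, 10) == 85
```

Answering "which side is wrong?" before editing anything prevents the common mistake of bending correct code to satisfy a broken test.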
Phase 3: Specific Issue Location
3.1 Provide Exact Locations
For EVERY error, document:
- File: Absolute path to file
- Line: Exact line number
- Column: Character position (if available)
- Function/Method: Name of failing function
- Error Type: Specific exception/assertion
- Actual vs Expected: What was expected, what was received
Example:
File: /abs/path/to/test_service.py
Line: 45
Function: test_process_data
Error: AssertionError: mock_repository.save not called
Expected: save() called once with data={'key': 'value'}
Actual: save() never called
3.2 Identify ALL Occurrences
Use Grep to find all instances of the pattern:
# Find all similar test patterns
grep -r "pattern_causing_error" tests/
# Find all places where mocked function is used
grep -r "mock_function_name" tests/
DO NOT fix just the first occurrence. Fix ALL of them.
Phase 4: Systematic Fix
4.1 Plan The Fix
Before making changes, document:
- What needs to change
- Why this fixes the root cause
- How many files/locations affected
- Whether this is test code or business logic
4.2 Implement Fix
Use MultiEdit for multiple changes to same file:
# ✅ CORRECT - All edits to same file in one operation
MultiEdit("path/to/file.py", [
{"old_string": incorrect_mock_setup_1, "new_string": correct_mock_setup_1},
{"old_string": incorrect_mock_setup_2, "new_string": correct_mock_setup_2},
{"old_string": incorrect_assertion, "new_string": correct_assertion}
])
For changes across multiple files, use Edit for each file:
# Fix in test file
Edit("tests/test_service.py", old_string=..., new_string=...)
# Fix in implementation
Edit("src/service.py", old_string=..., new_string=...)
4.3 Address All Related Errors
If error appears in 10 places, fix all 10. No arbitrary limits.
If pattern appears across files:
- Use Grep to find all occurrences
- Document each location
- Fix each location
- Track completion
Phase 5: Validation
5.1 Re-run Tests
Run the EXACT same test command from Phase 1:
# Same command as before
uv run pytest <failing_test_path> -vv
5.2 Check Results
- ✅ PASS: All tests green → Proceed to Phase 6
- ❌ FAIL: New errors → Return to Phase 2 (different root cause)
- ⚠️ PARTIAL: Some pass, some fail → Incomplete fix, return to Phase 3
5.3 Run Full Test Suite
Ensure no regressions:
# Python
uv run pytest tests/ -v
# JavaScript
npm test
# Go
go test ./...
Phase 6: Quality Gates
6.1 Run Project-Specific Quality Checks
For this project (project-watch-mcp):
./scripts/check_all.sh
Generic quality gates:
# Type checking
pyright # or tsc, or mypy
# Linting
ruff check src/ # or eslint, or rubocop
# Dead code detection
vulture src/ # or knip (JS)
# Security
bandit src/ # or npm audit
6.2 Verify All Gates Pass
Task is NOT done if quality gates fail. Fix or request guidance.
Usage Examples
Example 1: Mock Configuration Issue
# User says: "Tests failing in test_service.py"
# Phase 1: Run tests
uv run pytest tests/test_service.py -vv
# Error: AssertionError: Expected 'save' to be called once. Called 0 times.
# Phase 2: Root cause - Mock not called (execution path issue)
# Phase 3: Location - test_service.py:45, function test_process_data
# Phase 4: Fix mock configuration to match actual code path
# Phase 5: Re-run tests - PASS
# Phase 6: Run quality gates - PASS
Example 2: Business Logic Assertion
# User says: "test_calculate is failing"
# Phase 1: Run tests
uv run pytest tests/test_calculator.py::test_calculate -vv
# Error: AssertionError: assert 42 == 43
# Phase 2: Root cause - Business logic error (calculation off by 1)
# Phase 3: Location - src/calculator.py:23, function calculate
# Phase 4: Fix algorithm in implementation
# Phase 5: Re-run tests - PASS
# Phase 6: Run quality gates - PASS
Example 3: Integration Test Failure
# User says: "Integration tests are failing"
# Phase 1: Run tests
uv run pytest tests/integration/ -vv
# Error: Neo4j connection failed
# Phase 2: Root cause - Integration issue (database unavailable)
# Phase 3: Location - Environment configuration
# Phase 4: Start Neo4j Desktop
# Phase 5: Re-run tests - PASS
# Phase 6: Run quality gates - PASS
Validation Process
This skill enforces systematic validation at each phase:
- Evidence Validation – Actual test output captured, not assumed errors
- Root Cause Validation – Problem categorized correctly (setup vs logic vs integration)
- Location Validation – Exact file:line:column identified for ALL occurrences
- Fix Validation – Re-run tests to verify fix works
- Regression Validation – Full test suite passes, no new failures
- Quality Validation – All quality gates pass (type check, lint, dead code)
Expected Outcomes
Successful Debugging
✅ Tests Fixed
Failing tests: tests/test_service.py::test_process_data
Root cause: Mock configuration - save() not called due to early return
Fix applied: Updated mock to handle validation failure path
Location: tests/test_service.py:45-52
Test results:
✅ Specific test: PASS
✅ Full suite: PASS (no regressions)
✅ Quality gates: PASS
Time: 8 minutes (systematic debugging)
Confidence: High (evidence-based fix)
Debugging Failure (Needs More Investigation)
⚠️ Additional Investigation Required
Failing tests: tests/test_integration.py::test_database
Root cause: Partially identified - connection issue
Current status: Neo4j started, but connection still failing
Next steps:
1. Check Neo4j logs for startup errors
2. Verify connection configuration in settings
3. Test connection manually with cypher-shell
4. Check firewall/network settings
Blocker: Database configuration needs review
Integration Points
With Project CLAUDE.md
Implements CLAUDE.md debugging approach:
- “Run tests first” – Enforced in Phase 1
- “Read the error” – Phase 1.3 parses actual messages
- “Be specific” – Phase 3 requires file:line:column
- “Be complete” – Phase 4 fixes ALL occurrences
With Quality Gates
Integrates with quality validation:
- Phase 6 runs project check_all.sh
- Ensures type checking passes (pyright)
- Validates linting (ruff)
- Checks for dead code (vulture)
With Other Skills
Coordinates with complementary skills:
- run-quality-gates – Automated Phase 6 execution
- analyze-logs – Investigate test failure logs
- debug-type-errors – Resolve type checking failures
- setup-async-testing – Fix async test issues
Expected Benefits
| Metric | Without Skill | With Skill | Improvement |
|---|---|---|---|
| Debug Time | 30-60 min (trial & error) | 8-15 min (systematic) | 4-7x faster |
| Fix Accuracy | ~60% (assumptions) | ~95% (evidence-based) | 58% improvement |
| Regressions | ~20% (incomplete fixes) | <2% (full validation) | 90% reduction |
| Complete Fixes | ~40% (first occurrence only) | ~98% (all occurrences) | 145% improvement |
| Quality Gate Pass | ~70% (skipped) | 100% (enforced) | 43% improvement |
Success Metrics
After using this skill:
- 100% evidence-based – No assumptions, only actual test output
- 95% fix accuracy – Root cause identified correctly
- 98% complete fixes – All occurrences addressed
- <2% regressions – Full suite validation prevents new failures
- 100% quality gate compliance – All checks pass before “done”
Red Flags to Avoid
❌ WRONG: Making changes before running tests
"The mock setup looks wrong, let me fix it..."
✅ RIGHT: Run test first, see actual error
"Let me run the test to see the exact error message..."
❌ WRONG: Assuming errors are unrelated
"This ImportError is probably unrelated to the test failure..."
✅ RIGHT: Investigate every error
"Let me trace why this import is failing - it may be the root cause..."
❌ WRONG: Fixing first occurrence only
"Fixed the mock in test_create.py, done!"
✅ RIGHT: Search for all occurrences
"Let me search for this pattern across all test files..."
❌ WRONG: Skipping quality gates
"Tests pass now, we're done!"
✅ RIGHT: Run full validation
"Tests pass. Running quality gates to check for regressions..."
❌ WRONG: Quick fix without understanding
"Let me just add a try/except to suppress this error..."
✅ RIGHT: Understand root cause
"This error indicates a deeper issue. Let me investigate why..."
Common Test Failure Patterns
Pattern 1: Mock Not Called
Error:
AssertionError: Expected 'save' to be called once. Called 0 times.
Root Cause:
- Code path not executing
- Early return before mock called
- Exception preventing execution
- Wrong mock target
Investigation:
- Read the test – what’s being tested?
- Read the implementation – is save() actually called?
- Check for early returns or exceptions
- Verify mock is patching correct location
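The "wrong mock target" cause deserves a sketch, because it produces a "mock not called" error even though the code path runs: `patch` must target the name where the code under test *looks it up*, not where it is defined. The `db`/`service` modules below are simulated in memory purely for illustration.

```python
import sys
import types
from unittest.mock import patch

# Build two tiny in-memory modules standing in for src/db.py and
# src/service.py (hypothetical names, for illustration only).
db = types.ModuleType("db")
def _real_save(data):
    raise RuntimeError("real database hit")
db.save = _real_save
sys.modules["db"] = db

service = types.ModuleType("service")
exec("from db import save\n"
     "def process(data):\n"
     "    save(data)\n"
     "    return True\n", service.__dict__)
sys.modules["service"] = service

# WRONG target: service did `from db import save`, so patching db.save
# leaves service's own copy of the name pointing at the real function.
with patch("db.save") as m:
    try:
        service.process({"k": 1})
    except RuntimeError:
        pass  # the real save() ran
    wrong_target_called = m.called  # False -> looks like "mock not called"

# RIGHT target: patch the name in the module that uses it.
with patch("service.save") as m:
    assert service.process({"k": 1}) is True
    m.assert_called_once_with({"k": 1})
```

This is the standard "where to patch" rule from `unittest.mock`: patch `service.save`, not `db.save`, when `service` imported the name directly.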
Pattern 2: Attribute Error on Mock
Error:
AttributeError: Mock object has no attribute 'success'
Root Cause:
- Mock returns dict when ServiceResult expected
- Incomplete mock configuration
- Missing return_value or side_effect
- Wrong mock type
Investigation:
- Check what test expects mock to return
- Check what code actually returns
- Verify mock configuration matches interface
- Look for type mismatches (dict vs object)
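A sketch of the dict-vs-object mismatch, with a hypothetical `ServiceResult` type. Passing `spec=` to `Mock` is the usual guard, because it makes attribute typos fail fast instead of silently auto-creating attributes:

```python
from dataclasses import dataclass
from unittest.mock import Mock

@dataclass
class ServiceResult:
    """Hypothetical result type the code under test expects."""
    success: bool
    data: dict

# Misconfiguration: the mock returns a plain dict, so attribute access
# on the result blows up inside the code under test.
loose = Mock()
loose.fetch.return_value = {"success": True}
dict_error = False
try:
    loose.fetch().success
except AttributeError:
    dict_error = True  # 'dict' object has no attribute 'success'

# A spec'd mock rejects attributes the spec does not define:
spec_error = False
try:
    Mock(spec=ServiceResult).missing_field
except AttributeError:
    spec_error = True  # Mock object has no attribute 'missing_field'

# Correct configuration: return an instance of the real type.
strict = Mock()
strict.fetch.return_value = ServiceResult(success=True, data={})
assert strict.fetch().success is True
```

Matching the mock's return type to the real interface removes the whole class of "Mock object has no attribute" failures.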
Pattern 3: Assertion Mismatch
Error:
AssertionError: assert 'foo' == 'bar'
Root Cause:
- Business logic error
- Test expectation wrong
- Data transformation issue
- Configuration problem
Investigation:
- Is expected value correct?
- Is actual value correct?
- Is transformation logic correct?
- Are inputs to function correct?
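Those four questions can be walked through on a small hypothetical transformation; each question isolates a different place the mismatch could come from:

```python
def slugify(title):
    """Hypothetical transformation under test."""
    return title.lower().replace(" ", "-")

# 1-2. Is the expected value right? Is the actual value right?
assert slugify("Hello World") == "hello-world"

# 3. Is the transformation logic right? Punctuation is not stripped,
#    so a test expecting "hi-there" would fail with an assertion
#    mismatch that points at a real logic gap:
assert slugify("Hi, There") == "hi,-there"  # the comma leaks through

# 4. Are the inputs right? Untrimmed input changes the answer too:
assert slugify(" Hello") == "-hello"
```

Only after answering all four can you tell whether to change the implementation, the expectation, or the test's input data.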
Pattern 4: Import/Fixture Errors
Error:
ImportError: cannot import name 'X' from 'module'
fixture 'db_session' not found
Root Cause:
- Missing dependency
- Fixture not in scope
- Circular import
- Module not installed
Investigation:
- Check imports at top of file
- Check fixture definition location
- Check conftest.py for fixtures
- Verify dependencies installed
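For the `fixture 'db_session' not found` case, placement is usually the issue: pytest only resolves fixtures defined in the test module itself or in a `conftest.py` at the same level or above. A minimal sketch of a shared fixture in a hypothetical `tests/conftest.py` (configuration fragment, shown for placement rather than execution):

```python
# tests/conftest.py -- pytest discovers this file automatically;
# fixtures defined here are visible to every test in tests/ and below.
import pytest

@pytest.fixture
def db_session():
    session = {"connected": True}  # stand-in for a real session factory
    yield session                   # teardown code goes after the yield
    session["connected"] = False
```

Moving the fixture up into the nearest shared `conftest.py` (rather than importing it between test files) is the idiomatic fix.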
Troubleshooting
For common issues and their diagnostics, see the Common Test Failure Patterns section above.
Task Completion Checklist
Before declaring task complete:
- All tests pass (specific failing test)
- Full test suite passes (no regressions)
- Root cause identified and documented
- All related errors fixed (not just first occurrence)
- Quality gates pass
- No new errors introduced
- Fix addresses root cause, not symptoms
Framework-Specific Quick Reference
Python (pytest)
Run tests:
uv run pytest tests/ -v # All tests
uv run pytest tests/test_file.py -vv # Specific file
uv run pytest tests/test_file.py::test_fn # Specific test
uv run pytest -k "pattern" -v # Pattern matching
uv run pytest --lf -v # Last failed
Common flags:
- -v – Verbose
- -vv – Extra verbose
- -s – Show print statements
- --tb=short – Shorter traceback
- --tb=long – Full traceback
- --pdb – Drop into debugger on failure
JavaScript/TypeScript (Jest/Vitest)
Run tests:
npm test # All tests
npm test -- <test_file> # Specific file
npm test -- --testNamePattern="pattern" # Pattern matching
npm test -- --verbose # Verbose output
Jest flags:
- --verbose – Detailed output
- --no-coverage – Skip coverage
- --detectOpenHandles – Find async issues
- --forceExit – Force exit after tests
Vitest flags:
- --reporter=verbose – Detailed output
- --run – Run once (no watch)
- --coverage – Generate coverage
Go
Run tests:
go test ./... # All packages
go test -v ./pkg/service # Specific package
go test -run TestFunctionName # Specific test
go test -v -race ./... # Race detection
Rust
Run tests:
cargo test # All tests
cargo test test_function_name # Specific test
cargo test -- --nocapture # Show output
cargo test -- --test-threads=1 # Sequential
Examples
See examples.md for detailed walkthroughs of:
- Mock configuration debugging
- Business logic assertion failures
- Integration test issues
- Fixture scope problems
- Cross-framework patterns
Shell Scripts
- Test Skill Script – Helper script for testing and running test debug workflows
Requirements
Tools needed:
- Bash (for running tests)
- Read (for examining test and source files)
- Grep (for finding all occurrences)
- Glob (for pattern matching)
- Edit/MultiEdit (for fixing code)
Test Frameworks Supported:
- Python: pytest, unittest
- JavaScript/TypeScript: Jest, Vitest, Mocha
- Go: go test
- Rust: cargo test
- Ruby: RSpec
Project-Specific:
- Access to test suite (tests/ directory)
- Access to source code under test
- Ability to run quality gates (./scripts/check_all.sh or equivalent)
Knowledge:
- Basic understanding of test framework syntax
- Ability to read stack traces
- Understanding of mock/fixture patterns
Related Documentation
- Project CLAUDE.md: Critical workflow rules and debugging approach
- Quality Gates: ../quality-run-quality-gates/references/shared-quality-gates.md
- Testing Strategy: Project-specific testing documentation
Philosophy
This skill enforces evidence-based debugging:
- See actual errors (not assumed errors)
- Understand root causes (not symptoms)
- Fix systematically (not randomly)
- Validate thoroughly (not superficially)
The goal is not speed. The goal is correctness.
Slow down. Read the errors. Understand the problem. Fix it properly.