investigate-failing-tests
npx skills add https://github.com/klamping/webdriverio-skills --skill investigate-failing-tests
Investigate Failing Tests
Orchestrate diagnosis and repair of failing WebdriverIO tests.
This skill is a reasoning and routing layer. It does not write code or run tests directly. It delegates those actions to other skills/agents and manages decision points.
Delegation Contract
- Does not directly edit test files.
- Does not directly execute test commands.
- Delegates to:
  - `running-webdriverio-tests` for execution
  - `gathering-context` for artifact collection
  - `analyze-website` for route/component structure context when missing
  - `writing-webdriverio-code` for test-side fixes
  - `managing-project-customizations` for stale/missing context refresh
  - `skipped-test-manager` for confirmed app bugs
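The delegation contract above can be captured as a simple lookup table. A minimal sketch, where the keys are illustrative decision-point labels (not part of the skill's contract) and the values are the skill names from this document:

```typescript
// Hypothetical routing table; keys are illustrative labels for decision
// points, values are the delegate skill names listed in this document.
const DELEGATIONS: Record<string, string> = {
  execution: "running-webdriverio-tests",
  artifactCollection: "gathering-context",
  siteStructure: "analyze-website",
  testFixes: "writing-webdriverio-code",
  contextRefresh: "managing-project-customizations",
  confirmedAppBugs: "skipped-test-manager",
};
```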
When to Use
- A user reports failing WDIO tests and needs diagnosis + fix orchestration.
- Failure details are partial and need structured investigation.
- There is risk of repeated blind retries or trial-and-error edits.
When Not to Use
- You only need to run tests (`running-webdriverio-tests`).
- You already know the exact test fix and only need implementation (`writing-webdriverio-code`).
- You only need artifact/context gathering (`gathering-context`).
Inputs
Extract and normalize from user input:
- failing spec file/path
- failing test name/title
- error message and stack (if provided)
- environment target (local/dev/staging/prod)
- any recent change context from the user
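The normalized inputs above can be modeled as a plain record. A minimal sketch, assuming a TypeScript orchestrator; the field names are illustrative, not part of the skill's contract:

```typescript
// Hypothetical shape for normalized failure facts extracted from user input.
interface FailureFacts {
  specPath?: string;      // failing spec file/path
  testTitle?: string;     // failing test name/title
  errorMessage?: string;  // error message, if provided
  stack?: string;         // stack trace, if provided
  environment?: "local" | "dev" | "staging" | "prod";
  recentChanges?: string; // free-form change context from the user
}

// Returns the facts still missing, so the orchestrator knows whether a
// reproduction run is needed before forming a hypothesis.
function missingFacts(facts: FailureFacts): string[] {
  const required: (keyof FailureFacts)[] = ["specPath", "errorMessage", "stack"];
  return required.filter((k) => !facts[k]);
}
```

If `missingFacts` is non-empty, the skill delegates a focused run rather than guessing.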
Target Resolution Rule
For `references/website-analysis/<target>/...`, resolve `<target>` from:
- the explicit user URL, else
- the project `baseUrl` host, else
- `unknown-target`
If the error/stack is missing, delegate to `running-webdriverio-tests` to reproduce and collect failure details.
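The resolution order can be sketched as a small function. The parameter names are illustrative; `baseUrl` refers to the WebdriverIO config option of that name:

```typescript
// Sketch of the target resolution order: explicit URL, then project
// baseUrl host, then the "unknown-target" fallback.
function resolveTarget(explicitUrl?: string, baseUrl?: string): string {
  if (explicitUrl) return new URL(explicitUrl).host;
  if (baseUrl) return new URL(baseUrl).host;
  return "unknown-target";
}
```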
Workflow
- Parse user input and capture known failure facts.
- Load `.webdriverio-skills/project-context.*` and `.webdriverio-skills/custom-rules.md` if present.
- Load `references/website-analysis/<target>/website-analysis.*` when available to accelerate route/component reasoning.
- If critical details are missing, delegate a focused test run to obtain the error and stack.
- Delegate to `gathering-context` with normalized failure details.
- If route/component intent is unclear, delegate to `analyze-website` and merge the results into the hypothesis.
- Analyze evidence and recent code changes to form a primary root-cause hypothesis.
- Confirm findings with the user before implementation (accept/adjust).
- If the issue is test-side, delegate fix implementation to `writing-webdriverio-code`.
- Delegate test execution to validate.
- Repeat the triage/fix loop with strict attempt limits.
Hypothesis and Confirmation
Before requesting a fix, give the user:
- the likely root cause (one primary, one secondary candidate)
- why the evidence supports it
- the intended fix direction
- the expected validation command scope
Require explicit user confirmation or correction before implementation delegation.
Ralph Wiggum Loop (Bounded)
Use a bounded attempt loop for test-side fixes:
- attempt 1: implement best-supported fix
- attempt 2: adjust based on new failure evidence
- attempt 3: final focused adjustment
If still failing after 3 attempts:
- Stop iterative fixing.
- Refresh context via `managing-project-customizations` if stale/missing signals exist.
- Ask the user for missing constraints, assumptions, or environment facts.
Never continue blind fix loops past 3 attempts.
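The bounded loop can be sketched as follows. `implementFix` and `runTests` are hypothetical stand-ins for the delegations to `writing-webdriverio-code` and `running-webdriverio-tests`:

```typescript
// Bounded attempt loop: at most three fix attempts, then stop and escalate.
const MAX_ATTEMPTS = 3;

function boundedFixLoop(
  implementFix: (attempt: number) => void, // delegate: writing-webdriverio-code
  runTests: () => boolean                  // delegate: running-webdriverio-tests
): { passed: boolean; attempts: number } {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    implementFix(attempt);
    if (runTests()) return { passed: true, attempts: attempt };
  }
  // Still failing: stop iterating, refresh context, ask the user.
  return { passed: false, attempts: MAX_ATTEMPTS };
}
```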
Rabbit-Hole Guardrails
- Do not widen test scope unless evidence requires it.
- Do not mix multiple speculative fixes in one attempt.
- Do not switch environments without an explicit reason.
- Do not treat retries/timeouts as root-cause fixes.
- Prefer narrowing to single failing test before suite-wide validation.
Recent Change Analysis
Correlate failure evidence with recent modifications:
- changed selectors/page objects
- setup/auth/session utilities
- config/reporter/service changes
- env/script changes impacting execution mode
Use this correlation to prioritize hypotheses.
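One naive way to operationalize this correlation is to weight hypotheses by which recently changed areas the failure evidence implicates. The categories and weights below are illustrative assumptions, not part of the skill:

```typescript
// Hypothetical weights for the change categories listed above; higher means
// the category more often explains a test failure on its own.
const CHANGE_WEIGHTS: Record<string, number> = {
  selector: 3, // changed selectors/page objects
  session: 2,  // setup/auth/session utilities
  config: 2,   // config/reporter/service changes
  env: 1,      // env/script changes impacting execution mode
};

// Score a hypothesis by summing the weights of the changed categories it cites.
function scoreHypothesis(changedCategories: string[]): number {
  return changedCategories.reduce((sum, c) => sum + (CHANGE_WEIGHTS[c] ?? 0), 0);
}
```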
Routing Real App Bugs
If evidence indicates a product/app behavior regression (not a test defect):
- classify it as an app bug
- route to `skipped-test-manager`
- include gathered evidence and reproduction details
Output Format
Return concise orchestration summaries with:
- known inputs
- current hypothesis
- delegated action in progress
- attempt count (if in loop)
- stop/escalation reason when bounded loop exits
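The summary fields above could be rendered with a small formatter. A sketch, assuming a TypeScript orchestrator; the shape and wording are illustrative:

```typescript
// Hypothetical summary shape mirroring the fields listed above.
interface OrchestrationSummary {
  knownInputs: string[];
  hypothesis: string;
  delegatedAction: string;
  attempt?: number;    // only while inside the bounded loop
  stopReason?: string; // only when the bounded loop exits
}

function formatSummary(s: OrchestrationSummary): string {
  const lines = [
    `Known inputs: ${s.knownInputs.join(", ")}`,
    `Hypothesis: ${s.hypothesis}`,
    `Delegated: ${s.delegatedAction}`,
  ];
  if (s.attempt !== undefined) lines.push(`Attempt: ${s.attempt}/3`);
  if (s.stopReason) lines.push(`Stopped: ${s.stopReason}`);
  return lines.join("\n");
}
```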
Common Mistakes
- Running unbounded trial-and-error fix loops.
- Requesting code changes before confirming hypothesis with user.
- Skipping context refresh when evidence is stale.
- Treating app bugs as test-only fixes.