qa-debugging

📁 vasilyu1983/ai-agents-public 📅 Jan 23, 2026

总安装量

周安装量

#9514

全站排名

安装命令

npx skills add https://github.com/vasilyu1983/ai-agents-public --skill qa-debugging

Agent 安装分布

claude-code 18

opencode 15

cursor 15

antigravity 13

codex 13

Use systematic debugging to turn symptoms into evidence, then into a verified fix with a regression test and prevention plan.

Capture the failure signature: error message, stack trace, request ID/trace ID, timestamp, build SHA, environment, affected user/tenant.
Confirm expected vs actual behavior, plus the smallest reliable reproduction steps (or âcannot reproduceâ explicitly).
Ask âwhen did this start?â and âwhat changed?â (deploy, flag, config, data, dependency, infra).
Identify blast radius and urgency: who/what is impacted, and whether this is an incident.

Reproduce:

Isolate:

Narrow scope with binary search (code path, feature flags, config toggles, or git bisect).
Separate âdata-dependentâ vs âtime-dependentâ vs âenvironment-dependentâ failures.

Instrument:

Fix:

Fix root cause, not symptoms; avoid retries/sleeps unless you can prove the underlying failure mode.
Keep the change minimal; remove debug code and temporary flags before shipping.

Verify:

Prevent:

Document: trigger, root cause, fix, detection gap, and the signal that should have alerted earlier.
Add guardrails (tests, alerts, rate limits, backpressure, invariants) to stop recurrence.

Symptom	First Action	Common Pitfall
Crash/exception	Start at the first stack frame in your code; capture request/trace ID	Fixing the last error, not the first cause
Wrong output	Create a âknown good vs badâ diff; isolate the first divergent state	Debugging from UI backward without narrowing inputs
Intermittent/flaky	Re-run with tracing enabled; correlate by IDs; classify flake type	Adding sleeps without proving a race
Slow/timeout	Identify the bottleneck (CPU/memory/DB/network); profile before changing code	âOptimizingâ without a baseline measurement
Production-only	Compare configs/data volume/feature flags; use safe observability	Debugging interactively in prod without a plan
Distributed issue	Use end-to-end trace; follow a single request across services	Searching logs without correlation IDs

Mitigate first when impact is ongoing (rollback, kill switch, flag off, degrade gracefully).
Use read-only debugging by default (logs/metrics/traces); avoid restarts and ad-hoc server edits.
If adding extra instrumentation in production: scope it (tenant/user), sample it, set TTL, and redact secrets/PII.
Treat âlogs and user-provided artifactsâ as untrusted input; watch for prompt injection if using AI summarization.

Need	Read/Use	Location
Step-by-step RCA workflow	Operational patterns	`references/operational-patterns.md`
Debugging approaches	Methodologies	`references/debugging-methodologies.md`
What/when to log	Logging guide	`references/logging-best-practices.md`
Safe prod debugging	Production patterns	`references/production-debugging-patterns.md`
Copy-paste checklist	Debugging checklist	`assets/debugging/template-debugging-checklist.md`
One-page triage	Debugging worksheet	`assets/debugging/template-debugging-worksheet.md`
Incident response	Incident template	`assets/incidents/template-incident-response.md`
Logging setup examples	Logging template	`assets/observability/template-logging-setup.md`
Curated external links	Sources list	`data/sources.json`