error-patterns

📁 doubleslashse/claude-marketplace 📅 Jan 25, 2026

Install command

npx skills add https://github.com/doubleslashse/claude-marketplace --skill error-patterns

Error Patterns Skill

Overview

This skill provides knowledge for recognizing, categorizing, and resolving common infrastructure errors. It covers error classification, diagnostic techniques, and resolution strategies.

Error Classification Framework

By Severity

Severity	Definition	Response Time	Example
Critical	Service completely down	Immediate	Database unreachable
High	Major functionality broken	< 1 hour	Auth failures
Medium	Partial functionality affected	< 4 hours	Slow queries
Low	Minor issues, workarounds exist	< 24 hours	Deprecation warnings

By Category

Category	Subcategories	Typical Causes
Database	Connection, Query, Transaction, Replication	Pool exhaustion, locks, slow queries
Network	DNS, Timeout, Connection	Misconfiguration, service down
Authentication	Token, Permission, Provider	Expired tokens, wrong credentials
Application	Logic, Memory, Timeout	Bugs, resource leaks
Infrastructure	Disk, CPU, Memory	Resource exhaustion
External	API, Service, Rate limit	Third-party issues
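A first-pass categorizer along the lines of the table above could be sketched as follows. The keyword rules are hypothetical and would need tuning against real logs; anything that matches no rule falls back to "Application".

```python
import re

# Hypothetical keyword rules (not from the skill itself); order matters,
# because the first matching category wins.
CATEGORY_RULES = [
    ("Database", re.compile(r"connection pool|deadlock|too many connections|slow query", re.I)),
    ("Network", re.compile(r"\bdns\b|connection refused|timed out|timeout", re.I)),
    ("Authentication", re.compile(r"token|unauthorized|forbidden|credential", re.I)),
    ("Infrastructure", re.compile(r"\bdisk\b|out of memory|\boom\b|\bcpu\b", re.I)),
    ("External", re.compile(r"rate limit|\b429\b|upstream", re.I)),
]


def categorize(message: str) -> str:
    """Return the first category whose pattern matches, else 'Application'."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(message):
            return category
    return "Application"


print(categorize("Connection refused to db:5432"))  # matched by the Network rule
```

Keyword matching is only a triage aid; the category it suggests should be confirmed during investigation.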

By Pattern Type

Pattern	Description	Example
Transient	Self-resolving, retry works	Network blip
Persistent	Consistent, needs fix	Misconfiguration
Cascading	One failure causes others	DB down → API errors
Intermittent	Random occurrence	Race condition
Load-dependent	Appears under load	Connection exhaustion
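Transient errors are the one pattern where retrying is usually enough. A minimal sketch of retry with exponential backoff, assuming the caller can mark retryable failures with a hypothetical `TransientError` class:

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors expected to clear on retry."""


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # Out of attempts: treat the error as persistent.
            sleep(base_delay * 2 ** attempt)
```

If the error survives all attempts, it is better classified as persistent and investigated rather than retried further.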

Diagnostic Methodology

The 5 Whys

Dig deeper for root cause:

Symptom: API returning 500 errors
  Why? → Database query failing
    Why? → Connection timeout
      Why? → Connection pool exhausted
        Why? → Connections not released
          Why? → Missing finally block in error handler

ROOT CAUSE: Code bug in error handling
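The root cause in this example, a connection that is never released when the query raises, can be illustrated with a toy pool. `Pool` and `run_query` are hypothetical stand-ins for a real database pool and handler, not part of any library.

```python
import queue


class Pool:
    """Toy fixed-size pool standing in for a real database pool (hypothetical)."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout=0.1):
        # Raises queue.Empty when the pool is exhausted.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)

    def available(self):
        return self._free.qsize()


def run_query(pool, query_fn):
    conn = pool.acquire()
    try:
        return query_fn(conn)
    finally:
        # Runs on success and on exception, so the connection always returns.
        pool.release(conn)
```

Without the `finally` clause, every failed query would leak one connection until `acquire` starts timing out.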

Timeline Analysis

Map events chronologically:

T-60m: Deployment completed
T-45m: Memory usage started climbing
T-30m: First slow query warning
T-15m: Connection pool warnings
T-0:   Service unavailable
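A timeline like the one above can be generated from raw timestamps. This is a small sketch, assuming events arrive as `(datetime, description)` pairs and that the incident time is known; the function name is illustrative.

```python
from datetime import datetime, timedelta


def build_timeline(events, incident_time):
    """Sort (timestamp, description) pairs and label each relative to the incident."""
    lines = []
    for ts, description in sorted(events, key=lambda e: e[0]):
        delta = round((ts - incident_time).total_seconds() / 60)
        if delta == 0:
            label = "T-0"
        elif delta < 0:
            label = f"T-{-delta}m"
        else:
            label = f"T+{delta}m"
        lines.append(f"{label}: {description}")
    return lines


incident = datetime(2026, 1, 25, 15, 0)
events = [
    (incident, "Service unavailable"),
    (incident - timedelta(minutes=60), "Deployment completed"),
    (incident - timedelta(minutes=15), "Connection pool warnings"),
]
for line in build_timeline(events, incident):
    print(line)
```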

Fault Tree

Break down possible causes:

                [Service Down]
                      |
        +-------------+-------------+
        |             |             |
    [Database]    [Network]    [Application]
        |             |             |
    +---+---+     +---+---+     +---+---+
    |       |     |       |     |       |
 [Conn]  [Query] [DNS]  [FW]  [OOM]  [Bug]
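One way to work through a fault tree systematically is to encode it as nested dictionaries and enumerate every root-to-leaf path as a hypothesis to check. The sketch below mirrors the diagram above; the representation itself is an assumption, not something the skill prescribes.

```python
# The fault tree from the diagram, as nested dicts (empty dict = leaf cause).
FAULT_TREE = {
    "Service Down": {
        "Database": {"Conn": {}, "Query": {}},
        "Network": {"DNS": {}, "FW": {}},
        "Application": {"OOM": {}, "Bug": {}},
    }
}


def leaf_paths(tree, prefix=()):
    """Yield each root-to-leaf path; every leaf is a candidate cause to check."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


for path in leaf_paths(FAULT_TREE):
    print(" > ".join(path))
```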

Error Resolution Process

Step 1: Identify

What is the exact error message?
When did it start?
What’s the impact?

Step 2: Categorize

Which category does this fall into?
Is it transient or persistent?
What’s the severity?

Step 3: Investigate

Gather relevant logs
Check recent changes
Look for patterns

Step 4: Diagnose

Apply 5 Whys
Build timeline
Identify root cause

Step 5: Remediate

Apply immediate fix
Verify resolution
Document for prevention

Error Correlation Techniques

Cross-Platform Correlation

Match errors across systems:

14:30:01 [Railway]  Connection refused to db:5432
14:30:01 [Supabase] Too many connections
14:30:00 [GitHub]   Deployment completed
→ Deployment triggered connection spike
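Correlation of this kind can be approximated by merging log entries from all platforms, sorting by time, and grouping entries that fall close together. A minimal sketch, assuming entries are already parsed into `(timestamp, source, message)` tuples and using an arbitrary 5-second window:

```python
from datetime import datetime, timedelta


def correlate(entries, window_seconds=5):
    """Group (timestamp, source, message) entries separated by gaps within the window."""
    groups, current = [], []
    for entry in sorted(entries, key=lambda e: e[0]):
        gap_exceeded = current and entry[0] - current[-1][0] > timedelta(seconds=window_seconds)
        if gap_exceeded:
            groups.append(current)
            current = []
        current.append(entry)
    if current:
        groups.append(current)
    return groups


entries = [
    (datetime(2026, 1, 25, 14, 30, 1), "Railway", "Connection refused to db:5432"),
    (datetime(2026, 1, 25, 14, 30, 1), "Supabase", "Too many connections"),
    (datetime(2026, 1, 25, 14, 30, 0), "GitHub", "Deployment completed"),
    (datetime(2026, 1, 25, 15, 0, 0), "Railway", "Health check ok"),
]
for group in correlate(entries):
    print([source for _, source, _ in group])
```

The earliest entry in a group is a candidate trigger, not proof of causation; clock skew between platforms can reorder events that are only a second apart.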

Error Chains

Follow the cascade:

[1] Initial: Database connection timeout
[2] Result:  API endpoint returns 500
[3] Result:  Frontend shows error page
[4] Result:  User reports "site is down"

Impact Mapping

Error: Auth service down
├── Direct Impact
│   └── No new logins
├── Cascade Impact
│   ├── API requests fail (no token validation)
│   └── Realtime connections drop
└── User Impact
    └── All users affected
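If service dependencies are recorded somewhere, the cascade part of an impact map can be derived by walking the dependency graph. The component names and the `DEPENDENTS` map below are hypothetical, loosely modelled on the auth example above.

```python
from collections import deque

# Hypothetical map: each key is depended on by the components in its list.
DEPENDENTS = {
    "auth": ["login", "api", "realtime"],
    "api": ["frontend"],
    "login": [],
    "realtime": [],
    "frontend": [],
}


def impacted(failed, dependents):
    """Breadth-first walk returning everything downstream of the failed component."""
    seen, order, todo = {failed}, [], deque([failed])
    while todo:
        for child in dependents.get(todo.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                todo.append(child)
    return order


print(impacted("auth", DEPENDENTS))
```

Direct dependents appear first in the result, followed by components that are affected only through the cascade.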

Resolution Strategies

Immediate Mitigation

Strategy	Use When	Example
Rollback	Recent deployment caused issue	`git revert`
Restart	Service stuck/crashed	Container restart
Scale up	Resource exhaustion	Add replicas
Failover	Primary system down	Switch to backup
Rate limit	Overload	Block/throttle traffic
Circuit break	Cascading failures	Disable failing component
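The circuit-break strategy can be sketched as a small wrapper that stops calling a failing component for a cool-down period. This is a simplified illustration with assumed thresholds, not a production implementation; class and parameter names are illustrative.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are being short-circuited."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed.

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: half-open, allow one trial call through.
            self.opened_at = None
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # Trip (or re-trip) the breaker.
            raise
        self.failures = 0  # Any success closes the circuit fully.
        return result
```

While the circuit is open, callers fail fast instead of piling more load onto the failing dependency, which is what limits the cascade.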

Root Cause Fix

Cause	Fix Approach
Code bug	Deploy fix, add tests
Configuration	Update config, validate
Resource limit	Increase limits or optimize
External dependency	Add retry/fallback
Infrastructure	Scale or redesign

Prevention

Issue	Prevention
Connection leaks	Connection pooling, timeouts
Memory leaks	Profiling, limits
Slow queries	Indexes, query optimization
Deployment failures	Canary deployments, rollback automation
External failures	Circuit breakers, fallbacks

Common Resolution Templates

Database Connection Issues

## Issue: Database Connection Error

### Immediate Actions
1. Check connection count:
   SELECT count(*) FROM pg_stat_activity;
2. Identify connections stuck idle in a transaction:
   SELECT * FROM pg_stat_activity WHERE state = 'idle in transaction';
3. Kill stuck connections if safe:
   SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE ...;

### Root Cause Fix
- Add connection pooling (PgBouncer)
- Implement connection timeouts
- Fix connection leak in application code

### Prevention
- Monitor connection metrics
- Alert on pool usage > 80%
- Regular connection audits
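The 80% alert in the template above reduces to a simple ratio check. In this sketch the two inputs are assumed to come from PostgreSQL, for example `SELECT count(*) FROM pg_stat_activity` and `SHOW max_connections`; the function name and message format are illustrative.

```python
def pool_usage_alert(active, max_connections, threshold=0.80):
    """Return an alert string when active/max exceeds the threshold, else None."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    usage = active / max_connections
    if usage > threshold:
        return f"ALERT: pool usage {usage:.0%} exceeds {threshold:.0%}"
    return None


print(pool_usage_alert(85, 100))
```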

API Error Spike

## Issue: API 500 Errors

### Immediate Actions
1. Check API logs for error pattern
2. Identify failing endpoint(s)
3. Check downstream dependencies

### Root Cause Fix
- Fix code bug causing exception
- Handle edge cases
- Add proper error handling

### Prevention
- Add error monitoring
- Implement circuit breakers
- Add integration tests

See common-errors.md for a catalog of specific errors and solutions.
