chaos-engineering
8
总安装量
6
周安装量
#35021
全站排名
Install command
npx skills add https://github.com/nguyenhuuca/assessment --skill chaos-engineering
Skill documentation
Chaos Engineering
Principles
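- Build a Hypothesis: State the expected behavior in measurable terms
- Minimize Blast Radius: Start with the smallest scope that can still show an effect
- Run in Production: Real traffic and conditions matter, once the experiment has passed in non-production
- Automate: Make experiments repeatable
- Minimize Impact: Define abort conditions before starting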
Experiment Process
- Steady State: Define the metrics that describe normal behavior and record their baseline
- Hypothesis: “System will maintain X under condition Y”
- Introduce Variables: Inject the failure
- Observe: Compare the metrics to steady state
- Analyze: Confirm or disprove the hypothesis
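The steps above can be sketched as a small shell loop. This is only an illustration under stated assumptions: the function names (`measure_metric`, `inject_failure`, `rollback`, `run_experiment`) are hypothetical, and the metric values are simulated so the script runs without a real system behind it.

```shell
#!/usr/bin/env bash
# Sketch of the experiment loop. Every function name here is hypothetical,
# and the metric values are simulated so the script runs without a real system.

INJECTED=0

# Steady state / Observe: report the metric being watched (say, p99 latency in ms).
# Simulated: 120 normally, 450 while the failure is injected.
measure_metric() {
  if [ "$INJECTED" -eq 1 ]; then echo 450; else echo 120; fi
}

# Introduce variables: replace with a real injection (tc, stress, kill, ...).
inject_failure() { INJECTED=1; }

# Undo the injection.
rollback() { INJECTED=0; }

# run_experiment THRESHOLD
# Hypothesis: the metric stays at or below THRESHOLD while the failure is active.
run_experiment() {
  local threshold="$1" baseline observed
  baseline=$(measure_metric)
  # Abort before injecting anything if the system is already unhealthy.
  if [ "$baseline" -gt "$threshold" ]; then
    echo "aborted"
    return 2
  fi
  inject_failure
  observed=$(measure_metric)
  rollback
  if [ "$observed" -le "$threshold" ]; then
    echo "confirmed"
  else
    echo "disproved"
    return 1
  fi
}
```

With these stubs, `run_experiment 500` prints `confirmed` and `run_experiment 200` prints `disproved`. In a real experiment the three stub functions would be replaced by a metrics query, an actual fault injection, and its cleanup.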
Common Experiments
Network Failures
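# Add latency (requires root; replace eth0 with the target interface)
tc qdisc add dev eth0 root netem delay 100ms
# Packet loss ("replace" also works when a root qdisc is already attached; a second "add" would fail)
tc qdisc replace dev eth0 root netem loss 10%
# Remove the rule and restore normal traffic
tc qdisc del dev eth0 root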
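A network fault left in place after the experiment is itself an incident, so it can help to pair every injection with its removal. The sketch below assumes a hypothetical helper, `with_netem`, and a hypothetical `DRY_RUN` switch that defaults to printing the `tc` commands instead of running them; neither is part of `tc` itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hold a netem rule for a fixed time, then remove it.
# DRY_RUN defaults to 1, which only prints the commands instead of running them.

run_cmd() {
  if [ "${DRY_RUN:-1}" -eq 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# with_netem IFACE SECONDS NETEM_ARGS...
with_netem() {
  local iface="$1" seconds="$2"
  shift 2
  run_cmd tc qdisc add dev "$iface" root netem "$@" || return 1
  # If the script is interrupted while the fault is active, remove the rule first.
  trap "run_cmd tc qdisc del dev '$iface' root; exit 130" INT TERM
  # Hold the fault for the requested duration (skipped in dry-run mode).
  [ "${DRY_RUN:-1}" -eq 1 ] || sleep "$seconds"
  trap - INT TERM
  run_cmd tc qdisc del dev "$iface" root
}
```

In dry-run mode, `with_netem eth0 30 delay 100ms` only prints the add and delete commands. Running it for real would look like `DRY_RUN=0 with_netem eth0 30 loss 10%` as root.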
Resource Exhaustion
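# CPU stress: 4 workers for 60 seconds
stress --cpu 4 --timeout 60s
# Memory stress: 2 workers allocating 1G each (2G total) for 60 seconds
stress --vm 2 --vm-bytes 1G --timeout 60s
# Disk fill: write a 1 GiB file (if /tmp is tmpfs, this consumes memory instead of disk)
dd if=/dev/zero of=/tmp/fill bs=1M count=1024
# Clean up once the observation is done
rm /tmp/fill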
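The disk-fill command above has no guard against filling the volume completely. A minimal sketch of a guarded version follows; `safe_fill` and its arguments are hypothetical names chosen for illustration, and it assumes the POSIX `df -Pk` output format (available space in KiB in the fourth column).

```shell
#!/usr/bin/env bash
# Hypothetical guarded disk fill: refuse to run if it would leave too little
# free space, and delete the fill file once the hold time is over.

# safe_fill DIR SIZE_MB MIN_FREE_MB [HOLD_SECONDS]
safe_fill() {
  local dir="$1" size_mb="$2" min_free_mb="$3" hold="${4:-0}"
  local avail_kb avail_mb file
  # POSIX df output: column 4 of the second line is available space in KiB.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  avail_mb=$((avail_kb / 1024))
  if [ $((avail_mb - size_mb)) -lt "$min_free_mb" ]; then
    echo "refusing: only ${avail_mb} MiB free on $dir" >&2
    return 1
  fi
  file="$dir/chaos-fill.$$"
  # bs is given in bytes (1 MiB) so the command works with GNU and BSD dd.
  dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" 2>/dev/null
  # Keep the disk pressure in place while metrics are observed.
  sleep "$hold"
  rm -f "$file"
}
```

For example, `safe_fill /tmp 1024 2048 60` would write 1 GiB, hold it for 60 seconds, and refuse to start if that would leave less than 2 GiB free. The sketch does not remove the file if the script is killed during the hold, so a `trap` would be a sensible addition for unattended use.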
Service Failures
- Kill processes
- Restart containers
- Terminate instances
- Block dependencies
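For the process-kill case, the interesting measurement is usually how long the service takes to come back. The sketch below assumes two hypothetical helpers: `crash_process`, which sends SIGKILL to simulate a crash, and `wait_for`, which polls a health-check command until it succeeds or a timeout expires.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a "kill the process, watch it recover" experiment.

# crash_process PID
# SIGKILL cannot be caught, so the process gets no chance to shut down cleanly.
crash_process() { kill -KILL "$1"; }

# wait_for CHECK_CMD TIMEOUT_SECONDS
# Polls CHECK_CMD about once per second; succeeds as soon as it exits with 0.
wait_for() {
  local check="$1" timeout="$2" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if sh -c "$check" >/dev/null 2>&1; then
      echo "recovered after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "not recovered within ${timeout}s"
  return 1
}
```

A run might look like `crash_process "$(pidof myservice)"` followed by `wait_for "curl -fsS http://localhost:8080/health" 30`, where the service name and health URL are placeholders. A non-zero exit from `wait_for` is a natural abort signal for the experiment.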
Chaos Tools
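- Chaos Monkey: Random instance termination (Netflix, open source)
- Gremlin: Commercial chaos platform covering network, resource, and state faults
- Litmus: Kubernetes chaos engineering (CNCF project)
- Chaos Mesh: Kubernetes-native fault injection (CNCF project)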
Experiment Template
## Experiment: [Name]
### Hypothesis
If [condition], then [expected behavior].
### Steady State
- Metric A: [baseline value]
- Metric B: [baseline value]
### Method
1. [Step 1]
2. [Step 2]
3. [Step 3]
### Abort Conditions
- If [condition], stop immediately
### Results
[What happened]
### Findings
[What we learned]
Safety Rules
- Start in non-production
- Have rollback ready
- Monitor continuously
- Communicate with team
- Document everything