observability-alerting

📁 kentoshimizu/sw-agent-skills 📅 1 day ago
Install command
npx skills add https://github.com/kentoshimizu/sw-agent-skills --skill observability-alerting

Observability Alerting

Overview

Use this skill to design alerting that catches real incidents quickly without overwhelming responders.

Scope Boundaries

  • Use this skill when the task matches the trigger condition stated in the skill's description.
  • Do not use this skill when the primary task falls outside this skill’s domain.

Shared References

  • Alert threshold actionability rules:
    • references/alert-threshold-actionability-rules.md

Templates And Assets

  • Alert catalog template:
    • assets/alert-catalog-template.csv
  • Alert noise review checklist:
    • assets/alert-noise-review-checklist.md

Inputs To Gather

  • Critical user/system failure modes.
  • Available telemetry signals and their quality.
  • On-call routing and escalation policy.
  • Historical false-positive/false-negative patterns.
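The false-positive side of that history can be quantified per alert as the share of firings that actually required a responder to act. This is a minimal sketch that assumes you can export past firings labeled actionable or not; the `Firing` record, the alert names, and the 0.5 cutoff are illustrative assumptions, not part of this skill's references. False negatives (incidents no alert caught) cannot be derived from firings and have to come from incident records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Firing:
    """One past alert firing, labeled after the fact (hypothetical export format)."""
    alert_name: str
    actionable: bool  # True if a responder had to act on it


def alert_precision(firings: list[Firing]) -> dict[str, float]:
    """Share of each alert's firings that were actionable (1.0 means no noise)."""
    fired: dict[str, int] = defaultdict(int)
    useful: dict[str, int] = defaultdict(int)
    for f in firings:
        fired[f.alert_name] += 1
        useful[f.alert_name] += int(f.actionable)
    return {name: useful[name] / count for name, count in fired.items()}


def noisy_alerts(firings: list[Firing], min_precision: float = 0.5) -> list[str]:
    """Alerts whose actionable share is below the (illustrative) cutoff, sorted."""
    return sorted(
        name for name, p in alert_precision(firings).items() if p < min_precision
    )


history = [
    Firing("checkout-5xx-rate", True),
    Firing("checkout-5xx-rate", True),
    Firing("disk-usage-80pct", False),
    Firing("disk-usage-80pct", False),
    Firing("disk-usage-80pct", True),
]
print(noisy_alerts(history))  # ['disk-usage-80pct']
```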

Deliverables

  • Alert catalog with severity, owner, and runbook linkage.
  • Threshold and routing policy.
  • Noise-control and tuning plan.
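The catalog requirements (severity, owner, runbook linkage, explicit paging intent) can be checked mechanically. This is a minimal sketch assuming the catalog is a CSV with hypothetical columns `alert_name`, `severity`, `owner`, `runbook_url`, and `paging`, and hypothetical severity levels `sev1` to `sev3`; the real header row of `assets/alert-catalog-template.csv` may differ.

```python
import csv
import io

# Hypothetical column names and severity levels: align them with the real
# header row of assets/alert-catalog-template.csv before relying on this.
REQUIRED_COLUMNS = ("alert_name", "severity", "owner", "runbook_url", "paging")
SEVERITIES = {"sev1", "sev2", "sev3"}
PAGING_VALUES = {"yes", "no"}


def _field(row: dict, key: str) -> str:
    # DictReader fills short rows with None, so normalize to a stripped string.
    return (row.get(key) or "").strip()


def validate_catalog(csv_text: str) -> list[str]:
    """Return one readable problem per missing or invalid catalog field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"catalog is missing columns: {', '.join(missing)}"]

    problems = []
    for index, row in enumerate(reader, start=1):
        name = _field(row, "alert_name") or f"<row {index}>"
        if not _field(row, "owner"):
            problems.append(f"{name}: no owner")
        if not _field(row, "runbook_url"):
            problems.append(f"{name}: no runbook link")
        if _field(row, "severity").lower() not in SEVERITIES:
            problems.append(f"{name}: unknown severity {_field(row, 'severity')!r}")
        if _field(row, "paging").lower() not in PAGING_VALUES:
            problems.append(f"{name}: paging intent must be 'yes' or 'no'")
    return problems


catalog = """alert_name,severity,owner,runbook_url,paging
checkout-5xx-rate,sev1,payments-oncall,https://runbooks.example.com/checkout-5xx,yes
disk-usage-80pct,sev3,,,maybe
"""
for problem in validate_catalog(catalog):
    print(problem)
# disk-usage-80pct: no owner
# disk-usage-80pct: no runbook link
# disk-usage-80pct: paging intent must be 'yes' or 'no'
```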

Workflow

  1. Build the initial alert catalog in assets/alert-catalog-template.csv.
  2. Set thresholds using references/alert-threshold-actionability-rules.md.
  3. Define routing and escalation by severity.
  4. Validate with assets/alert-noise-review-checklist.md.
  5. Publish the tuning backlog and assign an owner to each item.
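The threshold rules in the referenced file are not reproduced here. As one widely used way to make SLO-based thresholds actionable, the sketch below applies the multiwindow, multi-burn-rate pattern from Google's SRE Workbook: it converts an observed error ratio into a burn rate (error ratio divided by the error budget, `1 - slo_target`) and fires only when both a long and a short window exceed the rule's threshold. The window/threshold pairs are the Workbook's example values for a 30-day SLO window; treat them as a starting point under that assumption, not as this skill's policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str
    short_window: str
    threshold: float  # fires when BOTH windows burn faster than this
    action: str       # "page" or "ticket"


# Example pairs from the SRE Workbook for a 30-day SLO window, most urgent first:
# 2% of budget spent in 1h, 5% in 6h, 10% in 3d.
RULES = (
    BurnRateRule("1h", "5m", 14.4, "page"),
    BurnRateRule("6h", "30m", 6.0, "page"),
    BurnRateRule("3d", "6h", 1.0, "ticket"),
)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than sustainable the error budget is being spent."""
    return error_ratio / (1.0 - slo_target)


def decide(error_ratios: dict[str, float], slo_target: float) -> str:
    """Return 'page', 'ticket', or 'none' from the observed error ratio per window."""
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        # The short window stops the alert from firing long after recovery.
        if long_burn > rule.threshold and short_burn > rule.threshold:
            return rule.action
    return "none"


observed = {"5m": 0.02, "30m": 0.015, "1h": 0.016, "6h": 0.004, "3d": 0.0008}
print(decide(observed, slo_target=0.999))  # page: 1h and 5m both burn above 14.4x
```

The slow 3-day rule routes gradual budget leaks to a ticket rather than a page, so the severity-based routing in step 3 receives an explicit paging decision.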

Quality Standard

  • Alerts are actionable and owned.
  • Critical paths have coverage with bounded noise.
  • Paging vs non-paging intent is explicit.
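Explicit paging intent is easiest to enforce when routing is a lookup that fails on an unknown severity instead of silently defaulting. In this sketch the severity names, receiver names, and escalation delays are hypothetical placeholders for your real on-call policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    pages: bool                        # explicit paging intent, never inferred
    receiver: str                      # hypothetical receiver name
    escalate_after_min: Optional[int]  # unacknowledged minutes before escalation


# Hypothetical policy: replace with your real on-call routing.
ROUTES = {
    "sev1": Route(pages=True, receiver="primary-oncall", escalate_after_min=15),
    "sev2": Route(pages=True, receiver="primary-oncall", escalate_after_min=30),
    "sev3": Route(pages=False, receiver="team-ticket-queue", escalate_after_min=None),
}


def route_for(severity: str) -> Route:
    """Look up routing; an unmapped severity is an error, not a silent default."""
    try:
        return ROUTES[severity]
    except KeyError:
        raise ValueError(f"no routing policy for severity {severity!r}") from None


def should_escalate(severity: str, minutes_unacked: int) -> bool:
    """True once an alert has gone unacknowledged past its escalation delay."""
    delay = route_for(severity).escalate_after_min
    return delay is not None and minutes_unacked >= delay


print(route_for("sev3").pages)      # False
print(should_escalate("sev1", 20))  # True
```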

Failure Conditions

  • Stop when alerts are noisy, non-actionable, or ownerless.
  • Stop when critical failure modes lack alert coverage.
  • Escalate when poor alert quality puts the response to SLO breaches at risk.
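These stop and escalate conditions can be expressed as a review gate. The sketch below is one possible interpretation, under stated assumptions: a critical failure mode counts as uncovered if no alert detects it or if every alert that does is already flagged as noisy, non-actionable, or ownerless; and escalation applies when such a gap hits a failure mode backed by an SLO. `ReviewFindings` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewFindings:
    """Hypothetical summary of an alert review."""
    critical_modes: set[str]
    coverage: dict[str, list[str]]  # failure mode -> alerts that detect it
    problem_alerts: set[str] = field(default_factory=set)  # noisy, non-actionable, or ownerless
    slo_backed_modes: set[str] = field(default_factory=set)


def uncovered_modes(findings: ReviewFindings) -> set[str]:
    """Critical modes detected by no alert, or only by problem alerts."""
    return {
        mode
        for mode in findings.critical_modes
        if not any(
            alert not in findings.problem_alerts
            for alert in findings.coverage.get(mode, [])
        )
    }


def review_decision(findings: ReviewFindings) -> str:
    """'escalate' if a gap hits an SLO-backed mode, 'stop' on any other failure, else 'proceed'."""
    gaps = uncovered_modes(findings)
    if gaps & findings.slo_backed_modes:
        return "escalate"
    if gaps or findings.problem_alerts:
        return "stop"
    return "proceed"


findings = ReviewFindings(
    critical_modes={"checkout-errors", "queue-backlog"},
    coverage={"checkout-errors": ["checkout-5xx-rate"], "queue-backlog": ["queue-depth-high"]},
    problem_alerts={"queue-depth-high"},
    slo_backed_modes={"checkout-errors"},
)
print(uncovered_modes(findings))  # {'queue-backlog'}
print(review_decision(findings))  # stop
```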