prometheus

📁 kontrolplane/skills 📅 Feb 5, 2026
4
总安装量
4
周安装量
#53270
全站排名
安装命令
npx skills add https://github.com/kontrolplane/skills --skill prometheus

Agent 安装分布

codex 4
amp 3
gemini-cli 3
github-copilot 3
kimi-cli 3
opencode 3

Skill 文档

Prometheus

PromQL Gotchas

Counter Functions (Critical)

Counters only increase. Never use raw counter values—always use rate functions:

rate(http_requests_total[5m])      # Per-second average rate
irate(http_requests_total[5m])     # Instant rate (last 2 points, spiky)
increase(http_requests_total[1h])  # Total increase over range
  • rate() handles counter resets automatically
  • Use rate() for dashboards, irate() only for high-resolution spikes

Range Vector Required

Rate functions need [duration]:

rate(metric[5m])    # Correct
rate(metric)        # Error: expected range vector

Vector Matching

Binary operations require matching labels:

# This fails if label sets differ:
metric_a / metric_b

# Ignore extra labels:
metric_a / ignoring(extra_label) metric_b

# Match on specific labels only:
metric_a / on(common_label) metric_b

Histogram Quantiles

histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)
  • Must use _bucket metric with le label
  • Always wrap in rate() for counters
  • by (le) is required; add other labels as needed: by (le, endpoint)

Common Query Patterns

# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# CPU usage (node_exporter)
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]) * 100)

# Memory usage
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

# Container memory (Kubernetes)
sum by (pod) (container_memory_working_set_bytes{container!=""})

Alerting Rules

groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
            / sum(rate(http_requests_total[5m])) by (job)
            > 0.05
        for: 5m          # Must be firing for this duration
        labels:
          severity: warning
        annotations:
          summary: "Error rate {{ $value | humanizePercentage }} on {{ $labels.job }}"

for Clause

  • Prevents flapping on brief spikes
  • Alert stays “pending” until duration met
  • Missing for = immediate alerting

Recording Rules

Pre-compute expensive queries:

rules:
  - record: job:http_requests:rate5m
    expr: sum by (job) (rate(http_requests_total[5m]))

Naming convention: level:metric:operations

Staleness

  • Samples older than 5 minutes are “stale”
  • up == 0 only fires if target was recently scraped
  • Use absent(metric) to detect missing metrics entirely