prometheus
Install command
npx skills add https://github.com/kontrolplane/skills --skill prometheus
Prometheus
PromQL Gotchas
Counter Functions (Critical)
Counters only increase. Never use raw counter values; always use rate functions:
rate(http_requests_total[5m]) # Per-second average rate
irate(http_requests_total[5m]) # Instant rate (last 2 points, spiky)
increase(http_requests_total[1h]) # Total increase over range
- rate() handles counter resets automatically
- Use rate() for dashboards, irate() only for high-resolution spikes
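To make the reset handling concrete, here is a minimal Python sketch of the idea, not Prometheus's actual implementation. It assumes a counter restarts from zero, and it skips the extrapolation to the window boundaries that the real rate() and increase() perform. The function names and sample values are hypothetical.

```python
def counter_increase(samples):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Assumption: the counter restarted from zero, so the whole
            # current value counts as new increase.
            total += curr
    return total


def simple_rate(samples, window_seconds):
    """Per-second average over the window (no boundary extrapolation)."""
    return counter_increase(samples) / window_seconds


# Hypothetical scrape values; the process restarts after the third sample.
values = [100.0, 130.0, 160.0, 10.0, 40.0]
print(counter_increase(values))  # 100.0
print(values[-1] - values[0])    # -60.0, the misleading raw difference
```

The raw difference goes negative across the restart, which is why graphing or alerting on raw counter values gives nonsense.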
Range Vector Required
Rate functions need [duration]:
rate(metric[5m]) # Correct
rate(metric) # Error: expected range vector
Vector Matching
Binary operations require matching labels:
# Returns no data (not an error) if the label sets differ:
metric_a / metric_b
# Ignore extra labels:
metric_a / ignoring(extra_label) metric_b
# Match on specific labels only:
metric_a / on(common_label) metric_b
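The matching rules can be sketched in Python as follows. This is a simplified model of one-to-one matching only: series are represented as hypothetical (labels, value) pairs, and the sketch ignores many-to-one matching, which in real PromQL needs group_left or group_right.

```python
def match_key(labels, on=None, ignoring=None):
    """Build the label signature used to pair series from both sides."""
    if on is not None:
        kept = {k: v for k, v in labels.items() if k in on}
    else:
        # The metric name never takes part in matching.
        dropped = set(ignoring or ()) | {"__name__"}
        kept = {k: v for k, v in labels.items() if k not in dropped}
    return tuple(sorted(kept.items()))


def divide(left, right, on=None, ignoring=None):
    """One-to-one division; series without a partner are silently dropped."""
    right_by_key = {match_key(lbls, on, ignoring): val for lbls, val in right}
    result = {}
    for lbls, val in left:
        key = match_key(lbls, on, ignoring)
        if key in right_by_key:
            result[key] = val / right_by_key[key]
    return result


# Hypothetical series: the left side carries an extra "code" label.
errors = [({"job": "api", "instance": "a", "code": "500"}, 5.0)]
totals = [({"job": "api", "instance": "a"}, 100.0)]

print(divide(errors, totals))                     # {} because nothing matches
print(divide(errors, totals, ignoring=["code"]))  # one result, value 0.05
print(divide(errors, totals, on=["job"]))         # one result, value 0.05
```

An empty result from a division is usually the first sign of a label mismatch.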
Histogram Quantiles
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)
- Must use the _bucket metric with the le label
- Always wrap in rate(), because bucket series are counters
- by (le) is required; add other labels as needed: by (le, endpoint)
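The le label matters because the quantile is estimated by interpolating between cumulative bucket boundaries. The Python sketch below shows that estimate for classic histograms under the assumption that observations are spread evenly inside each bucket. It is a simplified illustration with hypothetical bucket values, not the Prometheus source, and it omits several edge cases the real function handles.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the highest
                # finite bound instead of interpolating to infinity.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets: (le, count of observations <= le).
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
print(round(histogram_quantile(0.95, buckets), 3))  # 0.75
```

The same arithmetic applies when the counts are per-second rates, which is what the query above feeds in. Accuracy depends entirely on how well the bucket boundaries fit the latency distribution.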
Common Query Patterns
# Error rate (ratio from 0 to 1; multiply by 100 for a percentage)
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))
# CPU usage (node_exporter)
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]) * 100)
# Memory usage (fraction of total, from 0 to 1)
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
# Container memory (Kubernetes)
sum by (pod) (container_memory_working_set_bytes{container!=""})
Alerting Rules
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
          / sum(rate(http_requests_total[5m])) by (job)
          > 0.05
        for: 5m  # Condition must hold this long before the alert fires
        labels:
          severity: warning
        annotations:
          summary: "Error rate {{ $value | humanizePercentage }} on {{ $labels.job }}"
for Clause
- Prevents flapping on brief spikes
- Alert stays “pending” until duration met
- Omitting for means the alert fires immediately, on the first evaluation where the expression is true
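The pending-to-firing transition can be sketched as a small state machine. The Python below is an illustrative model only: it measures the for duration in whole evaluation intervals, and the function and parameter names are hypothetical.

```python
def alert_states(condition_by_eval, for_evals):
    """Return the alert state after each rule evaluation.

    condition_by_eval: booleans, one per evaluation of the alert expression.
    for_evals: evaluation intervals the condition must keep holding after it
               first becomes true (for: 5m at a 1m interval would be 5).
    """
    states = []
    held = None  # intervals elapsed since the condition first became true
    for active in condition_by_eval:
        if not active:
            # Any false evaluation resets the timer.
            held = None
            states.append("inactive")
            continue
        held = 0 if held is None else held + 1
        states.append("firing" if held >= for_evals else "pending")
    return states


# A two-interval spike never fires; a sustained condition does.
print(alert_states([True, True, False, True, True, True, True], for_evals=2))
# ['pending', 'pending', 'inactive', 'pending', 'pending', 'firing', 'firing']

# Without a for clause the alert fires on the first true evaluation.
print(alert_states([True, False], for_evals=0))
# ['firing', 'inactive']
```

Note that a single false evaluation resets the timer, so a noisy expression may never fire with a long for duration.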
Recording Rules
Pre-compute expensive queries:
rules:
  - record: job:http_requests:rate5m
    expr: sum by (job) (rate(http_requests_total[5m]))
Naming convention: level:metric:operations
Staleness
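- Samples older than 5 minutes (the default lookback window) are “stale” and are no longer returned by instant queries
- up == 0 only fires while Prometheus is still scraping the target; once the target is dropped, the up series disappears and the expression returns nothing
- Use absent(metric) to detect missing metrics entirely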
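The difference between a value of 0 and a missing series can be sketched in Python. This is a simplified model that assumes the default 5-minute lookback and ignores the explicit staleness markers Prometheus writes when a target or series goes away. The helper names and timestamps are hypothetical.

```python
LOOKBACK_SECONDS = 300  # 5 minutes, the default lookback window


def instant_value(samples, query_time, lookback=LOOKBACK_SECONDS):
    """Return the newest sample value inside the lookback window, else None."""
    candidates = [
        (ts, value)
        for ts, value in samples
        if query_time - lookback < ts <= query_time
    ]
    if not candidates:
        return None  # the series is stale: the query returns nothing
    return max(candidates)[1]


def absent(samples, query_time):
    """Mimic absent(): 1 when the series has no current value, else None."""
    return 1 if instant_value(samples, query_time) is None else None


# Hypothetical "up" samples as (unix_seconds, value); scraping stops at t=120.
up_samples = [(0, 1.0), (60, 1.0), (120, 0.0)]

print(instant_value(up_samples, 150))  # 0.0, so "up == 0" matches
print(instant_value(up_samples, 600))  # None, so "up == 0" matches nothing
print(absent(up_samples, 600))         # 1, so an absent() alert would fire
```

Pairing an up == 0 alert with an absent() alert covers both the failing target and the vanished target.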