loki-logging

📁 bagelhole/devops-security-agent-skills 📅 9 days ago

总安装量

周安装量

#51012

全站排名

安装命令

npx skills add https://github.com/bagelhole/devops-security-agent-skills --skill loki-logging

Agent 安装分布

opencode 1

codex 1

claude-code 1

Skill 文档

Grafana Loki

Aggregate and query logs with Grafana Loki, the Prometheus-inspired logging system.

When to Use This Skill

Use this skill when:

Implementing cost-effective log aggregation
Building logging for Kubernetes environments
Integrating logs with Grafana dashboards
Querying logs with label-based filtering
Preferring lighter-weight alternative to ELK

Prerequisites

Docker or Kubernetes
Grafana for visualization
Promtail or other log shipper

Architecture Overview

âââââââââââââââ     ââââââââââââ     ââââââââââââ
â Application ââââââ¶â Promtail ââââââ¶â   Loki   â
âââââââââââââââ     ââââââââââââ     ââââââââââââ
                                          â
                                          â¼
                                     ââââââââââââ
                                     â Grafana  â
                                     ââââââââââââ

Docker Deployment

# docker-compose.yml
version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yaml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.2.0
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

volumes:
  loki-data:
  grafana-data:

Loki Configuration

# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_query_series: 5000
  max_query_parallelism: 2

chunk_store_config:
  max_look_back_period: 168h

table_manager:
  retention_deletes_enabled: true
  retention_period: 168h

Promtail Configuration

# promtail-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # System logs
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log

  # Docker container logs
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'

  # Application logs with parsing
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: application
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            timestamp: timestamp
      - labels:
          level:
      - timestamp:
          source: timestamp
          format: RFC3339

Kubernetes Deployment

# Using Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.enabled=true \
  --set promtail.enabled=true

Promtail DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: promtail
  template:
    metadata:
      labels:
        app: promtail
    spec:
      containers:
        - name: promtail
          image: grafana/promtail:2.9.0
          args:
            - -config.file=/etc/promtail/promtail.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/promtail
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: promtail-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

LogQL Queries

Basic Queries

# All logs from a job
{job="application"}

# Filter by label
{job="application", level="error"}

# Multiple labels
{namespace="production", container="api"}

# Regex match
{job=~"app.*"}

Log Pipeline

# Filter by content
{job="application"} |= "error"

# Exclude content
{job="application"} != "debug"

# Regex filter
{job="application"} |~ "user_id=[0-9]+"

# JSON parsing
{job="application"} | json | level="error"

# Line format
{job="application"} | json | line_format "{{.level}}: {{.message}}"

Metric Queries

# Count logs per second
count_over_time({job="application"}[5m])

# Rate of errors
rate({job="application", level="error"}[5m])

# Sum by label
sum by (level) (count_over_time({job="application"}[5m]))

# Top services by error count
topk(5, sum by (service) (count_over_time({level="error"}[1h])))

Aggregations

# Average log line length
avg_over_time({job="application"} | unwrap line_length [5m])

# Percentile of numeric field
quantile_over_time(0.95, {job="application"} | json | unwrap response_time [5m])

# Error percentage
sum(rate({job="application", level="error"}[5m])) 
/ 
sum(rate({job="application"}[5m])) * 100

Pipeline Stages

# promtail-config.yaml
pipeline_stages:
  # Parse JSON logs
  - json:
      expressions:
        level: level
        message: msg
        trace_id: trace_id

  # Extract with regex
  - regex:
      expression: 'user_id=(?P<user_id>\d+)'

  # Add labels from parsed fields
  - labels:
      level:
      user_id:

  # Modify timestamp
  - timestamp:
      source: timestamp
      format: '2006-01-02T15:04:05.000Z'

  # Filter logs
  - match:
      selector: '{level="debug"}'
      action: drop

  # Add static labels
  - static_labels:
      environment: production

  # Modify log line
  - template:
      source: message
      template: '{{ ToUpper .Value }}'

Grafana Integration

Data Source Configuration

# grafana/provisioning/datasources/loki.yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
    jsonData:
      maxLines: 1000

Dashboard Panel

{
  "title": "Application Logs",
  "type": "logs",
  "datasource": "Loki",
  "targets": [
    {
      "expr": "{job=\"application\"} | json",
      "refId": "A"
    }
  ],
  "options": {
    "showTime": true,
    "showLabels": true,
    "wrapLogMessage": true
  }
}

Recording Rules

# loki-rules.yaml
groups:
  - name: error_rates
    interval: 1m
    rules:
      - record: job:log_errors:rate5m
        expr: |
          sum by (job) (rate({level="error"}[5m]))

Alerting

# loki-alerts.yaml
groups:
  - name: log_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({level="error"}[5m])) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate in logs"
          description: "Error rate is {{ $value }} errors/second"

Common Issues

Issue: High Memory Usage

Problem: Loki consuming too much memory Solution: Reduce max_query_series, limit query time range

Issue: Logs Not Appearing

Problem: Promtail not shipping logs Solution: Check positions file, verify file paths, check label configuration

Issue: Query Timeout

Problem: LogQL queries timing out Solution: Add more specific label filters, reduce time range

Issue: Ingestion Rate Limit

Problem: Logs being dropped Solution: Increase per_stream_rate_limit in limits_config

Best Practices

Use meaningful labels (avoid high cardinality)
Filter by labels before log content
Parse logs at collection time with Promtail
Set appropriate retention periods
Use recording rules for common queries
Implement proper multitenancy for large deployments
Monitor Loki’s own metrics
Use chunk caching for better performance

Related Skills

prometheus-grafana – Metrics monitoring
elk-stack – Alternative logging
alerting-oncall – Alert management

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台