loki-logging

📁 bagelhole/devops-security-agent-skills 📅 9 days ago
1
总安装量
1
周安装量
#51012
全站排名
安装命令
npx skills add https://github.com/bagelhole/devops-security-agent-skills --skill loki-logging

Agent 安装分布

opencode 1
codex 1
claude-code 1

Skill 文档

Grafana Loki

Aggregate and query logs with Grafana Loki, the Prometheus-inspired logging system.

When to Use This Skill

Use this skill when:

  • Implementing cost-effective log aggregation
  • Building logging for Kubernetes environments
  • Integrating logs with Grafana dashboards
  • Querying logs with label-based filtering
  • Preferring lighter-weight alternative to ELK

Prerequisites

  • Docker or Kubernetes
  • Grafana for visualization
  • Promtail or other log shipper

Architecture Overview

┌─────────────┐     ┌──────────┐     ┌──────────┐
│ Application │────▶│ Promtail │────▶│   Loki   │
└─────────────┘     └──────────┘     └──────────┘
                                          │
                                          ▼
                                     ┌──────────┐
                                     │ Grafana  │
                                     └──────────┘

Docker Deployment

# docker-compose.yml
version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yaml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.2.0
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

volumes:
  loki-data:
  grafana-data:

Loki Configuration

# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_query_series: 5000
  max_query_parallelism: 2

chunk_store_config:
  max_look_back_period: 168h

table_manager:
  retention_deletes_enabled: true
  retention_period: 168h

Promtail Configuration

# promtail-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # System logs
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log

  # Docker container logs
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'

  # Application logs with parsing
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: application
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            timestamp: timestamp
      - labels:
          level:
      - timestamp:
          source: timestamp
          format: RFC3339

Kubernetes Deployment

# Using Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.enabled=true \
  --set promtail.enabled=true

Promtail DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: promtail
  template:
    metadata:
      labels:
        app: promtail
    spec:
      containers:
        - name: promtail
          image: grafana/promtail:2.9.0
          args:
            - -config.file=/etc/promtail/promtail.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/promtail
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: promtail-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

LogQL Queries

Basic Queries

# All logs from a job
{job="application"}

# Filter by label
{job="application", level="error"}

# Multiple labels
{namespace="production", container="api"}

# Regex match
{job=~"app.*"}

Log Pipeline

# Filter by content
{job="application"} |= "error"

# Exclude content
{job="application"} != "debug"

# Regex filter
{job="application"} |~ "user_id=[0-9]+"

# JSON parsing
{job="application"} | json | level="error"

# Line format
{job="application"} | json | line_format "{{.level}}: {{.message}}"

Metric Queries

# Count logs per second
count_over_time({job="application"}[5m])

# Rate of errors
rate({job="application", level="error"}[5m])

# Sum by label
sum by (level) (count_over_time({job="application"}[5m]))

# Top services by error count
topk(5, sum by (service) (count_over_time({level="error"}[1h])))

Aggregations

# Average log line length
avg_over_time({job="application"} | unwrap line_length [5m])

# Percentile of numeric field
quantile_over_time(0.95, {job="application"} | json | unwrap response_time [5m])

# Error percentage
sum(rate({job="application", level="error"}[5m])) 
/ 
sum(rate({job="application"}[5m])) * 100

Pipeline Stages

# promtail-config.yaml
pipeline_stages:
  # Parse JSON logs
  - json:
      expressions:
        level: level
        message: msg
        trace_id: trace_id

  # Extract with regex
  - regex:
      expression: 'user_id=(?P<user_id>\d+)'

  # Add labels from parsed fields
  - labels:
      level:
      user_id:

  # Modify timestamp
  - timestamp:
      source: timestamp
      format: '2006-01-02T15:04:05.000Z'

  # Filter logs
  - match:
      selector: '{level="debug"}'
      action: drop

  # Add static labels
  - static_labels:
      environment: production

  # Modify log line
  - template:
      source: message
      template: '{{ ToUpper .Value }}'

Grafana Integration

Data Source Configuration

# grafana/provisioning/datasources/loki.yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
    jsonData:
      maxLines: 1000

Dashboard Panel

{
  "title": "Application Logs",
  "type": "logs",
  "datasource": "Loki",
  "targets": [
    {
      "expr": "{job=\"application\"} | json",
      "refId": "A"
    }
  ],
  "options": {
    "showTime": true,
    "showLabels": true,
    "wrapLogMessage": true
  }
}

Recording Rules

# loki-rules.yaml
groups:
  - name: error_rates
    interval: 1m
    rules:
      - record: job:log_errors:rate5m
        expr: |
          sum by (job) (rate({level="error"}[5m]))

Alerting

# loki-alerts.yaml
groups:
  - name: log_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({level="error"}[5m])) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate in logs"
          description: "Error rate is {{ $value }} errors/second"

Common Issues

Issue: High Memory Usage

Problem: Loki consuming too much memory Solution: Reduce max_query_series, limit query time range

Issue: Logs Not Appearing

Problem: Promtail not shipping logs Solution: Check positions file, verify file paths, check label configuration

Issue: Query Timeout

Problem: LogQL queries timing out Solution: Add more specific label filters, reduce time range

Issue: Ingestion Rate Limit

Problem: Logs being dropped Solution: Increase per_stream_rate_limit in limits_config

Best Practices

  • Use meaningful labels (avoid high cardinality)
  • Filter by labels before log content
  • Parse logs at collection time with Promtail
  • Set appropriate retention periods
  • Use recording rules for common queries
  • Implement proper multitenancy for large deployments
  • Monitor Loki’s own metrics
  • Use chunk caching for better performance

Related Skills