kubernetes-operations

📁 rohitg00/awesome-claude-code-toolkit 📅 2 days ago

总安装量

周安装量

#49335

全站排名

安装命令

npx skills add https://github.com/rohitg00/awesome-claude-code-toolkit --skill kubernetes-operations

Agent 安装分布

replit 1

trae 1

trae-cn 1

claude-code 1

Skill 文档

Kubernetes Operations

Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        version: v1
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server

Always set resource requests and limits. Use topology spread constraints for high availability.

Helm Chart Structure

chart/
  Chart.yaml
  values.yaml
  values-staging.yaml
  values-production.yaml
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    hpa.yaml
    _helpers.tpl

# values.yaml
replicaCount: 2
image:
  repository: registry.example.com/api
  tag: "1.2.0"
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70

HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

Troubleshooting Commands

# Pod diagnostics
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -c <container> --previous
kubectl exec -it <pod-name> -- /bin/sh

# Resource usage
kubectl top pods -n <namespace> --sort-by=memory
kubectl top nodes

# Network debugging
kubectl run debug --image=nicolaka/netshoot --rm -it -- bash
nslookup <service-name>.<namespace>.svc.cluster.local

# Events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Find pods not running
kubectl get pods -A --field-selector=status.phase!=Running

Anti-Patterns

Running containers as root without securityContext.runAsNonRoot: true
Missing resource requests/limits (causes scheduling issues and noisy neighbors)
Using latest tag instead of pinned image versions
Not setting PodDisruptionBudget for critical workloads
Storing secrets in ConfigMaps instead of Secrets (or external secret managers)
Ignoring pod anti-affinity for replicated deployments

Checklist

All containers have resource requests and limits
Liveness and readiness probes configured
Images use specific version tags, not latest
Secrets stored in Kubernetes Secrets or external vault
PodDisruptionBudget set for production workloads
NetworkPolicies restrict traffic between namespaces
Topology spread constraints or anti-affinity for HA
Helm values split per environment (staging, production)

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台