devops-cicd

📁 sunnypatneedi/claude-starter-kit 📅 Jan 27, 2026

Install command

npx skills add https://github.com/sunnypatneedi/claude-starter-kit --skill devops-cicd

DevOps & CI/CD

A framework for building automated pipelines that let you ship faster with confidence.

When to Use

Setting up CI/CD for new projects
Improving existing deployment pipelines
Dockerizing applications
Implementing infrastructure as code
Choosing deployment strategies
Setting up monitoring and alerts

Core DevOps Principles

Automate Everything:

If you do it twice, script it
If you script it twice, make it a pipeline
Reduce manual intervention

Shift Left:

Test early
Security early
Quality early
Catch issues before production

Infrastructure as Code:

Version control your infrastructure
Review changes like code
Reproducible environments

Workflow

Step 1: CI/CD Pipeline Design

Pipeline Stages:

COMMIT → BUILD → TEST → SECURITY → DEPLOY → MONITOR

1. COMMIT
   ├── Code pushed to GitHub
   ├── Trigger pipeline
   └── Validate commit message

2. BUILD
   ├── Install dependencies
   ├── Compile/bundle
   ├── Build Docker image
   └── Cache for speed

3. TEST
   ├── Lint code
   ├── Type check
   ├── Unit tests
   ├── Integration tests
   └── Coverage check

4. SECURITY
   ├── SAST (static analysis)
   ├── Dependency scan
   ├── Secret detection
   └── Container scan

5. DEPLOY
   ├── Deploy to environment
   ├── Run migrations
   ├── Health checks
   └── Smoke tests

6. MONITOR
   ├── Performance metrics
   ├── Error rates
   ├── Alerting
   └── Rollback if needed
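
The stage order above implies a fail-fast gate: a later stage should not run if an earlier one fails. A minimal sketch of that idea, assuming each stage can be expressed as a function that returns true on success (`runPipeline` and `STAGE_ORDER` are hypothetical names, not part of any CI tool):

```javascript
// Hypothetical stage names mirroring the six stages listed above.
const STAGE_ORDER = ['commit', 'build', 'test', 'security', 'deploy', 'monitor'];

/**
 * Runs stages in order and stops at the first failure.
 * `handlers` maps a stage name to a function returning true on success.
 */
function runPipeline(handlers, order = STAGE_ORDER) {
  const completed = [];
  for (const stage of order) {
    const handler = handlers[stage];
    if (typeof handler !== 'function') {
      return { ok: false, failedAt: stage, completed, reason: 'missing handler' };
    }
    let passed;
    try {
      passed = handler() === true;
    } catch (err) {
      // A thrown error is treated the same as a failed stage.
      return { ok: false, failedAt: stage, completed, reason: err.message };
    }
    if (!passed) {
      return { ok: false, failedAt: stage, completed, reason: 'stage returned failure' };
    }
    completed.push(stage);
  }
  return { ok: true, failedAt: null, completed, reason: null };
}
```

Real CI systems express the same gate declaratively, for example with `needs:` between jobs in GitHub Actions.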

Step 2: GitHub Actions CI/CD

Complete Pipeline Example:

name: CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  # Build and test
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Type check
        run: npm run typecheck

      - name: Unit tests
        run: npm test -- --coverage

      - name: Build
        run: npm run build

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  # Security scanning
  security:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      - name: Check for secrets
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./

  # Build and push Docker image
  docker:
    needs: [build, security]
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: myapp/api:${{ github.sha }},myapp/api:latest
          cache-from: type=registry,ref=myapp/api:latest
          cache-to: type=inline

  # Deploy to staging
  deploy-staging:
    needs: [docker]
    runs-on: ubuntu-latest
    environment: staging

    steps:
      - name: Deploy to staging
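        run: |
          # SSH into the staging server and deploy the image built for this commit
          # (assumes an SSH key for the host is already configured on the runner)
          ssh ${{ secrets.STAGING_USER }}@${{ secrets.STAGING_HOST }} << 'EOF'
            docker pull myapp/api:${{ github.sha }}
            # docker-compose.yml references myapp/api:latest, so retag the pulled SHA locally
            docker tag myapp/api:${{ github.sha }} myapp/api:latest
            docker-compose up -d
          EOF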

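      # The E2E tests need the repository and its dependencies on the runner
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run E2E tests
        run: npm run test:e2e
        env:
          BASE_URL: https://staging.myapp.com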

      - name: Smoke tests
        run: |
          curl -f https://staging.myapp.com/health || exit 1

  # Deploy to production (manual approval required)
  deploy-production:
    needs: [deploy-staging]
    runs-on: ubuntu-latest
    environment: production

    steps:
      - name: Deploy to production
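        run: |
          ssh ${{ secrets.PROD_USER }}@${{ secrets.PROD_HOST }} << 'EOF'
            docker pull myapp/api:${{ github.sha }}
            # docker-compose.yml references myapp/api:latest, so retag the pulled SHA locally
            docker tag myapp/api:${{ github.sha }} myapp/api:latest
            docker-compose up -d --no-deps api
          EOF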

      - name: Health check
        run: |
          curl -f https://myapp.com/health || exit 1

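      - name: Notify Slack
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Deployed ${{ github.sha }} to production"
            }
        env:
          # v1 of this action reads the incoming-webhook URL from the environment
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK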
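
The smoke tests and health checks in the workflow call `curl -f` once, which can fail if the service is still starting. A possible alternative is to poll the endpoint a few times before giving up. The sketch below assumes the probe is passed in as a function; `waitForHealthy` and its option names are hypothetical.

```javascript
/**
 * Polls `check` until it resolves to true or the attempts run out.
 * `check` and `sleep` are injected so the retry logic can be tested without a network.
 */
async function waitForHealthy(check, { attempts = 5, delayMs = 2000, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    let healthy = false;
    try {
      healthy = (await check()) === true;
    } catch (err) {
      healthy = false; // a thrown error counts as an unhealthy probe
    }
    if (healthy) return { healthy: true, attempts: attempt };
    if (attempt < attempts) await wait(delayMs);
  }
  return { healthy: false, attempts };
}
```

On Node 18 or later, a probe could be as simple as `waitForHealthy(async () => (await fetch('https://staging.myapp.com/health')).ok)`, with the deploy step exiting non-zero when `healthy` is false.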

Step 3: Docker Best Practices

Multi-Stage Dockerfile:

# Stage 1: Build
FROM node:20-alpine AS builder

WORKDIR /app

# Copy package files first (cache optimization)
COPY package*.json ./

# Install ALL dependencies (including devDependencies for build)
RUN npm ci

# Copy source code
COPY . .

# Build application
RUN npm run build

# Stage 2: Production
FROM node:20-alpine

WORKDIR /app

# Install only production dependencies
COPY package*.json ./
RUN npm ci --omit=dev

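# Copy built files from builder stage
COPY --from=builder /app/dist ./dist

# Copy the script that HEALTHCHECK runs below
# (assumes healthcheck.js sits at the repository root)
COPY --from=builder /app/healthcheck.js ./healthcheck.js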

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Change ownership
RUN chown -R appuser:appgroup /app

# Switch to non-root user
USER appuser

EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s \
  CMD node healthcheck.js || exit 1

CMD ["node", "dist/index.js"]
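
The Dockerfile's HEALTHCHECK runs `node healthcheck.js`, but the script itself is not shown. Below is a minimal sketch of what it might contain, assuming the app serves `/health` on port 3000 as in the rest of this document. The `probe` function name and the 2-second default timeout are assumptions, chosen to stay under the 3-second HEALTHCHECK timeout.

```javascript
const http = require('http');

/**
 * Resolves to true when `url` answers with a 2xx status within `timeoutMs`,
 * and to false on any other status, connection error, or timeout.
 */
function probe(url, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume(); // drain the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    req.setTimeout(timeoutMs, () => {
      resolve(false);
      req.destroy(); // abort the pending request
    });
    req.on('error', () => resolve(false));
  });
}

// In healthcheck.js the result would drive the exit code Docker inspects:
// probe('http://localhost:3000/health').then((ok) => process.exit(ok ? 0 : 1));
```

Using Node for the probe avoids depending on `curl`, which Alpine-based images do not include by default.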

docker-compose.yml:

version: '3.8'

services:
  api:
    image: myapp/api:latest
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    depends_on:
      - db
      - redis
    healthcheck:
      # Alpine-based Node images ship BusyBox wget but not curl
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3

  db:
    image: postgres:15-alpine
    restart: unless-stopped
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=myapp

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
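
The compose file passes `DATABASE_URL` and `REDIS_URL` through from the host environment. If either is unset, Compose substitutes an empty string and the container still starts. One way to surface that early is to validate the variables when the app boots. This is a sketch under that assumption; `requireEnv` is a hypothetical helper.

```javascript
/**
 * Returns the requested variables from `env`, or throws listing every missing one.
 * Empty strings count as missing because Compose substitutes unset variables with ''.
 */
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name] || env[name].trim() === '');
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

Calling `requireEnv(['DATABASE_URL', 'REDIS_URL'])` at the top of the entry point makes a misconfigured container exit immediately instead of failing on its first query.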

Step 4: Deployment Strategies

Blue-Green Deployment:

CONCEPT:
├── Two identical environments (Blue/Green)
├── One serves traffic, one is idle
├── Deploy to idle, then switch traffic
└── Instant rollback by switching back

FLOW:
1. Blue is live, Green is idle
2. Deploy new version to Green
3. Test Green thoroughly
4. Switch load balancer to Green
5. Green is now live (Blue becomes rollback target)
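
The flow above can be modelled as a small state transition, which makes the rollback property easy to see: traffic only moves after the idle side passes verification, and rolling back is the same switch in reverse. This is an illustrative sketch; `blueGreenDeploy`, `rollback`, and the `verify` callback are hypothetical names.

```javascript
/**
 * Deploys to the idle colour, verifies it, then flips traffic.
 * `state` is { live: 'blue' | 'green', versions: { blue, green } }.
 * `verify(colour, version)` returns true when the idle side passes its tests.
 */
function blueGreenDeploy(state, newVersion, verify) {
  const idle = state.live === 'blue' ? 'green' : 'blue';
  const versions = { ...state.versions, [idle]: newVersion }; // step 2: deploy to idle
  if (!verify(idle, newVersion)) {
    // Step 3 failed: traffic never moved, so the live side is untouched.
    return { live: state.live, versions, switched: false };
  }
  return { live: idle, versions, switched: true }; // steps 4-5: switch traffic
}

/** Rollback points traffic back at the other colour, which still runs the old version. */
function rollback(state) {
  return { ...state, live: state.live === 'blue' ? 'green' : 'blue' };
}
```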

Kubernetes Blue-Green:

# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
      version: green
  template:
    metadata:
      labels:
        app: api
        version: green
    spec:
      containers:
        - name: api
          image: myapp/api:v2.0.0

---
# Switch traffic by updating service selector
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    version: green  # Change from "blue" to "green"
  ports:
    - port: 80
      targetPort: 3000

Canary Deployment:

CONCEPT:
├── Release to small subset first (5% traffic)
├── Monitor for issues
├── Gradually increase traffic (25%, 50%, 100%)
└── Rollback if problems detected
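
The promote-or-rollback decision can be sketched as a pure function over the current weight and observed error rates. The step list comes from the percentages above. The comparison rule (canary may be at most 1.5 times worse than stable, with a small floor so a zero-error baseline does not trigger on noise) is an assumption for illustration, not a standard.

```javascript
const CANARY_STEPS = [5, 25, 50, 100]; // traffic percentages from the concept above

/**
 * Decides the next canary weight from the current weight and observed error rates.
 * Error rates are fractions (0.01 means 1%). `maxErrorRatio` is an assumed tolerance.
 */
function nextCanaryWeight(currentWeight, canaryErrorRate, stableErrorRate, maxErrorRatio = 1.5) {
  const threshold = Math.max(stableErrorRate * maxErrorRatio, 0.001);
  if (canaryErrorRate > threshold) {
    return { action: 'rollback', weight: 0 };
  }
  const next = CANARY_STEPS.find((step) => step > currentWeight);
  if (next === undefined) {
    return { action: 'complete', weight: 100 };
  }
  return { action: 'promote', weight: next };
}
```

In a Kubernetes setup, the returned weight would be written to whatever controls the traffic split, such as the `canary-weight` annotation used in the example that follows.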

KUBERNETES EXAMPLE:
# 95% traffic to stable, 5% to canary

apiVersion: v1
kind: Service
metadata:
  name: api-stable
spec:
  selector:
    app: api
    version: stable
  ports:
    - port: 80
      targetPort: 3000

---
apiVersion: v1
kind: Service
metadata:
  name: api-canary
spec:
  selector:
    app: api
    version: canary
  ports:
    - port: 80
      targetPort: 3000

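---
# Canary Ingress: ingress-nginx routes canary-weight percent of requests here.
# A primary Ingress without the canary annotations must already route the same
# host to api-stable; it keeps the remaining traffic.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary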
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "5"  # 5% to canary
spec:
  rules:
    - host: api.myapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-canary
                port:
                  number: 80

Rolling Deployment:

# Kubernetes rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # Max pods down during update
      maxSurge: 1         # Max extra pods during update
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myapp/api:v2.0.0
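
To see what `maxUnavailable` and `maxSurge` mean for capacity, it helps to compute the bounds they imply. The sketch below does that for absolute values and for percentage strings, following the Kubernetes rule that `maxSurge` percentages round up and `maxUnavailable` percentages round down. The function names are hypothetical.

```javascript
/**
 * Resolves an absolute number or a percentage string against the replica count.
 * Kubernetes rounds maxSurge up and maxUnavailable down for percentages.
 */
function resolveBound(value, replicas, roundUp) {
  if (typeof value === 'string' && value.endsWith('%')) {
    const scaled = (Number(value.slice(0, -1)) * replicas) / 100;
    return roundUp ? Math.ceil(scaled) : Math.floor(scaled);
  }
  return Number(value);
}

/** Returns the fewest available pods and the most total pods during a rollout. */
function rollingUpdateBounds(replicas, maxUnavailable, maxSurge) {
  const unavailable = resolveBound(maxUnavailable, replicas, false);
  const surge = resolveBound(maxSurge, replicas, true);
  return {
    minAvailable: replicas - unavailable,
    maxTotal: replicas + surge,
  };
}
```

With the manifest above, `rollingUpdateBounds(10, 1, 1)` gives at least 9 available pods and at most 11 pods in total while the update is in progress.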

Step 5: Infrastructure as Code

Terraform Example:

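# main.tf
# Fragment: the var.* inputs, aws_security_group.db and aws_db_subnet_group.main
# are assumed to be declared elsewhere in the configuration.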
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags = {
    Name        = "${var.project}-vpc"
    Environment = var.environment
  }
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.project}-${var.environment}"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# RDS Database
resource "aws_db_instance" "main" {
  identifier           = "${var.project}-${var.environment}"
  engine               = "postgres"
  engine_version       = "15.4"
  instance_class       = var.db_instance_class
  allocated_storage    = 20
  storage_encrypted    = true
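  skip_final_snapshot  = var.environment != "production"
  # Needed whenever skip_final_snapshot is false (here: production)
  final_snapshot_identifier = var.environment == "production" ? "${var.project}-${var.environment}-final" : null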

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = var.environment == "production" ? 7 : 0

  tags = {
    Name        = "${var.project}-db"
    Environment = var.environment
  }
}

Step 6: Monitoring & Health Checks

Health Endpoints:

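// health.js
// Assumes `db` and `redis` are client instances created elsewhere in the app.
const express = require('express');
const app = express();

// Liveness check (process is up)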
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    version: process.env.VERSION || 'unknown',
    uptime: process.uptime()
  });
});

// Readiness check (can serve traffic)
app.get('/ready', async (req, res) => {
  try {
    // Check database
    await db.query('SELECT 1');

    // Check cache
    await redis.ping();

    // Check critical dependencies
    // await externalAPI.healthCheck();

    res.json({
      status: 'ready',
      checks: {
        database: 'ok',
        redis: 'ok'
      }
    });
  } catch (error) {
    res.status(503).json({
      status: 'not ready',
      error: error.message
    });
  }
});
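
The readiness handler above stops at the first failing dependency and has no timeout, so a hung database connection would hang the probe too. A possible refinement is to run the checks in parallel with a per-check timeout and report each result. This is a sketch; `runReadinessChecks` and the 1-second default are assumptions.

```javascript
/**
 * Runs every named check with a per-check timeout and reports each result.
 * `checks` maps a name to a function that throws or rejects on failure.
 */
async function runReadinessChecks(checks, timeoutMs = 1000) {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
      });
      try {
        // Promise.resolve().then(...) also turns synchronous throws into rejections.
        await Promise.race([Promise.resolve().then(() => check()), timeout]);
        return [name, 'ok'];
      } catch (err) {
        return [name, `failed: ${err.message}`];
      } finally {
        clearTimeout(timer);
      }
    })
  );
  const ready = entries.every(([, status]) => status === 'ok');
  return { ready, checks: Object.fromEntries(entries) };
}
```

Inside the `/ready` handler this could be used as `const result = await runReadinessChecks({ database: () => db.query('SELECT 1'), redis: () => redis.ping() });` followed by `res.status(result.ready ? 200 : 503).json(result)`.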

Key Metrics (Golden Signals):

LATENCY: How long requests take
TRAFFIC: Requests per second
ERRORS: Error rate percentage
SATURATION: Resource utilization (CPU, memory)
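
As a rough illustration of how the first three signals can be derived from raw request data, the sketch below summarises a window of samples. The sample shape (`durationMs`, `status`), the choice of p95 for latency, and counting only 5xx responses as errors are all assumptions. Saturation is passed in because it comes from the host rather than from request logs.

```javascript
/**
 * Summarises a window of request samples into latency, traffic and error signals.
 * Each sample is { durationMs, status }; `windowSeconds` is the window length.
 */
function goldenSignals(samples, windowSeconds, saturation = {}) {
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  // Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
  const rank = Math.ceil((95 * durations.length) / 100);
  const p95 = durations.length === 0 ? 0 : durations[rank - 1];
  const errors = samples.filter((s) => s.status >= 500).length;
  return {
    latencyP95Ms: p95,
    trafficRps: samples.length / windowSeconds,
    errorRatePercent: samples.length === 0 ? 0 : (errors * 100) / samples.length,
    saturation,
  };
}
```

In production these numbers would normally come from a metrics system such as Prometheus rather than being computed by hand; the function only makes the definitions concrete.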

DevOps Checklist

## DevOps Review: [Project]

### CI/CD
- [ ] Pipeline runs on every PR
- [ ] Tests pass before merge
- [ ] Security scans included
- [ ] Automated deployment to staging
- [ ] Manual approval for production
- [ ] Rollback mechanism in place

### Docker
- [ ] Multi-stage builds
- [ ] Non-root user
- [ ] Health checks defined
- [ ] Secrets via env vars
- [ ] Image scanning for vulnerabilities

### Infrastructure
- [ ] Infrastructure as code (Terraform/CloudFormation)
- [ ] Secrets managed securely (not in code)
- [ ] Environments are reproducible
- [ ] Backups configured and tested
- [ ] Disaster recovery plan documented

### Monitoring
- [ ] Health endpoints implemented
- [ ] Metrics collected (Golden Signals)
- [ ] Logging centralized
- [ ] Alerts configured (PagerDuty, Slack)
- [ ] Dashboards created (Grafana, Datadog)

### Security
- [ ] Dependencies scanned automatically
- [ ] Secrets not in code or logs
- [ ] Network policies defined
- [ ] Least privilege access (IAM, RBAC)
- [ ] Security headers configured

Common Pitfalls

Don’t	Do
Manual deployments	Automate with CI/CD
Skip tests in pipeline	Run all tests before deploy
Use `latest` tag	Use specific versions/SHAs
Run as root in containers	Create non-root user
Commit secrets to git	Use secrets management
Deploy straight to prod	Deploy to staging first
No rollback plan	Test rollback procedure
Ignore failed health checks	Fail deployment if unhealthy
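
The "use specific versions/SHAs" rule is easy to check mechanically, for example in a pre-deploy script. The sketch below treats an image reference as pinned when it carries a digest or an explicit tag other than `latest`. `isPinnedImage` is a hypothetical helper, and the parsing is simplified.

```javascript
/**
 * Returns true when an image reference names a digest or an explicit tag
 * other than `latest`. Untagged references default to `latest`.
 */
function isPinnedImage(ref) {
  if (ref.includes('@sha256:')) return true; // digest pins are immutable
  const lastSegment = ref.slice(ref.lastIndexOf('/') + 1); // skips a registry host:port
  const colon = lastSegment.indexOf(':');
  if (colon === -1) return false; // no tag means implicit latest
  const tag = lastSegment.slice(colon + 1);
  return tag !== '' && tag !== 'latest';
}
```

Note that a tag such as `node:20-alpine` passes this check even though the tag can still move; only a digest or a commit SHA tag identifies one exact image.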

Tools & Technologies

CI/CD:

GitHub Actions
GitLab CI
CircleCI
Jenkins

Containers:

Docker
Kubernetes
Docker Compose
Podman

Infrastructure as Code:

Terraform
AWS CloudFormation
Pulumi
Ansible

Monitoring:

Prometheus + Grafana
Datadog
New Relic
Sentry

Related Skills

/testing-strategies – Running tests in CI/CD
/security-review – Security scanning in pipelines
/performance-optimization – Performance monitoring

Last Updated: 2026-01-22
