devops-iac-engineer

📁 first-fluke/fullstack-starter 📅 Jan 22, 2026
8
总安装量
5
周安装量
#34521
全站排名
安装命令
npx skills add https://github.com/first-fluke/fullstack-starter --skill devops-iac-engineer

Agent 安装分布

claude-code 4
windsurf 2
opencode 2
codex 2
antigravity 2

Skill 文档

DevOps IaC Engineer

This skill provides expertise in designing and managing cloud infrastructure using Infrastructure as Code (IaC) and DevOps/SRE best practices.

When to Use

  • Designing cloud architecture (AWS, GCP, Azure)
  • Implementing or refactoring CI/CD pipelines
  • Setting up observability (logging, metrics, tracing)
  • Creating Kubernetes clusters and container orchestration strategies
  • Implementing security controls and compliance checks
  • Improving system reliability (SLO/SLA, Disaster Recovery)

Infrastructure as Code (IaC) Principles

  • Declarative Code: Use Terraform/OpenTofu to define the desired state.
  • GitOps: Code repository is the single source of truth. Changes are applied via PRs and automated pipelines.
  • Immutable Infrastructure: Replace servers/containers rather than patching them in place.

Core Domains

1. Terraform & IaC

  • Use modules for reusability.
  • Separate state by environment (dev, stage, prod) and region.
  • Automate plan and apply in CI/CD.

2. Kubernetes & Containers

  • Build small, stateless containers.
  • Use Helm or Kustomize for resource management.
  • Implement resource limits and requests.
  • Use namespaces for isolation.

3. CI/CD Pipelines

  • CI: Lint, test, build, and scan (security) on every commit.
  • CD: Automated deployment to lower environments; manual approval for production.
  • Use tools like GitHub Actions, Cloud Build, or ArgoCD.

4. Observability

  • Logs: Centralized logging (e.g., Cloud Logging, ELK).
  • Metrics: Prometheus/Grafana or Cloud Monitoring.
  • Tracing: OpenTelemetry for distributed tracing.

5. Security (DevSecOps)

  • Scan IaC for misconfigurations (e.g., Checkov, Trivy).
  • Manage secrets utilizing Secret Manager or Vault (never in code).
  • Least privilege IAM roles.

SRE Practices

  • SLI/SLO: Define Service Level Indicators and Objectives for critical user journeys.
  • Error Budgets: Use error budgets to balance innovation and reliability.
  • Post-Mortems: Conduct blameless post-mortems for incidents.