cloud-infrastructure
4
总安装量
2
周安装量
#49277
全站排名
Install command
npx skills add https://github.com/pluginagentmarketplace/custom-plugin-cloudflare --skill cloud-infrastructure
Skill documentation
Cloud Infrastructure Skill
Quick Reference
| Platform | Market share / position | Best for | Time to learn |
|---|---|---|---|
| AWS | 32% | Everything | 3-6 mo |
| Azure | 24% | Microsoft stack | 3-6 mo |
| GCP | 11% | Data, ML | 3-6 mo |
| Cloudflare | Edge | CDN, Workers | 2-4 wk |
Learning Paths
AWS
[1] IAM + VPC (1-2 wk)
 │  └─ Roles, policies, networking
 │
 ▼
[2] Compute: EC2, Lambda (2-3 wk)
 │
 ▼
[3] Storage: S3, EBS (1-2 wk)
 │
 ▼
[4] Database: RDS, DynamoDB (2-3 wk)
 │
 ▼
[5] Containers: ECS, EKS (3-4 wk)
 │
 ▼
[6] Monitoring: CloudWatch (1-2 wk)
Docker & Containers
[1] Docker Basics (1 wk)
 │  └─ Images, containers, Dockerfile
 │
 ▼
[2] Multi-stage Builds (1 wk)
 │  └─ Optimization, layer caching
 │
 ▼
[3] Docker Compose (1 wk)
 │  └─ Multi-container apps
 │
 ▼
[4] Registry & Security (1 wk)
    └─ Push/pull, scanning, non-root
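As an illustration of step [3], here is a minimal Compose sketch for a two-container app. The service names, port, credentials, and the choice of PostgreSQL are assumptions made for the example and do not come from this skill.

```yaml
# compose.yaml: hypothetical two-service app (all names and values are illustrative)
services:
  web:
    build: .                        # built from the Dockerfile in this directory
    ports:
      - "8000:8000"                 # host:container
    environment:
      DATABASE_URL: postgres://app:example@db:5432/app
    depends_on:
      db:
        condition: service_healthy  # wait until the db healthcheck passes
  db:
    image: postgres:16              # pinned major version rather than "latest"
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: example    # placeholder; do not commit real credentials
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      retries: 5

volumes:
  db-data:                          # named volume so data survives container restarts
```

With this file in the project root, `docker compose up --build` should start both containers, and `web` can reach the database at the hostname `db`.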
Kubernetes
[1] Pods & Deployments (2 wk)
 │
 ▼
[2] Services & Networking (1-2 wk)
 │
 ▼
[3] ConfigMaps & Secrets (1 wk)
 │
 ▼
[4] Helm Charts (2 wk)
 │
 ▼
[5] Production Patterns (ongoing)
    └─ HPA, PDB, resource limits
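The production patterns in step [5] can be sketched as a HorizontalPodAutoscaler paired with a PodDisruptionBudget. The Deployment name `web`, the `app: web` label, and the replica and CPU numbers are assumptions chosen for illustration.

```yaml
# HPA: scale the hypothetical "web" Deployment between 2 and 10 replicas on CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU *requests*
---
# PDB: keep at least one matching pod running during voluntary disruptions
# such as node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web
```

CPU utilization targets only work when the target pods declare CPU requests and a metrics source such as metrics-server is installed in the cluster.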
Terraform (IaC)
[1] Resources & State (1 wk)
 │
 ▼
[2] Variables & Outputs (1 wk)
 │
 ▼
[3] Modules (1-2 wk)
 │
 ▼
[4] Remote State (1 wk)
 │
 ▼
[5] Workspaces & Environments (1 wk)
Kubernetes Quick Reference
| Resource | Purpose | Example |
|---|---|---|
| Pod | Smallest unit | Single container |
| Deployment | Manage replicas | Web app |
| Service | Network access | ClusterIP, LoadBalancer |
| Ingress | HTTP routing | Path-based routing |
| ConfigMap | Configuration | Environment variables |
| Secret | Sensitive data | Credentials |
| StatefulSet | Stateful apps | Databases |
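To show how several of these resources fit together, the following sketch wires a ConfigMap into a Deployment and exposes it through a ClusterIP Service. The names, the image tag `registry/app:1.0.0`, and port 8000 are placeholders, not values defined by this skill.

```yaml
# ConfigMap: non-sensitive settings, injected below as environment variables.
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
data:
  LOG_LEVEL: info
---
# Deployment: keeps two replicas of the pod template running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web                     # must match spec.selector above
    spec:
      containers:
        - name: web
          image: registry/app:1.0.0  # placeholder image reference
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: web-config     # every key becomes an env var
---
# Service: stable in-cluster address that load-balances across the pods.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8000
```

Saved as a single file, this can be applied with `kubectl apply -f <file>`. Credentials would go into a Secret and be referenced with `secretRef` in the same way.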
Terraform Structure
project/
├── main.tf           # Resources
├── variables.tf      # Inputs
├── outputs.tf        # Outputs
├── providers.tf      # Provider config
├── versions.tf       # Version constraints
├── modules/
│   ├── vpc/
│   ├── eks/
│   └── rds/
└── environments/
    ├── dev.tfvars
    ├── staging.tfvars
    └── prod.tfvars
CI/CD Pipeline Template
# GitHub Actions
name: CI/CD
on:
  push:
    branches: [main]
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        # Tag with the registry path and commit SHA so the Push step has an image to push
        run: docker build -t registry/app:${{ github.sha }} .
      - name: Test
        run: docker run --rm registry/app:${{ github.sha }} pytest
      - name: Push
        run: docker push registry/app:${{ github.sha }}
      - name: Deploy
        if: github.ref == 'refs/heads/main'
        run: kubectl set image deployment/app app=registry/app:${{ github.sha }}
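The Push step in the template assumes the runner is already authenticated to the container registry, which a fresh GitHub-hosted runner is not. One possible way to handle this is a login step placed before Push, sketched below. The registry hostname and the secret names are assumptions for illustration.

```yaml
# Hypothetical step to insert before "Push" in the steps list.
# registry.example.com and both secret names are placeholders.
- name: Log in to registry
  uses: docker/login-action@v3
  with:
    registry: registry.example.com
    username: ${{ secrets.REGISTRY_USERNAME }}
    password: ${{ secrets.REGISTRY_PASSWORD }}
```

The Deploy step likewise needs cluster credentials (for example a kubeconfig supplied through a secret) before `kubectl` can reach the cluster.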
Monitoring Stack
┌─────────────────────────────────────────┐
│           OBSERVABILITY STACK           │
├─────────────────────────────────────────┤
│  Metrics:  Prometheus → Grafana         │
│  Logs:     Loki / ELK                   │
│  Traces:   Jaeger / Tempo               │
│  Alerts:   Alertmanager → PagerDuty     │
└─────────────────────────────────────────┘
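To make the metrics-to-alerts path concrete, here is a sketch of a Prometheus alerting rule that fires when a container keeps restarting. It assumes kube-state-metrics is installed (it exports `kube_pod_container_status_restarts_total`). The group name, threshold, and `severity` label are illustrative choices.

```yaml
# Prometheus rule file (loaded through rule_files in prometheus.yml).
groups:
  - name: pod-health
    rules:
      - alert: PodRestartingFrequently
        # More than 3 restarts within the last 15 minutes
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m                      # condition must hold for 5 minutes before firing
        labels:
          severity: page             # Alertmanager can route on this label
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```

Alertmanager would then match on the `severity` label and forward the alert to a PagerDuty receiver.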
Troubleshooting
Container not starting?
├─► docker logs <container>
├─► Check port conflicts
├─► Check image name/tag
└─► Check resource limits
Pod in CrashLoopBackOff?
├─► kubectl describe pod <name>
├─► kubectl logs <pod>
├─► Check resource limits
├─► Check probes configuration
└─► Check image pull secrets
Terraform apply fails?
├─► terraform plan first
├─► Check state lock
├─► terraform import existing
└─► Restore state from backup
High cloud bill?
├─► Enable cost alerts
├─► Right-size instances
├─► Use spot instances
├─► Delete unused resources
└─► Storage lifecycle policies
Common Failure Modes
| Symptom | Root Cause | Recovery |
|---|---|---|
| Pod CrashLoopBackOff | App error or OOM | Check logs, increase limits |
| ImagePullBackOff | Wrong image or auth | Verify image, check secrets |
| Terraform drift | Manual changes outside Terraform | Run terraform plan to see the drift, then terraform import the change or terraform apply to revert it |
| Slow deploys | Large images | Multi-stage builds, layer caching |
Best Practices
Docker
- Use multi-stage builds
- Run as non-root user
- Use .dockerignore
- Pin base image versions
- Scan for vulnerabilities
Kubernetes
- Set resource requests/limits
- Use readiness/liveness probes
- Store config in ConfigMaps
- Use namespaces for isolation
- Enable network policies
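Several of these practices can be combined in one small sketch: a dedicated namespace, a pod with resource requests/limits and both probes, and a default-deny ingress NetworkPolicy. The namespace name, image, port, `/healthz` path, and resource figures are all assumptions for the example.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: web
---
apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: web
  labels:
    app: web
spec:
  containers:
    - name: web
      image: registry/app:1.0.0    # placeholder image reference
      ports:
        - containerPort: 8000
      resources:
        requests:                  # what the scheduler reserves for the container
          cpu: 100m
          memory: 128Mi
        limits:                    # exceeding the memory limit gets the container OOM-killed
          memory: 256Mi
      readinessProbe:              # gates Service traffic to the pod
        httpGet:
          path: /healthz           # hypothetical health endpoint
          port: 8000
        periodSeconds: 5
      livenessProbe:               # restarts the container after repeated failures
        httpGet:
          path: /healthz
          port: 8000
        initialDelaySeconds: 10
        periodSeconds: 10
---
# Deny all ingress to pods in the namespace; allow rules are then added per workload.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: web
spec:
  podSelector: {}                  # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```

NetworkPolicy objects only take effect when the cluster's network plugin enforces them. In practice the pod template would usually live inside a Deployment rather than a bare Pod.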
Terraform
- Use remote state (S3, GCS)
- Lock state file
- Use modules for reuse
- Plan before apply
- Tag all resources
Next Actions
Specify your cloud platform and focus area for detailed guidance.