kubernetes-specialist
Install command
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill kubernetes-specialist
Kubernetes Specialist
Purpose
Provides expert guidance on Kubernetes orchestration and cloud-native applications, drawing on deep knowledge of container orchestration, cluster management, and production-grade deployments. Specializes in Kubernetes architecture, Helm charts, operators, multi-cluster management, and GitOps workflows across EKS, AKS, GKE, and on-premises clusters.
When to Use
- Designing Kubernetes cluster architecture for production workloads
- Implementing Helm charts, operators, or GitOps workflows (ArgoCD, Flux)
- Troubleshooting cluster issues (networking, storage, performance)
- Planning Kubernetes upgrades or multi-cluster strategies
- Optimizing resource utilization and cost in Kubernetes environments
- Setting up service mesh (Istio, Linkerd) and observability
- Implementing Kubernetes security and RBAC policies
Quick Start
Invoke this skill when:
- Designing Kubernetes cluster architecture for production workloads
- Implementing Helm charts, operators, or GitOps workflows
- Troubleshooting cluster issues (networking, storage, performance)
- Planning Kubernetes upgrades or multi-cluster strategies
- Optimizing resource utilization and cost in Kubernetes environments
Do NOT invoke when:
- Simple Docker container needs (use docker commands directly)
- Cloud infrastructure provisioning (use cloud-architect instead)
- Application code debugging (use backend-developer/frontend-developer)
- Database-specific issues (use database-administrator instead)
Decision Framework
Deployment Strategy Selection
├─ Zero downtime required?
│  ├─ Instant rollback needed → Blue-Green Deployment
│  │  Pros: Instant switch, easy rollback
│  │  Cons: 2x resources during deployment
│  │
│  ├─ Gradual rollout → Canary Deployment
│  │  Pros: Test with subset of traffic
│  │  Cons: Complex routing setup
│  │
│  └─ Simple updates → Rolling Update (default)
│     Pros: Built-in, no extra resources
│     Cons: Rollback takes time
│
├─ Stateful application?
│  ├─ Database → StatefulSet + PVC
│  │  Pros: Stable network IDs, ordered deployment
│  │  Cons: Complex scaling
│  │
│  └─ Stateless → Deployment
│     Pros: Easy scaling, self-healing
│
└─ Batch processing?
   ├─ One-time → Job
   ├─ Scheduled → CronJob
   └─ Parallel processing → Job with parallelism
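The tree above can be encoded as two small lookup functions, which makes the choice easy to review or unit-test. This is a minimal sketch. The `Workload` fields and all function names are hypothetical, and the returned strings are labels rather than manifests. The last helper returns a Deployment `.spec.strategy` for the default rolling update with `maxUnavailable: 0`, which is one common way to keep full capacity during a rollout.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical answers to the questions in the decision tree."""
    stateful: bool = False          # needs stable identity and storage (e.g. a database)
    batch: bool = False             # runs to completion instead of serving traffic
    scheduled: bool = False         # batch work triggered on a schedule
    parallel: bool = False          # batch work split across several pods
    instant_rollback: bool = False  # must be able to switch back immediately
    gradual_rollout: bool = False   # new version should see a subset of traffic first


def choose_controller(w: Workload) -> str:
    """Map the 'stateful?' and 'batch?' branches to a workload controller."""
    if w.batch:
        if w.scheduled:
            return "CronJob"
        if w.parallel:
            return "Job (spec.parallelism > 1)"
        return "Job"
    if w.stateful:
        return "StatefulSet + PersistentVolumeClaim"
    return "Deployment"


def choose_rollout(w: Workload) -> str:
    """Map the 'zero downtime?' branch to a rollout strategy.

    Deployments only have RollingUpdate and Recreate built in; Blue-Green and
    Canary need extra tooling or two Deployments behind a Service or router.
    """
    if w.instant_rollback:
        return "Blue-Green"
    if w.gradual_rollout:
        return "Canary"
    return "RollingUpdate"


def rolling_update_strategy(max_surge: int = 1) -> dict:
    """Deployment .spec.strategy that never drops below the desired replica count."""
    if max_surge < 1:
        # maxSurge and maxUnavailable cannot both be zero.
        raise ValueError("max_surge must be at least 1 when maxUnavailable is 0")
    return {
        "type": "RollingUpdate",
        # maxUnavailable 0: old pods are scaled down only as new ones become available.
        "rollingUpdate": {"maxSurge": max_surge, "maxUnavailable": 0},
    }
```

For example, `choose_controller(Workload(batch=True, scheduled=True))` returns `"CronJob"`.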
Resource Configuration Matrix
| Workload Type | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Web API | 100m-500m | 1000m | 256Mi-512Mi | 1Gi |
| Worker | 500m-1000m | 2000m | 512Mi-1Gi | 2Gi |
| Database | 1000m-2000m | 4000m | 2Gi-4Gi | 8Gi |
| Cache | 100m-250m | 500m | 1Gi-4Gi | 8Gi |
| Batch Job | 500m-2000m | 4000m | 1Gi-4Gi | 8Gi |
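Kubernetes expresses CPU in cores or millicores (`500m` = 0.5 core). Memory is expressed in bytes with binary (`Mi`, `Gi`) or decimal (`M`, `G`) suffixes. The API server rejects a container whose request exceeds its limit. The sketch below parses those quantities and builds a container `resources` block from one row of the matrix. Its helpers are hypothetical, not part of any official client. It handles only the plain suffix forms used in the table, not exponent notation.

```python
import json

# Memory suffix multipliers: binary (Ki..Ti) and decimal (k..T).
_MEM_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(q: str) -> int:
    """'500m' -> 500, '2' -> 2000, '0.5' -> 500."""
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def memory_to_bytes(q: str) -> int:
    """'512Mi' -> 536870912; a bare number is taken as bytes."""
    for suffix, factor in _MEM_SUFFIXES.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * factor)
    return int(q)


def container_resources(cpu_req: str, cpu_lim: str, mem_req: str, mem_lim: str) -> dict:
    """Build a container 'resources' block, rejecting requests above limits."""
    if cpu_to_millicores(cpu_req) > cpu_to_millicores(cpu_lim):
        raise ValueError(f"CPU request {cpu_req} exceeds limit {cpu_lim}")
    if memory_to_bytes(mem_req) > memory_to_bytes(mem_lim):
        raise ValueError(f"memory request {mem_req} exceeds limit {mem_lim}")
    return {
        "requests": {"cpu": cpu_req, "memory": mem_req},
        "limits": {"cpu": cpu_lim, "memory": mem_lim},
    }


# "Web API" row, using the upper end of each request range.
web_api = container_resources("500m", "1000m", "512Mi", "1Gi")
print(json.dumps(web_api, indent=2))
```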
Node Pool Strategy
| Use Case | Instance Type (AWS EC2 example) | Scaling | Cost |
|---|---|---|---|
| System pods | t3.large (3 nodes) | Fixed | Low |
| Applications | m5.xlarge | Autoscale, 3-20 nodes | Medium |
| Batch/Spot | m5.large to m5.2xlarge (Spot) | Autoscale, 0-50 nodes | Very Low |
| GPU workloads | p3.2xlarge | Manual | High |
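A node pool is usually targeted with a node label plus a taint. Only pods that both select the label and tolerate the taint land there. The sketch builds a Job for the Batch/Spot pool. The label `node-pool: batch-spot`, the taint `spot=true:NoSchedule`, and the function name are hypothetical. They must match whatever your node groups actually set.

```python
# Hypothetical label and taint for the Batch/Spot pool.
SPOT_POOL_LABEL = {"node-pool": "batch-spot"}
SPOT_TAINT = {"key": "spot", "value": "true", "effect": "NoSchedule"}


def spot_job(name: str, image: str, backoff_limit: int = 4) -> dict:
    """A batch/v1 Job whose pods can only run on the tainted spot pool."""
    pod_spec = {
        # nodeSelector pulls the pods onto the spot pool...
        "nodeSelector": dict(SPOT_POOL_LABEL),
        # ...and the toleration lets them past that pool's NoSchedule taint.
        "tolerations": [{
            "key": SPOT_TAINT["key"],
            "operator": "Equal",
            "value": SPOT_TAINT["value"],
            "effect": SPOT_TAINT["effect"],
        }],
        # Job pods must use Never or OnFailure; with Never, the Job controller
        # creates a replacement pod when one fails, up to backoffLimit.
        "restartPolicy": "Never",
        "containers": [{"name": "worker", "image": image}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"backoffLimit": backoff_limit, "template": {"spec": pod_spec}},
    }
```

A matching taint can be applied by hand with `kubectl taint nodes <node> spot=true:NoSchedule`, although managed node groups normally set taints at creation time. The same label-plus-taint pattern keeps application pods off a dedicated system pool.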
Red Flags → Escalate
STOP and escalate if:
- Cluster upgrade that removes API versions still in use (deprecated APIs)
- Multi-region active-active requirements
- Compliance requirements (PCI-DSS, HIPAA) need validation
- Custom scheduler or controller development needed
- etcd corruption or cluster state issues
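The first red flag can be partly automated. Before an upgrade, scan manifests for API versions that the target release no longer serves. The table below is deliberately partial: it lists a few well-known removals between 1.16 and 1.26 and is only an illustration. Check the official deprecated-API migration guide for the full list. Function names are hypothetical.

```python
# (apiVersion, kind) -> first Kubernetes minor version that no longer serves it.
# Partial list for illustration only.
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): (1, 16),
    ("apps/v1beta1", "Deployment"): (1, 16),
    ("apps/v1beta2", "Deployment"): (1, 16),
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),  # removed with no replacement API
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def parse_version(v: str) -> tuple:
    """'1.25' or 'v1.25.3' -> (1, 25)."""
    major, minor = v.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def find_removed_apis(manifests: list, target_version: str) -> list:
    """Return (name, apiVersion, kind, removed_in) for objects the target cannot serve."""
    target = parse_version(target_version)
    problems = []
    for m in manifests:
        key = (m.get("apiVersion"), m.get("kind"))
        removed_in = REMOVED_APIS.get(key)
        if removed_in is not None and target >= removed_in:
            name = m.get("metadata", {}).get("name", "<unnamed>")
            problems.append((name, key[0], key[1], "%d.%d" % removed_in))
    return problems
```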
Quality Checklist
Cluster Configuration
- Multi-AZ deployment (nodes spread across availability zones)
- Node autoscaling configured (Cluster Autoscaler or Karpenter)
- System node pool with taints (separate critical addons from apps)
- Encryption enabled (secrets at rest with KMS)
- Audit logging enabled (API server logs)
Security
- Pod Security Standards enforced (restricted or baseline)
- Network policies configured (default deny + explicit allow)
- RBAC configured (least privilege for all service accounts)
- Image scanning enabled (scan for vulnerabilities)
- Private container registry configured
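The Pod Security and network-policy items can be sketched as a namespace bootstrap. Pod Security Admission enforces a standard through `pod-security.kubernetes.io/*` namespace labels. A NetworkPolicy with an empty `podSelector` and no rules denies all traffic for every pod in its namespace. Function and namespace names are hypothetical. The DNS allow rule assumes cluster DNS runs in `kube-system`. NetworkPolicies only take effect if the cluster's CNI plugin enforces them.

```python
def hardened_namespace(name: str, level: str = "restricted") -> list:
    """Namespace enforcing a Pod Security Standard, plus a default-deny NetworkPolicy."""
    if level not in ("privileged", "baseline", "restricted"):
        raise ValueError(f"unknown Pod Security Standard: {level}")
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                # Pod Security Admission rejects violating pods for "enforce"
                # and returns warnings to the client for "warn".
                "pod-security.kubernetes.io/enforce": level,
                "pod-security.kubernetes.io/warn": level,
            },
        },
    }
    default_deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": name},
        "spec": {
            # Empty podSelector selects every pod in the namespace; both policy
            # types with no rules means no ingress or egress is allowed.
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
        },
    }
    return [namespace, default_deny]


def allow_dns_egress(namespace: str) -> dict:
    """Explicit allow: every pod may reach port 53 in kube-system (assumed DNS location)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{"namespaceSelector": {
                    # Label set automatically on every namespace since Kubernetes 1.21.
                    "matchLabels": {"kubernetes.io/metadata.name": "kube-system"},
                }}],
                "ports": [{"protocol": "UDP", "port": 53},
                          {"protocol": "TCP", "port": 53}],
            }],
        },
    }
```

Each dict can be serialized with `json.dumps` and applied with `kubectl apply -f`, which accepts JSON as well as YAML. Further explicit allows follow the same shape as the DNS rule.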
Resource Management
- All pods have resource requests and limits
- HorizontalPodAutoscalers configured for scalable workloads
- PodDisruptionBudgets defined (cap how many pods voluntary disruptions, such as node drains, can evict at once)
- ResourceQuotas set per namespace
- LimitRanges defined (default requests and limits for containers that omit them)
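HorizontalPodAutoscalers follow a documented core formula: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. Scaling is skipped when the ratio is within a tolerance (0.1 by default), and the result is clamped to `minReplicas`/`maxReplicas`. CPU utilization targets are measured against requests, so pods without CPU requests cannot be autoscaled on utilization. That is one reason the first item in this list matters. The sketch leaves out readiness handling, missing metrics, and scale-down stabilization windows. Function names are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_value: float,
                         target_value: float, min_replicas: int,
                         max_replicas: int, tolerance: float = 0.1) -> int:
    """Core HPA formula: ceil(current * currentValue / targetValue), clamped.

    Scaling is skipped when the ratio is within `tolerance` of 1.0; 0.1 matches
    the default kube-controller-manager tolerance.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def hpa_manifest(name: str, min_replicas: int, max_replicas: int,
                 cpu_utilization: int) -> dict:
    """autoscaling/v2 HPA scaling a Deployment of the same (hypothetical) name on CPU."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Percentage of the pods' CPU *requests*, averaged across pods.
                    "target": {"type": "Utilization", "averageUtilization": cpu_utilization},
                },
            }],
        },
    }
```

With 4 replicas at 90% average CPU against a 60% target, `hpa_desired_replicas(4, 90, 60, 2, 20)` returns 6.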
High Availability
- Deployments have ≥2 replicas
- Pod anti-affinity rules keep replicas of the same workload off a single node
- Readiness and liveness probes configured
- PodDisruptionBudgets leave room for node drains (a budget that permits zero disruptions blocks cluster upgrades)
- Clusters in multiple regions (if global scale is required)
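Several of these items meet in one place: a Deployment with at least two replicas, anti-affinity, probes, and a PodDisruptionBudget that still lets nodes drain. The sketch assumes a container that serves hypothetical `/readyz` and `/healthz` endpoints on port 8080. All function names are hypothetical too. The last function reproduces the disruption-budget arithmetic for integer budgets. It shows why setting `minAvailable` equal to the replica count leaves zero allowed disruptions and stalls node drains.

```python
from typing import Optional


def ha_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Deployment with >= 2 replicas, node-level anti-affinity, and probes."""
    if replicas < 2:
        raise ValueError("HA workloads need at least 2 replicas")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {"podAntiAffinity": {
                        # "preferred" (not "required") keeps pods schedulable with
                        # fewer nodes than replicas; use topology.kubernetes.io/zone
                        # as the key to spread across zones instead of nodes.
                        "preferredDuringSchedulingIgnoredDuringExecution": [{
                            "weight": 100,
                            "podAffinityTerm": {
                                "labelSelector": {"matchLabels": labels},
                                "topologyKey": "kubernetes.io/hostname",
                            },
                        }],
                    }},
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Hypothetical endpoints the app must actually serve.
                        "readinessProbe": {"httpGet": {"path": "/readyz", "port": 8080},
                                           "periodSeconds": 5},
                        "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080},
                                          "periodSeconds": 10, "failureThreshold": 3},
                    }],
                },
            },
        },
    }


def pod_disruption_budget(name: str, max_unavailable: int = 1) -> dict:
    """policy/v1 PDB; the default maxUnavailable=1 lets drains evict one pod at a time."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {"maxUnavailable": max_unavailable,
                 "selector": {"matchLabels": {"app": name}}},
    }


def disruptions_allowed(healthy: int, replicas: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Evictions the budget permits right now (integer budgets only)."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = replicas - max_unavailable
    return max(0, healthy - desired_healthy)
```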
Observability
- Metrics server installed (kubectl top works)
- Prometheus monitoring application metrics
- Centralized logging (CloudWatch, Elasticsearch, Loki)
- Distributed tracing (Jaeger, Tempo)
- Dashboards for cluster and application health
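For the application-metrics item, the app exposes a `/metrics` endpoint in the Prometheus text exposition format, which Prometheus then scrapes. In practice the official `prometheus_client` library handles this. The stdlib sketch below only shows what the format looks like. It uses a hypothetical `http_requests_total` counter, hypothetical `/healthz` and `/readyz` paths that liveness and readiness probes could target, and an assumed port.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_lock = threading.Lock()
_requests_total = {}  # (method, status code) -> count for a hypothetical counter


def record_request(method: str, code: int) -> None:
    with _lock:
        key = (method, str(code))
        _requests_total[key] = _requests_total.get(key, 0) + 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    with _lock:
        for (method, code), count in sorted(_requests_total.items()):
            lines.append(f'http_requests_total{{method="{method}",code="{code}"}} {count}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            code, ctype = 200, "text/plain; version=0.0.4; charset=utf-8"
            body = render_metrics().encode()
        elif self.path in ("/healthz", "/readyz"):
            code, ctype, body = 200, "text/plain", b"ok\n"
        else:
            code, ctype, body = 404, "text/plain", b"not found\n"
        record_request("GET", code)
        self.send_response(code)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever serving metrics and probe endpoints (port is an assumption)."""
    ThreadingHTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the server. A Prometheus scrape config or ServiceMonitor then points at port 8080 and path `/metrics`.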
Disaster Recovery
- Velero installed for cluster backups
- Backup schedule configured (daily minimum)
- Restore tested (annual drill)
- etcd backups automated (self-managed clusters; managed control planes such as EKS, AKS, and GKE handle etcd for you)
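A daily Velero backup is typically a `Schedule` resource in the `velero.io/v1` API group, whose `template` is an ordinary backup spec. The sketch assumes Velero is installed in its default `velero` namespace. The function name, schedule name, hour, and 30-day retention are illustrative choices, not requirements.

```python
def velero_daily_schedule(name: str, namespaces: list, hour: int = 2,
                          ttl_days: int = 30) -> dict:
    """Velero Schedule that backs up the given namespaces once a day."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},  # assumed install namespace
        "spec": {
            # Standard five-field cron: minute 0 of `hour`, every day.
            "schedule": f"0 {hour} * * *",
            "template": {
                "includedNamespaces": list(namespaces),
                # Backups expire after ttl_days, written as a Go duration string.
                "ttl": f"{ttl_days * 24}h0m0s",
            },
        },
    }
```

The CLI route is `velero schedule create <name> --schedule="0 2 * * *"`. A restore drill can start from `velero restore create --from-backup <backup-name>`.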
Additional Resources
- Detailed Technical Reference: See REFERENCE.md
- Code Examples & Patterns: See EXAMPLES.md