senior-cloud-architect

📁 borghei/claude-skills 📅 Jan 24, 2026
Total installs: 40
Weekly installs: 39
Site rank: #9438
Install command:
npx skills add https://github.com/borghei/claude-skills --skill senior-cloud-architect

Installs by agent

opencode 28
gemini-cli 26
claude-code 25
cursor 24
codex 24
github-copilot 20

Skill documentation

Senior Cloud Architect

Expert-level cloud architecture and infrastructure design.

Core Competencies

  • Multi-cloud architecture
  • AWS, GCP, Azure platforms
  • Cloud-native design patterns
  • Cost optimization
  • Security and compliance
  • Migration strategies
  • Disaster recovery
  • Infrastructure automation

Cloud Platform Comparison

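| Service    | AWS           | GCP                | Azure            |
|------------|---------------|--------------------|------------------|
| Compute    | EC2, ECS, EKS | GCE, GKE           | VMs, AKS         |
| Serverless | Lambda        | Cloud Functions    | Azure Functions  |
| Storage    | S3            | Cloud Storage      | Blob Storage     |
| Database   | RDS, DynamoDB | Cloud SQL, Spanner | SQL DB, CosmosDB |
| ML         | SageMaker     | Vertex AI          | Azure ML         |
| CDN        | CloudFront    | Cloud CDN          | Azure CDN        |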

AWS Architecture

Well-Architected Framework

Pillars:

  1. Operational Excellence

    • Infrastructure as Code
    • Monitoring and observability
    • Incident response
    • Continuous improvement
  2. Security

    • Identity and access management
    • Data protection
    • Infrastructure protection
    • Incident response
  3. Reliability

    • Fault tolerance
    • Disaster recovery
    • Change management
    • Failure testing
  4. Performance Efficiency

    • Right-sizing resources
    • Monitoring performance
    • Trade-off decisions
    • Keeping current
  5. Cost Optimization

    • Cost awareness
    • Right-sizing
    • Reserved capacity
    • Efficient resources
  6. Sustainability

    • Region selection
    • Efficient algorithms
    • Hardware utilization
    • Data management

Reference Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Route 53 (DNS)                       │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                    CloudFront (CDN)                         │
│                    WAF (Web Application Firewall)           │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                Application Load Balancer                     │
└──────────┬───────────────────────────────────┬──────────────┘
           │                                   │
┌──────────▼──────────┐             ┌──────────▼──────────┐
│   ECS/EKS Cluster   │             │   ECS/EKS Cluster   │
│   (AZ-a)            │             │   (AZ-b)            │
└──────────┬──────────┘             └──────────┬──────────┘
           │                                   │
┌──────────▼───────────────────────────────────▼──────────┐
│                    ElastiCache (Redis)                   │
└─────────────────────────────┬───────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────┐
│                    RDS Multi-AZ                          │
│                    (Primary + Standby)                   │
└─────────────────────────────────────────────────────────┘

Terraform AWS Module

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.environment}"
  cidr = var.vpc_cidr

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
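  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets
  # Required so that module.vpc.database_subnet_group (used by the RDS
  # module) is created. var.database_subnets is an assumed variable.
  database_subnets = var.database_subnets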

  enable_nat_gateway     = true
  single_nat_gateway     = var.environment != "production"
  enable_dns_hostnames   = true
  enable_dns_support     = true

  tags = local.common_tags
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "${var.project}-${var.environment}"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  eks_managed_node_groups = {
    main = {
      instance_types = var.node_instance_types
      min_size       = var.node_min_size
      max_size       = var.node_max_size
      desired_size   = var.node_desired_size
    }
  }

  tags = local.common_tags
}

module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "${var.project}-${var.environment}"

  engine               = "postgres"
  engine_version       = "15"
  family               = "postgres15"
  major_engine_version = "15"
  instance_class       = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage

  db_name  = var.db_name
  username = var.db_username
  port     = 5432

  multi_az               = var.environment == "production"
  db_subnet_group_name   = module.vpc.database_subnet_group
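  # module.security_group is not defined in this snippet; it is assumed to
  # be declared elsewhere in the configuration.
  vpc_security_group_ids = [module.security_group.security_group_id]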

  backup_retention_period = var.environment == "production" ? 30 : 7
  skip_final_snapshot     = var.environment != "production"

  tags = local.common_tags
}

Cost Optimization

Reserved vs On-Demand vs Spot

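| Type          | Discount | Commitment   | Use Case           |
|---------------|----------|--------------|--------------------|
| On-Demand     | 0%       | None         | Variable workloads |
| Reserved      | 30-72%   | 1 or 3 years | Steady-state       |
| Savings Plans | 30-72%   | 1 or 3 years | Flexible compute   |
| Spot          | 60-90%   | None         | Fault-tolerant     |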
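
To make the discount ranges above concrete, the following sketch estimates the monthly cost of a single instance under each pricing model. The hourly rate and the specific discount picked inside each range are illustrative assumptions, not published prices, and `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)

# Assumed discounts, each chosen from inside the range given in the table.
ASSUMED_DISCOUNTS = {
    "on_demand": 0.00,
    "reserved": 0.40,
    "savings_plan": 0.40,
    "spot": 0.70,
}


def monthly_cost(hourly_rate: float, pricing_model: str,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Return the estimated monthly cost in USD for one instance."""
    discount = ASSUMED_DISCOUNTS[pricing_model]
    return round(hourly_rate * hours * (1 - discount), 2)


if __name__ == "__main__":
    rate = 0.10  # hypothetical on-demand rate in USD per hour
    for model in ASSUMED_DISCOUNTS:
        print(f"{model:>12}: ${monthly_cost(rate, model):,.2f}/month")
```

Real savings depend on instance family, region, term, and payment option, so the numbers are only useful for comparing the models against each other.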

Cost Optimization Strategies

Right-sizing:

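from datetime import datetime, timedelta, timezone

import boto3


def analyze_utilization(instance_id: str, days: int = 14) -> str:
    """Classify an EC2 instance for right-sizing from its CPU utilization.

    Memory is not covered: EC2 publishes no memory metric to CloudWatch
    unless the CloudWatch agent is installed on the instance.
    """
    cloudwatch = boto3.client('cloudwatch')
    end_time = datetime.now(timezone.utc)

    metrics = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=end_time - timedelta(days=days),
        EndTime=end_time,
        Period=3600,  # one datapoint per hour
        Statistics=['Average', 'Maximum']
    )

    datapoints = metrics['Datapoints']
    if not datapoints:
        # Stopped or newly launched instances may have no data in the window.
        return 'insufficient-data'

    avg_cpu = sum(p['Average'] for p in datapoints) / len(datapoints)
    max_cpu = max(p['Maximum'] for p in datapoints)

    if avg_cpu < 10 and max_cpu < 30:
        return 'downsize'
    elif avg_cpu > 80:
        return 'upsize'
    else:
        return 'optimal'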
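
The thresholds can be exercised without AWS access by separating the classification from the CloudWatch call. The sketch below assumes datapoints shaped like the `Datapoints` list returned by `get_metric_statistics`; `classify_cpu` is a hypothetical helper, not part of boto3.

```python
from typing import Iterable, Mapping


def classify_cpu(datapoints: Iterable[Mapping[str, float]]) -> str:
    """Apply the right-sizing thresholds to CloudWatch-style datapoints."""
    points = list(datapoints)
    if not points:
        return "insufficient-data"  # nothing to average; avoids dividing by zero

    avg_cpu = sum(p["Average"] for p in points) / len(points)
    max_cpu = max(p["Maximum"] for p in points)

    if avg_cpu < 10 and max_cpu < 30:
        return "downsize"
    if avg_cpu > 80:
        return "upsize"
    return "optimal"


if __name__ == "__main__":
    idle = [{"Average": 4.0, "Maximum": 12.0}, {"Average": 6.0, "Maximum": 20.0}]
    busy = [{"Average": 85.0, "Maximum": 99.0}, {"Average": 90.0, "Maximum": 100.0}]
    print(classify_cpu(idle))
    print(classify_cpu(busy))
```

Note that an instance with a low average but occasional high peaks is classified as optimal, since downsizing it could throttle the peaks.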

Cost Allocation Tags:

required_tags:
  - Environment: production|staging|development
  - Project: project-name
  - Owner: team-name
  - CostCenter: cost-center-id

automation:
  - Untagged resources alert after 24 hours
  - Auto-terminate development resources after 7 days
  - Weekly cost reports by tag
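
A tag policy like the one above is only useful if it is checked automatically. The following is a minimal sketch of such a check. It assumes tags arrive as a plain dictionary, and that Project, Owner, and CostCenter accept any non-empty value (the policy above fixes only the Environment values). `find_tag_violations` is a hypothetical name.

```python
import re

# Required tag keys and the pattern each value must match.
# Only Environment has fixed values in the policy; the "any non-empty
# value" patterns for the other keys are an assumption.
REQUIRED_TAGS = {
    "Environment": re.compile(r"production|staging|development"),
    "Project": re.compile(r".+"),
    "Owner": re.compile(r".+"),
    "CostCenter": re.compile(r".+"),
}


def find_tag_violations(tags: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means compliant."""
    problems = []
    for key, pattern in REQUIRED_TAGS.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing tag: {key}")
        elif not pattern.fullmatch(value):
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


if __name__ == "__main__":
    print(find_tag_violations({"Environment": "prod", "Project": "checkout"}))
```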

Cost Dashboard

┌─────────────────────────────────────────────────────────────┐
│                    Monthly Cost Summary                      │
├─────────────────────────────────────────────────────────────┤
│  Total: $45,231     vs Last Month: +5%                      │
│                                                              │
│  By Service:                    By Environment:              │
│  ├── EC2: $18,500 (41%)        ├── Production: $38,000      │
│  ├── RDS: $12,000 (27%)        ├── Staging: $4,500          │
│  ├── S3: $3,200 (7%)           └── Development: $2,731      │
│  ├── Lambda: $1,800 (4%)                                     │
│  └── Other: $9,731 (21%)       Savings Opportunity: $6,180   │
│                                                              │
│  Recommendations:                                            │
│  • Convert 12 instances to Reserved (save $4,200/mo)        │
│  • Delete 5 unused EBS volumes (save $180/mo)               │
│  • Resize 8 over-provisioned instances (save $1,800/mo)     │
└─────────────────────────────────────────────────────────────┘
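
The figures in a summary like this are easy to cross-check. The sketch below recomputes the totals and per-service shares from the sample numbers in the dashboard. The dictionaries simply restate those numbers, and `percent_shares` is a hypothetical helper.

```python
# Sample figures copied from the dashboard above (USD per month).
by_service = {"EC2": 18_500, "RDS": 12_000, "S3": 3_200, "Lambda": 1_800, "Other": 9_731}
by_environment = {"Production": 38_000, "Staging": 4_500, "Development": 2_731}
recommended_savings = {
    "Convert 12 instances to Reserved": 4_200,
    "Delete 5 unused EBS volumes": 180,
    "Resize 8 over-provisioned instances": 1_800,
}


def percent_shares(costs: dict[str, int]) -> dict[str, float]:
    """Return each item's share of the total, in percent, to one decimal."""
    total = sum(costs.values())
    return {name: round(100 * amount / total, 1) for name, amount in costs.items()}


if __name__ == "__main__":
    # Both breakdowns should add up to the same monthly total.
    assert sum(by_service.values()) == sum(by_environment.values())
    print("Total:", sum(by_service.values()))
    print("Shares:", percent_shares(by_service))
    print("Listed savings:", sum(recommended_savings.values()))
```

Both breakdowns sum to $45,231, and the three listed recommendations add up to $6,180 per month.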

Disaster Recovery

DR Strategies

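| Strategy          | RTO     | RPO       | Cost |
|-------------------|---------|-----------|------|
| Backup & Restore  | Hours   | Hours     | $    |
| Pilot Light       | Minutes | Minutes   | $$   |
| Warm Standby      | Minutes | Seconds   | $$$  |
| Multi-Site Active | Seconds | Near-zero | $$$$ |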
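
Choosing among these strategies usually means picking the cheapest one that still meets the recovery targets. The sketch below encodes the table with ordinal tiers rather than exact durations, since the table gives only orders of magnitude. The `Tier` ordering and the `cheapest_strategy` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Order-of-magnitude recovery tiers, smallest first."""
    NEAR_ZERO = 0
    SECONDS = 1
    MINUTES = 2
    HOURS = 3


@dataclass(frozen=True)
class Strategy:
    name: str
    rto: Tier
    rpo: Tier
    cost: int  # number of "$" signs in the table


STRATEGIES = [
    Strategy("Backup & Restore", Tier.HOURS, Tier.HOURS, 1),
    Strategy("Pilot Light", Tier.MINUTES, Tier.MINUTES, 2),
    Strategy("Warm Standby", Tier.MINUTES, Tier.SECONDS, 3),
    Strategy("Multi-Site Active", Tier.SECONDS, Tier.NEAR_ZERO, 4),
]


def cheapest_strategy(max_rto: Tier, max_rpo: Tier) -> Optional[Strategy]:
    """Return the lowest-cost strategy meeting both targets, or None."""
    candidates = [s for s in STRATEGIES if s.rto <= max_rto and s.rpo <= max_rpo]
    return min(candidates, key=lambda s: s.cost, default=None)


if __name__ == "__main__":
    choice = cheapest_strategy(max_rto=Tier.MINUTES, max_rpo=Tier.SECONDS)
    print(choice.name if choice else "no strategy meets the targets")
```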

Multi-Region Architecture

┌────────────────────────────────────────────────────────────┐
│                      Global Load Balancer                   │
│                      (Route 53 / Cloud DNS)                 │
└──────────────┬─────────────────────────────┬───────────────┘
               │                             │
┌──────────────▼──────────────┐ ┌────────────▼──────────────┐
│      Primary Region         │ │     Secondary Region       │
│      (us-east-1)           │ │     (us-west-2)           │
│                            │ │                            │
│  ┌──────────────────────┐  │ │  ┌──────────────────────┐  │
│  │   Application Layer  │  │ │  │   Application Layer  │  │
│  │   (Active)          │  │ │  │   (Standby/Active)   │  │
│  └──────────┬───────────┘  │ │  └──────────┬───────────┘  │
│             │              │ │             │              │
│  ┌──────────▼───────────┐  │ │  ┌──────────▼───────────┐  │
│  │   Database           │──┼─┼──│   Database           │  │
│  │   (Primary)         │  │ │  │   (Read Replica)     │  │
│  └──────────────────────┘  │ │  └──────────────────────┘  │
└────────────────────────────┘ └────────────────────────────┘
                    │
                    │ Cross-Region Replication
                    ▼
        ┌──────────────────────┐
        │     S3 Backup        │
        │   (Multi-Region)     │
        └──────────────────────┘

Backup Strategy

backup_policy:
  database:
    frequency: continuous
    retention: 35 days
    cross_region: true
    encryption: aws/rds

  application_data:
    frequency: daily
    retention: 90 days
    versioning: enabled
    lifecycle:
      - transition_to_ia: 30 days
      - transition_to_glacier: 90 days
      - expiration: 365 days

  configuration:
    frequency: on_change
    retention: unlimited
    storage: git + s3
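
The lifecycle rules for application data can be read as a function from object age to storage class. The following sketch models that mapping under simplifying assumptions: ages are whole days, rules apply exactly at the threshold, and the class names follow the S3 storage class identifiers. In practice lifecycle actions run asynchronously, so real transitions may lag behind these thresholds. `storage_class_at` is a hypothetical helper.

```python
from typing import Optional

# (minimum age in days, storage class), mirroring the lifecycle rules above.
# Listed oldest-first so the first match is the latest applicable transition.
TRANSITIONS = [(90, "GLACIER"), (30, "STANDARD_IA")]
EXPIRATION_DAYS = 365


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the expected storage class for an object of the given age,
    or None once the object has expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= EXPIRATION_DAYS:
        return None
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            return storage_class
    return "STANDARD"


if __name__ == "__main__":
    for age in (0, 30, 90, 365):
        print(age, storage_class_at(age))
```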

Security Architecture

Network Security

┌─────────────────────────────────────────────────────────────┐
│                           VPC                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    Public Subnet                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   NAT GW    │  │     ALB     │  │   Bastion     │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                   Private Subnet                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   App Tier  │  │   App Tier  │  │   App Tier    │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                   Data Subnet                          │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │     RDS     │  │    Redis    │  │  Elasticsearch│  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
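
The three-tier layout above needs one subnet per tier in each Availability Zone. The sketch below carves those subnets sequentially from a VPC CIDR using the standard-library `ipaddress` module. The example CIDR, the /20 subnet size, the two-AZ count, and the `plan_subnets` helper are all illustrative assumptions.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, tiers: list[str], az_count: int,
                 new_prefix: int) -> dict[str, list[str]]:
    """Assign one subnet per tier per AZ, carved sequentially from the VPC."""
    network = ipaddress.ip_network(vpc_cidr)
    needed = len(tiers) * az_count
    # Take only as many equal-sized blocks as the layout requires.
    blocks = list(islice(network.subnets(new_prefix=new_prefix), needed))
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")

    plan = {}
    for i, tier in enumerate(tiers):
        tier_blocks = blocks[i * az_count:(i + 1) * az_count]
        plan[tier] = [str(block) for block in tier_blocks]
    return plan


if __name__ == "__main__":
    layout = plan_subnets("10.0.0.0/16", ["public", "private", "data"],
                          az_count=2, new_prefix=20)
    for tier, cidrs in layout.items():
        print(f"{tier:>8}: {', '.join(cidrs)}")
```

Equal-sized blocks keep the plan simple. A real design might give the private tier larger subnets than the public and data tiers.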

IAM Best Practices

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LeastPrivilegeExample",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/uploads/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Team": "engineering"
        },
        "IpAddress": {
          "aws:VpcSourceIp": ["10.0.0.0/8"]
        }
      }
    }
  ]
}
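
Least privilege is easier to maintain when policies are linted for wildcards before deployment. The following is a minimal sketch of such a check. It inspects only `Allow` statements and only the `Action` and `Resource` fields, so it is not a substitute for a full policy analyzer. `find_wildcards` and the sample policies are hypothetical.

```python
def _as_list(value) -> list:
    """IAM allows a single string or a list; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def find_wildcards(policy: dict) -> list[str]:
    """Report Allow statements that grant wildcard actions or resources."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        label = stmt.get("Sid", f"statement {index}")
        for action in _as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{label}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource")):
            if resource == "*":
                findings.append(f"{label}: wildcard resource '*'")
    return findings


if __name__ == "__main__":
    scoped = {"Version": "2012-10-17", "Statement": [{
        "Sid": "LeastPrivilegeExample", "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::my-bucket/uploads/*"}]}
    broad = {"Version": "2012-10-17", "Statement": {
        "Effect": "Allow", "Action": "s3:*", "Resource": "*"}}
    print(find_wildcards(scoped))
    print(find_wildcards(broad))
```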

Reference Materials

  • references/aws_patterns.md – AWS architecture patterns
  • references/gcp_patterns.md – GCP architecture patterns
  • references/multi_cloud.md – Multi-cloud strategies
  • references/cost_optimization.md – Cost optimization guide

Scripts

# Infrastructure cost analyzer
python scripts/cost_analyzer.py --account production --period monthly

# DR validation
python scripts/dr_test.py --region us-west-2 --type failover

# Security audit
python scripts/security_audit.py --framework cis --output report.html

# Resource inventory
python scripts/inventory.py --accounts all --format csv