aws-cost-optimization
npx skills add https://github.com/rameshvr/skills --skill aws-cost-optimization
AWS Cost Optimization
Expert guidance for analyzing, optimizing, and managing AWS costs through proven strategies including Reserved Instances, right-sizing, resource cleanup, and cost monitoring.
When to Use
Use this skill when:
- Analyzing AWS cost trends and identifying major cost drivers
- Implementing cost optimization strategies to reduce cloud spend
- Setting up cost monitoring, budgets, and alerts
- Evaluating Reserved Instances, Savings Plans, or Spot Instances
- Right-sizing over-provisioned resources
- Cleaning up unused or idle resources
- Optimizing storage costs (S3, EBS, snapshots)
- Reducing data transfer costs
- Preparing for FinOps reviews or cost optimization audits
- User asks “how can I reduce my AWS bill?” or “why is AWS so expensive?”
Core Principle
Systematic cost analysis and continuous optimization ensure you pay only for what you need while maintaining performance and reliability.
Cost optimization is not a one-time activity but an ongoing practice of understanding usage patterns, eliminating waste, and aligning resources with actual demand.
The Cost Optimization Framework
CRITICAL: You MUST follow the five-step methodology systematically. Skipping steps leads to missed savings opportunities and misaligned optimizations.
digraph cost_optimization {
"Cost optimization needed" [shape=doublecircle];
"1. Identify cost drivers" [shape=box];
"2. Analyze usage patterns" [shape=box];
"3. Recommend optimizations" [shape=box];
"4. Estimate savings" [shape=box];
"5. Implement changes" [shape=box];
"Verify savings realized" [shape=box];
"Complete" [shape=doublecircle];
"Cost optimization needed" -> "1. Identify cost drivers";
"1. Identify cost drivers" -> "2. Analyze usage patterns";
"2. Analyze usage patterns" -> "3. Recommend optimizations";
"3. Recommend optimizations" -> "4. Estimate savings";
"4. Estimate savings" -> "5. Implement changes";
"5. Implement changes" -> "Verify savings realized";
"Verify savings realized" -> "Complete";
}
Red Flags – You’re Skipping the Framework:
- Making optimization recommendations before analyzing usage patterns
- Suggesting Reserved Instances without examining actual utilization
- Recommending service changes without understanding workload requirements
- Providing generic advice instead of account-specific analysis
- Skipping the savings estimation step
Step 1: Identify Cost Drivers
Use AWS Cost Explorer
Key questions to answer:
- What are the top 5 services by spend?
- Which accounts/projects are the highest spenders?
- What’s the month-over-month cost trend?
- Are there any unexpected cost spikes?
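These questions can be answered programmatically from Cost Explorer output. Below is a minimal sketch that ranks services by spend, assuming the response shape of the `GetCostAndUsage` API grouped by SERVICE; the sample data is invented for illustration:

```python
def top_services(response, n=5):
    """Rank services by spend from a GetCostAndUsage response grouped by SERVICE."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Invented sample data in the Cost Explorer response shape
sample = {
    "ResultsByTime": [{
        "Groups": [
            {"Keys": ["Amazon EC2"], "Metrics": {"UnblendedCost": {"Amount": "2500.0"}}},
            {"Keys": ["Amazon S3"], "Metrics": {"UnblendedCost": {"Amount": "800.0"}}},
            {"Keys": ["Amazon RDS"], "Metrics": {"UnblendedCost": {"Amount": "1200.0"}}},
        ]
    }]
}
print(top_services(sample, n=2))  # EC2 first, then RDS
```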
Cost Analysis Tools
| Tool | Purpose | When to Use |
|---|---|---|
| AWS Cost Explorer | Visualize and analyze spending patterns | Daily/monthly cost reviews |
| AWS Cost and Usage Reports | Detailed billing data for analysis | Deep-dive investigations |
| AWS Budgets | Set spending limits and alerts | Proactive cost control |
| AWS Cost Anomaly Detection | Identify unusual spending patterns | Catch unexpected cost increases |
| AWS Compute Optimizer | Right-sizing recommendations | EC2, Lambda, EBS optimization |
| AWS Trusted Advisor | Best practice recommendations | Regular health checks |
| Third-party tools (CloudHealth, CloudCheckr, Spot.io) | Multi-cloud cost management and automation | Advanced FinOps workflows |
Cost Allocation Strategy
Implement comprehensive tagging:
// CDK Example: Consistent tagging strategy
import * as cdk from 'aws-cdk-lib';
const commonTags = {
Environment: 'production',
Project: 'web-app',
CostCenter: 'engineering',
Owner: 'team-platform',
Application: 'api-service',
};
// Apply every tag (including Application) to all resources in the stack;
// `stack` here is your cdk.Stack instance
Object.entries(commonTags).forEach(([key, value]) => {
  cdk.Tags.of(stack).add(key, value);
});
Essential tags for cost allocation:
- Environment (production, staging, dev)
- Project (project or product name)
- CostCenter (department or team)
- Owner (responsible team or person)
- Application (specific application or service)
Step 2: Analyze Usage Patterns
Compute Utilization Analysis
EC2 Instances:
# Use AWS CLI to analyze EC2 utilization
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-31T23:59:59Z \
--period 3600 \
--statistics Average
Key metrics to analyze:
- CPU Utilization: < 40% average = over-provisioned
- Memory Utilization: < 40% average = over-provisioned (requires the CloudWatch agent; not collected by default)
- Network I/O: Consistently low = potential for smaller instance
- Disk I/O: Low IOPS usage = EBS optimization opportunity
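As a sketch of the thresholds above (the 40% cutoff is this document's rule of thumb, not an AWS constant), a small helper can flag right-sizing candidates:

```python
def rightsizing_flags(avg_cpu, avg_mem=None, threshold=40.0):
    """Flag over-provisioning using the <40% average-utilization rule of thumb."""
    flags = []
    if avg_cpu < threshold:
        flags.append("cpu-over-provisioned")
    if avg_mem is not None and avg_mem < threshold:
        flags.append("memory-over-provisioned")
    return flags

# An instance at 15% CPU / 25% memory is a strong downsize candidate
print(rightsizing_flags(15.0, 25.0))
```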
Storage Utilization Analysis
S3 Storage Classes:
- Identify data access patterns (last accessed date)
- Determine appropriate storage tiers
- Calculate potential savings from lifecycle policies
EBS Volumes:
- Find unattached volumes (paying for unused storage)
- Identify volumes with low IOPS (gp2 → gp3 migration)
- Check snapshot retention policies
Database Utilization
RDS/Aurora:
- CPU and connection utilization
- Read/write operation patterns
- Multi-AZ requirements
- Backup retention policies
DynamoDB:
- Provisioned vs actual throughput
- Read/write capacity utilization
- On-demand vs provisioned comparison
Step 3: Service-Specific Optimization Strategies
Compute: EC2 Optimization
1. Right-Sizing
Process:
- Analyze CloudWatch metrics (CPU, memory, network, disk)
- Identify consistently under-utilized instances (< 40% avg utilization)
- Use AWS Compute Optimizer recommendations
- Test smaller instance types in non-production first
Example findings:
Instance: i-abc123 (m5.2xlarge)
- Average CPU: 15%
- Average Memory: 25%
- Recommendation: Downgrade to m5.large
- Monthly Savings: $200
2. Reserved Instances (RI)
When to use:
- Steady-state workloads running 24/7
- Predictable capacity needs for 1-3 years
- Potential savings: 30-75% vs On-Demand
RI Purchase Strategy:
1. Analyze last 3-6 months of EC2 usage
2. Identify instances running consistently
3. Start with 1-year Standard RIs for stability
4. Use Convertible RIs if flexibility needed
5. Purchase incrementally, not all upfront
RI Types:
| Type | Discount | Flexibility | Use Case |
|---|---|---|---|
| Standard | Up to 75% | Low | Stable workloads, known instance types |
| Convertible | Up to 54% | High | Workloads that may need instance type changes |
| Scheduled | Up to 10% | Scheduled | Predictable recurring usage (legacy; no longer offered for new purchases) |
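Before committing, it helps to know how long an RI takes to pay for itself. A hedged sketch (the hourly rates and upfront amount below are invented examples, not real AWS prices):

```python
def ri_breakeven_months(on_demand_hourly, ri_hourly, upfront, hours_per_month=730):
    """Months until an RI's upfront payment is recovered by its lower hourly rate."""
    monthly_saving = (on_demand_hourly - ri_hourly) * hours_per_month
    if monthly_saving <= 0:
        return None  # the RI never pays off at these rates
    return upfront / monthly_saving

# Invented example: $0.20/hr On-Demand vs $0.08/hr RI with $500 partial upfront
print(round(ri_breakeven_months(0.20, 0.08, 500), 1))  # months to break even
```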
3. Savings Plans
Compute Savings Plans:
- Flexible across instance family, size, region
- Apply to EC2, Lambda, Fargate
- 1-year or 3-year commitment
- Savings: 30-70%
EC2 Instance Savings Plans:
- Tied to specific instance family in a region
- Higher discount than Compute Savings Plans
- Less flexibility but more savings
Comparison:
Scenario: $10,000/month On-Demand spend
Option 1: Reserved Instances (3-year Standard)
- Discount: 65%
- Monthly Cost: $3,500
- Annual Savings: $78,000
Option 2: Compute Savings Plans (1-year)
- Discount: 40%
- Monthly Cost: $6,000
- Annual Savings: $48,000
- Flexibility: High (can change instance types)
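The comparison above can be reproduced with simple arithmetic; this sketch treats the discount as a flat percentage of On-Demand spend, which ignores partial coverage:

```python
def commitment_savings(monthly_on_demand, discount):
    """Monthly cost and annual savings for a flat commitment discount."""
    monthly_cost = round(monthly_on_demand * (1 - discount), 2)
    annual_savings = round(monthly_on_demand * discount * 12, 2)
    return monthly_cost, annual_savings

print(commitment_savings(10_000, 0.65))  # 3-year Standard RI scenario
print(commitment_savings(10_000, 0.40))  # 1-year Compute Savings Plan scenario
```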
4. Spot Instances
When to use:
- Fault-tolerant workloads
- Batch processing jobs
- CI/CD pipelines
- Containerized applications with auto-scaling
- Potential savings: 50-90% vs On-Demand
Never use Spot for:
- Databases (unless using spot-friendly patterns)
- Stateful applications without proper handling
- Critical production workloads without fallback
Implementation example:
// CDK: Auto Scaling Group with Spot instances
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
const asg = new autoscaling.AutoScalingGroup(this, 'ASG', {
vpc,
instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
machineImage: ec2.MachineImage.latestAmazonLinux2(),
minCapacity: 2,
maxCapacity: 10,
spotPrice: '0.05', // Max price per hour
});
// Mix of On-Demand and Spot via a MixedInstancesPolicy (requires a launch template)
const mixedInstancesPolicy: autoscaling.MixedInstancesPolicy = {
  launchTemplate, // an ec2.LaunchTemplate defined elsewhere
  instancesDistribution: {
    onDemandBaseCapacity: 2, // Minimum On-Demand instances
    onDemandPercentageAboveBaseCapacity: 20, // 20% On-Demand, 80% Spot above base
    spotAllocationStrategy: autoscaling.SpotAllocationStrategy.LOWEST_PRICE,
  },
};
5. Stop/Start Schedules
Non-production environments:
# Lambda function to stop instances on schedule
import boto3
from datetime import datetime

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    # Stop dev/test instances outside business hours (Mon-Fri, 9 AM - 6 PM)
    now = datetime.now()
    if now.weekday() >= 5 or not (9 <= now.hour < 18):
        filters = [{'Name': 'tag:Environment', 'Values': ['dev', 'test']}]
        instances = ec2.describe_instances(Filters=filters)
        for reservation in instances['Reservations']:
            for instance in reservation['Instances']:
                if instance['State']['Name'] == 'running':
                    ec2.stop_instances(InstanceIds=[instance['InstanceId']])
                    print(f"Stopped {instance['InstanceId']}")
Potential savings:
- Run dev/test 8 hours/day, 5 days/week instead of 24/7
- Savings: 76% on non-production compute costs
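The 76% figure follows directly from the schedule: running 8 hours a day, 5 days a week is 40 of the 168 hours in a week. A quick check:

```python
def schedule_savings_pct(hours_per_day, days_per_week):
    """Percent saved vs running 24/7 (168 hours/week)."""
    running_hours = hours_per_day * days_per_week
    return (1 - running_hours / 168) * 100

print(round(schedule_savings_pct(8, 5)))  # business-hours schedule
```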
Storage: S3 Optimization
1. Storage Class Selection
| Storage Class | Use Case | Cost (relative) | Retrieval Cost |
|---|---|---|---|
| S3 Standard | Frequently accessed data | 100% | None |
| S3 Intelligent-Tiering | Unknown or changing access patterns | Auto-optimized | Small monitoring fee |
| S3 Standard-IA | Infrequent access (< 1x/month) | 50% | Per GB |
| S3 One Zone-IA | Infrequent, recreatable data | 40% | Per GB |
| S3 Glacier Instant Retrieval | Archive, instant access | 35% | Per GB |
| S3 Glacier Flexible Retrieval | Archive, minutes-hours retrieval | 20% | Per GB |
| S3 Glacier Deep Archive | Long-term archive, 12+ hours retrieval | 10% | Per GB |
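The tiers above can be encoded as a simple decision rule. This is a rough sketch based on the table's thresholds, not an official AWS mapping; real policies should also weigh minimum storage durations and retrieval costs:

```python
def suggest_storage_class(days_since_access, recreatable=False):
    """Map last-access age to a storage class per the tier thresholds above."""
    if days_since_access < 30:
        return "STANDARD"
    if days_since_access < 90:
        return "ONEZONE_IA" if recreatable else "STANDARD_IA"
    if days_since_access < 365:
        return "GLACIER"
    return "DEEP_ARCHIVE"

print(suggest_storage_class(45))   # infrequently accessed
print(suggest_storage_class(400))  # long-term archive
```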
2. Lifecycle Policies
Example policy:
// CDK: S3 lifecycle rules
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
const bucket = new s3.Bucket(this, 'DataBucket', {
lifecycleRules: [
{
id: 'TransitionToIA',
enabled: true,
transitions: [
{
storageClass: s3.StorageClass.INFREQUENT_ACCESS,
transitionAfter: cdk.Duration.days(30),
},
{
storageClass: s3.StorageClass.GLACIER,
transitionAfter: cdk.Duration.days(90),
},
{
storageClass: s3.StorageClass.DEEP_ARCHIVE,
transitionAfter: cdk.Duration.days(365),
},
],
},
{
id: 'DeleteOldVersions',
enabled: true,
noncurrentVersionExpiration: cdk.Duration.days(30),
},
{
id: 'CleanupIncompleteUploads',
enabled: true,
abortIncompleteMultipartUploadAfter: cdk.Duration.days(7),
},
],
});
Typical savings:
- 30 days → Standard-IA: 46% savings
- 90 days → Glacier: 80% savings
- 365 days → Deep Archive: 90% savings
3. S3 Intelligent-Tiering
When to use:
- Unpredictable access patterns
- Data with varying access frequency
- Want automatic optimization without manual policies
How it works:
- Monitors access patterns
- Automatically moves data between tiers:
- Frequent Access (< 30 days since last access)
- Infrequent Access (30-90 days)
- Archive Instant Access (90-180 days)
- Archive Access (180+ days)
- Deep Archive Access (180+ days, optional)
Cost: Small monthly monitoring fee ($0.0025 per 1,000 objects)
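The monitoring fee scales linearly with object count, so buckets with many tiny objects can erode the savings. A quick estimate at the $0.0025-per-1,000-objects rate quoted above:

```python
def intelligent_tiering_fee(object_count, fee_per_1000=0.0025):
    """Monthly Intelligent-Tiering monitoring fee at $0.0025 per 1,000 objects."""
    return object_count / 1000 * fee_per_1000

print(round(intelligent_tiering_fee(10_000_000), 2))  # fee for 10M objects
```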
4. Delete Unnecessary Data
Common waste sources:
# Find incomplete multipart uploads
aws s3api list-multipart-uploads --bucket my-bucket
# Find old versions (if versioning enabled)
aws s3api list-object-versions --bucket my-bucket
# Analyze bucket contents by storage class
aws s3api list-objects-v2 --bucket my-bucket \
--query "Contents[?StorageClass=='STANDARD'].{Key:Key,Size:Size}" \
--output table
Storage: EBS Optimization
1. Delete Unattached Volumes
Find unattached volumes:
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query "Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType}" \
--output table
Typical savings:
- Average unattached volume cost: $10-50/month each
- Large environments can have 50-100+ unattached volumes
- Potential savings: $500-5,000/month
2. Upgrade gp2 to gp3
gp3 advantages:
- 20% cheaper than gp2 for same storage
- Baseline performance: 3,000 IOPS, 125 MB/s throughput
- Can provision additional IOPS/throughput independently
Cost comparison (1TB volume):
gp2: $100/month
gp3: $80/month (same performance)
Savings: $20/month per 1TB volume
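The comparison follows from the per-GB rates (roughly $0.10/GB-month for gp2 vs $0.08/GB-month for gp3 in us-east-1; rates vary by region):

```python
def gp3_migration_savings(size_gb, gp2_rate=0.10, gp3_rate=0.08):
    """Monthly savings from migrating a volume from gp2 to gp3 at the given rates."""
    return size_gb * (gp2_rate - gp3_rate)

print(round(gp3_migration_savings(1000), 2))  # 1TB volume
```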
Migration:
# Modify volume type from gp2 to gp3
aws ec2 modify-volume \
--volume-id vol-1234567890abcdef0 \
--volume-type gp3
3. Snapshot Management
Cleanup old snapshots:
# Delete snapshots older than retention period
import boto3
from datetime import datetime

ec2 = boto3.client('ec2')
retention_days = 30
snapshots = ec2.describe_snapshots(OwnerIds=['self'])['Snapshots']
for snapshot in snapshots:
    start_time = snapshot['StartTime'].replace(tzinfo=None)
    age = (datetime.now() - start_time).days
    if age > retention_days:
        ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])
        print(f"Deleted snapshot {snapshot['SnapshotId']} ({age} days old)")
Database: RDS Optimization
1. Right-Sizing
Analyze CloudWatch metrics:
- CPU utilization < 40% → consider a smaller instance class
- Free memory consistently high → consider a lower-memory instance class
- Low IOPS usage → adjust storage type accordingly
2. Reserved Instances
RDS RI Strategy:
- 1-year or 3-year commitment
- Savings: 30-69% vs On-Demand
- Start with production databases
- Use Convertible RIs for flexibility
3. Stop Dev/Test Databases
Automate start/stop:
# Lambda to stop RDS instances on schedule
import boto3

def lambda_handler(event, context):
    rds = boto3.client('rds')
    # Stop dev databases outside business hours
    instances = rds.describe_db_instances()
    for db in instances['DBInstances']:
        tags = rds.list_tags_for_resource(
            ResourceName=db['DBInstanceArn']
        )['TagList']
        env_tag = next((t['Value'] for t in tags if t['Key'] == 'Environment'), None)
        if env_tag in ['dev', 'test'] and db['DBInstanceStatus'] == 'available':
            rds.stop_db_instance(DBInstanceIdentifier=db['DBInstanceIdentifier'])
            print(f"Stopped {db['DBInstanceIdentifier']}")
4. Snapshot Cleanup
Automated snapshot retention:
// CDK: RDS with snapshot retention
import * as cdk from 'aws-cdk-lib';
import * as rds from 'aws-cdk-lib/aws-rds';
const database = new rds.DatabaseInstance(this, 'Database', {
engine: rds.DatabaseInstanceEngine.postgres({
version: rds.PostgresEngineVersion.VER_15,
}),
backupRetention: cdk.Duration.days(7), // Keep 7 days of automated backups
deleteAutomatedBackups: true, // Delete backups when instance is deleted
// ...
});
Database: DynamoDB Optimization
1. On-Demand vs Provisioned
When to use On-Demand:
- Unpredictable traffic patterns
- New applications with unknown load
- Spiky traffic
- Pay only for what you use
When to use Provisioned:
- Predictable traffic
- Consistent baseline load
- Can save 30-60% vs On-Demand with proper capacity planning
Cost comparison:
Scenario: 10M read requests/day, 1M write requests/day
On-Demand (illustrative rates; verify current regional pricing):
- Read: $1.25 per million = $12.50/day
- Write: $6.25 per million = $6.25/day
- Total: $18.75/day = $562.50/month
Provisioned (with Auto Scaling):
- ~120 RCU, ~12 WCU average (requests spread evenly across the day)
- Typically well under half the On-Demand cost for steady traffic
- Savings: commonly 30-60%
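A rough way to size provisioned capacity is to spread the daily request count evenly over 86,400 seconds; real traffic is rarely flat, so add headroom or rely on Auto Scaling for peaks:

```python
import math

def avg_capacity_units(requests_per_day, units_per_request=1):
    """Average capacity units needed if daily requests were spread perfectly evenly."""
    per_second = requests_per_day / 86_400
    return math.ceil(per_second * units_per_request)

print(avg_capacity_units(10_000_000))  # reads/day from the scenario
print(avg_capacity_units(1_000_000))   # writes/day from the scenario
```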
2. Auto Scaling
Configure Auto Scaling for provisioned capacity:
// CDK: DynamoDB with Auto Scaling
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
const table = new dynamodb.Table(this, 'Table', {
partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PROVISIONED,
readCapacity: 5,
writeCapacity: 5,
});
// Enable Auto Scaling
const readScaling = table.autoScaleReadCapacity({
minCapacity: 5,
maxCapacity: 100,
});
readScaling.scaleOnUtilization({
targetUtilizationPercent: 70,
});
const writeScaling = table.autoScaleWriteCapacity({
minCapacity: 5,
maxCapacity: 100,
});
writeScaling.scaleOnUtilization({
targetUtilizationPercent: 70,
});
Compute: Lambda Optimization
1. Memory Optimization
Strategy:
- Lambda charges for memory × duration
- More memory = faster execution (up to a point)
- Find optimal memory setting using AWS Lambda Power Tuning
Example:
Function with 512MB: 2000ms execution = 1024 MB-seconds
Function with 1024MB: 1200ms execution = 1228.8 MB-seconds
But with better CPU:
Function with 1024MB: 800ms execution = 819.2 MB-seconds
Savings: 20% despite doubling memory
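The MB-seconds arithmetic above is easy to verify (Lambda actually bills in GB-seconds; MB-seconds just keeps the numbers readable):

```python
def mb_seconds(memory_mb, duration_ms):
    """Billed compute expressed in MB-seconds."""
    return memory_mb * duration_ms / 1000

print(mb_seconds(512, 2000))            # baseline configuration
print(round(mb_seconds(1024, 1200), 1)) # faster, but more billed compute
print(round(mb_seconds(1024, 800), 1))  # faster AND cheaper
print(round(1 - mb_seconds(1024, 800) / mb_seconds(512, 2000), 2))  # fraction saved
```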
2. Reduce Cold Starts
Techniques:
- Use Provisioned Concurrency for latency-sensitive functions
- Minimize dependencies and package size
- Use Lambda SnapStart (for Java)
- Keep functions warm with scheduled invocations (last resort)
3. Monitor Invocation Patterns
Cost optimization:
# Analyze Lambda invocations
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Invocations \
--dimensions Name=FunctionName,Value=my-function \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-31T23:59:59Z \
--period 86400 \
--statistics Sum
Look for:
- Unnecessary invocations (polling loops)
- Retry storms (fix error handling)
- Inefficient batch sizes (process more per invocation)
Monitoring: CloudWatch Optimization
1. Log Retention Policies
Default retention: Logs never expire (growing costs)
Best practice:
# Set log retention to 30 days for dev, 90 days for prod
aws logs put-retention-policy \
--log-group-name /aws/lambda/my-function \
--retention-in-days 30
Typical savings:
- Reduce retention from indefinite to 30 days
- Savings: 70-90% on log storage costs
2. Delete Unused Log Groups
Find empty or unused log groups:
import boto3

logs = boto3.client('logs')
log_groups = logs.describe_log_groups()['logGroups']
for group in log_groups:
    # Check if log group has any streams (most recent first)
    streams = logs.describe_log_streams(
        logGroupName=group['logGroupName'],
        orderBy='LastEventTime',
        descending=True,
        limit=1
    )
    if not streams['logStreams']:
        # No streams at all, safe to delete
        logs.delete_log_group(logGroupName=group['logGroupName'])
        print(f"Deleted empty log group: {group['logGroupName']}")
Networking: Data Transfer Optimization
1. Use CloudFront
Benefits:
- Reduce data transfer from origin
- Cache at edge locations
- Cheaper egress pricing than direct S3/EC2
Cost comparison (1TB egress):
Direct from EC2/S3: $90
Via CloudFront: $85 (first 10TB tier)
Savings: 5-15% + performance benefits
2. VPC Endpoints
Save on NAT Gateway costs:
// CDK: VPC endpoint for S3
import * as ec2 from 'aws-cdk-lib/aws-ec2';
const vpc = new ec2.Vpc(this, 'VPC', {
// ...
});
// S3 Gateway Endpoint (free)
vpc.addGatewayEndpoint('S3Endpoint', {
service: ec2.GatewayVpcEndpointAwsService.S3,
});
// DynamoDB Gateway Endpoint (free)
vpc.addGatewayEndpoint('DynamoEndpoint', {
service: ec2.GatewayVpcEndpointAwsService.DYNAMODB,
});
Savings:
- NAT Gateway: $0.045/GB processed
- VPC Endpoint (Gateway): $0/GB (free for S3/DynamoDB)
- Potential savings: 100% on data transfer to S3/DynamoDB
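The avoided cost is simply the NAT Gateway data-processing volume times the $0.045/GB rate (the hourly NAT Gateway charge is separate and unaffected):

```python
def nat_data_processing_cost(gb_processed, rate_per_gb=0.045):
    """NAT Gateway data-processing cost that a free Gateway Endpoint avoids."""
    return gb_processed * rate_per_gb

print(round(nat_data_processing_cost(1000), 2))  # 1TB/month through NAT
```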
3. Same-Region Architecture
Cross-region data transfer costs:
- Same region: $0.01/GB (or free within same AZ)
- Cross-region: $0.02/GB
- Avoid cross-region unless required for HA/DR
Step 4: Cost Monitoring & Governance
Budget Setup
AWS Budgets configuration:
// CDK: Monthly budget with alerts
import * as budgets from 'aws-cdk-lib/aws-budgets';
new budgets.CfnBudget(this, 'MonthlyBudget', {
budget: {
budgetName: 'monthly-aws-budget',
budgetType: 'COST',
timeUnit: 'MONTHLY',
budgetLimit: {
amount: 5000,
unit: 'USD',
},
costFilters: {
// Optional: filter by service, tag, etc.
},
},
notificationsWithSubscribers: [
{
notification: {
notificationType: 'ACTUAL',
comparisonOperator: 'GREATER_THAN',
threshold: 80, // Alert at 80%
thresholdType: 'PERCENTAGE',
},
subscribers: [
{
subscriptionType: 'EMAIL',
address: 'finops-team@example.com',
},
],
},
{
notification: {
notificationType: 'FORECASTED',
comparisonOperator: 'GREATER_THAN',
threshold: 100, // Forecast alert at 100%
thresholdType: 'PERCENTAGE',
},
subscribers: [
{
subscriptionType: 'EMAIL',
address: 'finops-team@example.com',
},
],
},
],
});
Cost Anomaly Detection
Enable automatic anomaly detection:
# Create cost anomaly monitor
aws ce create-anomaly-monitor \
--anomaly-monitor MonitorName=AllServicesMonitor,MonitorType=DIMENSIONAL,MonitorDimension=SERVICE
# Create anomaly subscription for alerts
aws ce create-anomaly-subscription \
--anomaly-subscription file://subscription.json
subscription.json:
{
"SubscriptionName": "DailyCostAnomalies",
"Threshold": 100.0,
"Frequency": "DAILY",
"MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/12345678-1234-1234-1234-123456789012"],
"Subscribers": [
{
"Type": "EMAIL",
"Address": "finops-team@example.com"
}
]
}
Cost Allocation Tags
Tag enforcement policy:
// CDK: Require tags on all resources
import * as cdk from 'aws-cdk-lib';
class TaggedStack extends cdk.Stack {
constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
super(scope, id, props);
// Enforce required tags
const requiredTags = {
Environment: process.env.ENVIRONMENT || 'dev',
Project: 'my-project',
CostCenter: 'engineering',
Owner: 'team-platform',
};
Object.entries(requiredTags).forEach(([key, value]) => {
cdk.Tags.of(this).add(key, value);
});
}
}
Tag governance with AWS Config:
{
"ConfigRuleName": "required-tags",
"Source": {
"Owner": "AWS",
"SourceIdentifier": "REQUIRED_TAGS"
},
"InputParameters": {
"tag1Key": "Environment",
"tag2Key": "Project",
"tag3Key": "CostCenter"
}
}
Step 5: Implementation Checklist
Use this checklist to systematically implement cost optimizations:
Compute Optimization
- Analyze EC2 instance utilization (CloudWatch metrics)
- Right-size under-utilized instances (< 40% CPU/memory)
- Purchase Reserved Instances or Savings Plans for stable workloads
- Implement Spot Instances for fault-tolerant workloads
- Set up stop/start schedules for dev/test environments
- Review Lambda memory settings and optimize
- Enable Lambda Provisioned Concurrency only when needed
- Terminate unused or idle EC2 instances
Storage Optimization
- Implement S3 lifecycle policies to transition to cheaper tiers
- Enable S3 Intelligent-Tiering for unknown access patterns
- Delete incomplete multipart uploads
- Delete old S3 object versions if versioning enabled
- Find and delete unattached EBS volumes
- Upgrade gp2 volumes to gp3 (20% savings)
- Clean up old EBS snapshots (> retention period)
- Review EBS snapshot retention policies
Database Optimization
- Right-size RDS instances based on CloudWatch metrics
- Purchase RDS Reserved Instances for production databases
- Stop dev/test RDS instances outside business hours
- Clean up old RDS snapshots
- Evaluate DynamoDB On-Demand vs Provisioned capacity
- Enable DynamoDB Auto Scaling for provisioned tables
- Review DynamoDB read/write capacity utilization
Networking Optimization
- Use CloudFront for content delivery (reduce egress costs)
- Implement VPC endpoints for S3/DynamoDB (eliminate NAT costs)
- Consolidate resources in same region/AZ where possible
- Review data transfer patterns and optimize
- Consider AWS PrivateLink for service-to-service communication
Monitoring & Governance
- Apply cost allocation tags consistently across all resources
- Enable AWS Cost Explorer and review monthly
- Set up AWS Budgets with alerts (80% and 100% thresholds)
- Enable AWS Cost Anomaly Detection
- Configure CloudWatch log retention policies (30-90 days)
- Delete unused CloudWatch log groups
- Schedule weekly cost reviews with team
- Document cost optimization wins and share with organization
Review Cadence
Weekly:
- Check AWS Budgets for alerts
- Review Cost Anomaly Detection findings
- Verify stop/start schedules are working
Monthly:
- Deep dive Cost Explorer analysis
- Review top 10 cost drivers
- Analyze month-over-month trends
- Identify new optimization opportunities
Quarterly:
- Review Reserved Instance and Savings Plan utilization
- Evaluate new AWS services for cost optimization
- Conduct comprehensive cost optimization audit
- Update cost allocation tags strategy
- Review and adjust budgets based on trends
Common Anti-Patterns
| Anti-Pattern | Issue | Better Approach |
|---|---|---|
| No tagging strategy | Can’t allocate costs to teams/projects | Implement comprehensive tagging, enforce with policies |
| No budgets or alerts | Surprise bills | Set up AWS Budgets with multiple thresholds |
| Always On-Demand | Paying 3x more than necessary | Use Reserved Instances/Savings Plans for stable workloads |
| Over-provisioning | Paying for unused capacity | Right-size based on actual utilization metrics |
| No lifecycle policies | Paying premium storage for cold data | Implement S3 lifecycle policies to cheaper tiers |
| Unattached volumes | Paying for unused storage | Automate detection and deletion of unattached EBS |
| Dev/test running 24/7 | 3x cost for non-production | Stop resources outside business hours |
| Manual optimization | Time-consuming, error-prone | Automate with Lambda, Systems Manager, AWS Backup |
| No cost visibility | Can’t identify waste | Enable Cost Explorer, Cost and Usage Reports |
Cost Optimization Prioritization
Use this matrix to prioritize optimization efforts:
| Priority | Savings Potential | Effort | Examples |
|---|---|---|---|
| High | High | Low | Delete unattached EBS volumes, stop unused instances |
| High | High | Medium | Purchase Reserved Instances, implement S3 lifecycle policies |
| Medium | Medium | Low | CloudWatch log retention, right-size Lambda memory |
| Medium | High | High | Migrate to Spot Instances, re-architect for serverless |
| Low | Low | Any | Micro-optimizations, minor instance type changes |
Start with High Priority first – quick wins that demonstrate value and build momentum.
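One way to operationalize the matrix is to score each opportunity by a savings-to-effort ratio; this is a simple illustrative heuristic, not an AWS tool:

```python
LEVEL = {"low": 1, "medium": 2, "high": 3}

def prioritize(opportunities):
    """Sort opportunities by savings-to-effort ratio so quick wins come first."""
    return sorted(
        opportunities,
        key=lambda o: LEVEL[o["savings"]] / LEVEL[o["effort"]],
        reverse=True,
    )

backlog = [
    {"name": "Re-architect for serverless", "savings": "high", "effort": "high"},
    {"name": "Delete unattached EBS volumes", "savings": "high", "effort": "low"},
    {"name": "Set CloudWatch log retention", "savings": "medium", "effort": "low"},
]
print([o["name"] for o in prioritize(backlog)])
```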
Real-World Example
Scenario: SaaS application with high AWS costs
Initial State:
- Monthly AWS spend: $50,000
- No Reserved Instances or Savings Plans
- Many unattached EBS volumes
- S3 data all in Standard storage class
- Dev/test environments running 24/7
- No cost allocation tags
- No budgets or alerts
Cost Analysis (Step 1):
- EC2: $25,000 (50%)
- S3: $8,000 (16%)
- RDS: $7,000 (14%)
- Data Transfer: $5,000 (10%)
- Other: $5,000 (10%)
Optimizations Implemented (Steps 2-3):
1. EC2 Optimization:
- Right-sized 30% of instances: -$3,000/month
- Purchased 1-year Compute Savings Plan (50% coverage): -$6,250/month
- Implemented stop/start for dev/test (76% reduction): -$4,500/month
- Terminated 10 unused instances: -$1,500/month
2. S3 Optimization:
- Implemented lifecycle policies (30d → IA, 90d → Glacier): -$3,200/month
- Deleted incomplete multipart uploads: -$400/month
- Enabled Intelligent-Tiering for unpredictable data: -$800/month
3. RDS Optimization:
- Right-sized 2 over-provisioned instances: -$1,200/month
- Purchased 1-year RDS RI for production: -$1,750/month
- Stop/start dev databases: -$1,400/month
4. EBS Optimization:
- Deleted 50 unattached volumes: -$1,000/month
- Migrated gp2 to gp3: -$800/month
- Cleaned up old snapshots: -$300/month
5. Networking:
- Implemented S3 VPC endpoint: -$1,200/month
- CloudFront for static assets: -$800/month
6. Governance:
- Implemented tagging strategy
- Set up budgets and alerts
- Enabled Cost Anomaly Detection
Results:
- Total Monthly Savings: $28,100 (sum of the itemized savings above)
- Annual Savings: $337,200
- New Monthly Cost: $21,900
- Cost Reduction: 56%
Ongoing Optimization:
- Weekly cost reviews catch anomalies early
- Monthly analysis identifies new opportunities
- Quarterly RI/SP utilization reviews
- Continuous right-sizing based on CloudWatch data
Common Mistakes When Using This Skill
| Mistake | Why It’s Wrong | Correct Approach |
|---|---|---|
| “We’ll optimize later” | Costs compound quickly, harder to change later | Start cost optimization from day one |
| Optimizing without analyzing usage | May hurt performance or availability | Always analyze usage patterns first, then optimize |
| Buying 3-year RIs without testing | Locked into potentially wrong capacity | Start with 1-year, or use Convertible RIs |
| No cost allocation tags | Can’t track who’s spending what | Implement comprehensive tagging from start |
| Only optimizing once | Costs drift as usage changes | Make cost optimization ongoing practice |
| Ignoring small costs | Small costs add up to significant waste | Systematically address all waste, even small items |
| Manual optimization only | Time-consuming, doesn’t scale | Automate detection, remediation, and monitoring |
AWS Cost Optimization Tools Reference
AWS Native Tools
- AWS Cost Explorer: Visualize spending patterns, analyze trends
- AWS Cost and Usage Reports: Detailed billing data for analysis
- AWS Budgets: Set spending limits and alerts
- AWS Cost Anomaly Detection: Identify unusual spending
- AWS Compute Optimizer: Right-sizing recommendations for EC2, Lambda, EBS
- AWS Trusted Advisor: Best practice recommendations (requires Business/Enterprise support)
- AWS Pricing Calculator: Estimate costs before deployment
Third-Party Tools
- CloudHealth by VMware: Multi-cloud cost management and governance
- CloudCheckr: Cost optimization and compliance
- Spot.io: Automated cloud cost optimization
- Kubecost: Kubernetes cost monitoring and optimization
- Infracost: Infrastructure as code cost estimation
CLI Commands Cheat Sheet
# Cost Explorer: Get monthly costs
aws ce get-cost-and-usage \
--time-period Start=2024-01-01,End=2024-01-31 \
--granularity MONTHLY \
--metrics BlendedCost
# Find unattached EBS volumes
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query "Volumes[*].{ID:VolumeId,Size:Size,Type:VolumeType,Created:CreateTime}"
# Find unattached Elastic IPs
aws ec2 describe-addresses \
--query "Addresses[?AssociationId==null].{IP:PublicIp,AllocationId:AllocationId}"
# List unused NAT Gateways (check metrics separately)
aws ec2 describe-nat-gateways \
--filter Name=state,Values=available
# Find old snapshots (> 90 days)
aws ec2 describe-snapshots \
--owner-ids self \
--query "Snapshots[?StartTime<='$(date -u -d '90 days ago' '+%Y-%m-%d')'].{ID:SnapshotId,Created:StartTime,Size:VolumeSize}"
# Analyze S3 storage by class
aws s3api list-buckets --query "Buckets[].Name" --output text | \
xargs -I {} aws s3api list-objects-v2 --bucket {} \
--query "Contents[].{StorageClass:StorageClass}" --output text | \
sort | uniq -c
# Get Lambda function costs (via CloudWatch metrics)
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Duration \
--dimensions Name=FunctionName,Value=my-function \
--start-time $(date -u -d '30 days ago' '+%Y-%m-%dT%H:%M:%S') \
--end-time $(date -u '+%Y-%m-%dT%H:%M:%S') \
--period 2592000 \
--statistics Sum
Resources
- AWS Cost Management
- AWS Cost Optimization Documentation
- AWS Well-Architected Framework – Cost Optimization Pillar
- AWS Pricing
- AWS Compute Optimizer
- AWS Trusted Advisor
- AWS Cost Optimization Blog
Using This Skill
When performing cost optimization:
- Start with data – analyze actual spending patterns before recommending changes
- Prioritize by impact – focus on high-savings, low-effort opportunities first
- Consider trade-offs – balance cost vs performance, availability, and complexity
- Be specific – provide exact AWS service names, CLI commands, and code examples
- Estimate savings – quantify expected savings for each recommendation
- Implement incrementally – test changes in dev/test before production
- Monitor results – verify savings are realized after implementation
- Automate – use Lambda, Systems Manager, and AWS Backup for routine tasks
- Establish governance – implement tagging, budgets, and regular reviews
- Make it ongoing – cost optimization is continuous, not one-time
Remember: The goal is not just to reduce costs, but to eliminate waste while maintaining or improving performance, reliability, and security.
Critical: Always analyze usage patterns before making changes. The most expensive mistake is optimizing based on assumptions rather than data.