aws-cost-finops
npx skills add https://github.com/ahmedasmar/devops-claude-skills --skill aws-cost-finops
Skill Documentation
AWS Cost Optimization & FinOps
Systematic workflows for AWS cost optimization and financial operations management.
When to Use This Skill
Use this skill when you need to:
- Find cost savings: Identify unused resources, rightsizing opportunities, or commitment discounts
- Analyze spending: Understand cost trends, detect anomalies, or break down costs
- Optimize architecture: Choose cost-effective services, storage tiers, or instance types
- Implement FinOps: Set up governance, tagging, budgets, or monthly reviews
- Make purchase decisions: Evaluate Reserved Instances, Savings Plans, or Spot instances
- Troubleshoot costs: Investigate unexpected bills or cost spikes
- Plan budgets: Forecast costs or evaluate impact of new projects
Cost Optimization Workflow
Follow this systematic approach for AWS cost optimization:
┌─────────────────────────────────────────────┐
│  1. DISCOVER                                │
│    What are we spending money on?           │
│    Run: find_unused_resources.py            │
│    Run: cost_anomaly_detector.py            │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  2. ANALYZE                                 │
│    Where are the optimization opportunities?│
│    Run: rightsizing_analyzer.py             │
│    Run: detect_old_generations.py           │
│    Run: spot_recommendations.py             │
│    Run: analyze_ri_recommendations.py       │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  3. PRIORITIZE                              │
│    What should we optimize first?           │
│    - Quick wins (low risk, high savings)    │
│    - Low-hanging fruit (easy to implement)  │
│    - Strategic improvements                 │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  4. IMPLEMENT                               │
│    Execute optimization actions             │
│    - Delete unused resources                │
│    - Rightsize instances                    │
│    - Purchase commitments                   │
│    - Migrate to new generations             │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  5. MONITOR                                 │
│    Verify savings and track metrics         │
│    - Monthly cost reviews                   │
│    - Tag compliance monitoring              │
│    - Budget variance tracking               │
└─────────────────────────────────────────────┘
Core Workflows
Workflow 1: Monthly Cost Optimization Review
Frequency: Run monthly (first week of each month)
Step 1: Find Unused Resources
# Scan for waste across all resources
python3 scripts/find_unused_resources.py
# Expected output:
# - Unattached EBS volumes
# - Old snapshots
# - Unused Elastic IPs
# - Idle NAT Gateways
# - Idle EC2 instances
# - Unused load balancers
# - Estimated monthly savings
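To make one item on that list concrete, the sketch below filters EBS volume records for those in the `available` state, meaning they are attached to nothing. It assumes records shaped like the `Volumes` entries that boto3's `describe_volumes` returns. The function name and the per-GB price are illustrative assumptions, and this is not necessarily how `find_unused_resources.py` works internally.

```python
# Hypothetical helper, not taken from find_unused_resources.py.
# Assumed storage price per GiB-month; real prices vary by region and type.
ASSUMED_PRICE_PER_GB_MONTH = 0.08


def find_unattached_volumes(volumes):
    """Return (volume_id, size_gib, est_monthly_cost) for unattached volumes.

    `volumes` is a list of dicts shaped like the "Volumes" entries in an
    EC2 describe_volumes response.
    """
    unattached = []
    for vol in volumes:
        # "available" means the volume exists but is attached to nothing.
        if vol.get("State") == "available" and not vol.get("Attachments"):
            size = vol["Size"]
            cost = round(size * ASSUMED_PRICE_PER_GB_MONTH, 2)
            unattached.append((vol["VolumeId"], size, cost))
    return unattached


sample_volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 100,
     "Attachments": [{"InstanceId": "i-123"}]},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 50,
     "Attachments": []},
]
print(find_unattached_volumes(sample_volumes))
```

With live data, the input could come from `boto3.client("ec2").describe_volumes()`, optionally with a `status` filter set to `available`.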
Step 2: Analyze Cost Anomalies
# Detect unusual spending patterns
python3 scripts/cost_anomaly_detector.py --days 30
# Expected output:
# - Cost spikes and anomalies
# - Top cost drivers
# - Period-over-period comparison
# - 30-day forecast
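The detection method used by `cost_anomaly_detector.py` is not described here, so the following is only a stand-in technique: flag a day as a spike when its cost exceeds the trailing-window mean by a multiple of the standard deviation. The function name, window size, and threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_cost_spikes(daily_costs, window=7, threshold=3.0):
    """Flag days whose cost exceeds the trailing mean by `threshold` std devs.

    `daily_costs` is a list of (date_string, cost) pairs in date order.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        history = [cost for _, cost in daily_costs[i - window:i]]
        baseline = mean(history)
        spread = stdev(history)
        day, cost = daily_costs[i]
        # Use at least 1% of the baseline as the spread so that perfectly
        # flat history does not turn every tiny increase into a spike.
        limit = baseline + threshold * max(spread, 0.01 * baseline)
        if cost > limit:
            spikes.append((day, cost, round(baseline, 2)))
    return spikes


sample_costs = [(f"2024-01-{d:02d}", 100.0) for d in range(1, 8)]
sample_costs.append(("2024-01-08", 180.0))
print(find_cost_spikes(sample_costs))
```

Daily totals for the input could be taken from Cost Explorer; a short window reacts quickly but is noisier, so the parameters are worth tuning per account.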
Step 3: Identify Rightsizing Opportunities
# Find oversized instances
python3 scripts/rightsizing_analyzer.py --days 30
# Expected output:
# - EC2 instances with low utilization
# - RDS instances with low utilization
# - Recommended smaller instance types
# - Estimated savings
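A rightsizing decision usually comes down to comparing utilization against thresholds. The sketch below shows that idea with CPU percentages only. The thresholds and the function name are assumptions, not values taken from `rightsizing_analyzer.py`, and a real analysis would also weigh memory, network, and disk metrics.

```python
def rightsizing_hint(avg_cpu, max_cpu, low_avg=10.0, low_max=40.0):
    """Classify an instance from CPU utilization percentages over a window.

    Thresholds are illustrative defaults, not AWS recommendations.
    """
    if avg_cpu < low_avg and max_cpu < low_max:
        # Consistently idle: a smaller instance type is worth evaluating.
        return "downsize"
    if max_cpu > 90.0:
        # Peaks near saturation: do not shrink, and consider more capacity.
        return "review-upsize"
    return "keep"


for name, avg, peak in [("web-1", 4.0, 22.0), ("db-1", 35.0, 95.0)]:
    print(name, rightsizing_hint(avg, peak))
```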
Step 4: Generate Monthly Report
# Use the template to compile findings
mkdir -p reports # create the output directory if it does not exist yet
cp assets/templates/monthly_cost_report.md reports/$(date +%Y-%m)-cost-report.md
# Fill in:
# - Findings from scripts
# - Action items
# - Team cost breakdowns
# - Optimization wins
Step 5: Team Review Meeting
- Present findings to engineering teams
- Assign optimization tasks
- Track action items to completion
Workflow 2: Commitment Purchase Analysis (RI/Savings Plans)
When: Quarterly or when usage patterns stabilize
Step 1: Analyze Current Usage
# Identify workloads suitable for commitments
python3 scripts/analyze_ri_recommendations.py --days 60
# Looks for:
# - EC2 instances running consistently for 60+ days
# - RDS instances with stable usage
# - Calculates ROI for 1yr vs 3yr commitments
Step 2: Review Recommendations
Evaluate each recommendation:
✅ Good candidate if:
- Running 24/7 for 60+ days
- Workload is stable and predictable
- No plans to change architecture
- Savings > 30%
❌ Poor candidate if:
- Workload is variable or experimental
- Architecture changes planned
- Instance type may change
- Dev/test environment
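The two checklists above can be expressed as a single predicate. This is a minimal sketch: the `Workload` fields and the function name are hypothetical, and the 60-day and 30% cutoffs simply mirror the criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    days_running: int                  # consecutive days of near-24/7 uptime
    stable: bool                       # usage is steady and predictable
    architecture_change_planned: bool
    environment: str                   # e.g. "prod", "dev", "test"
    estimated_savings_pct: float


def is_commitment_candidate(w: Workload) -> bool:
    """Apply the good/poor candidate criteria listed above."""
    return (
        w.days_running >= 60
        and w.stable
        and not w.architecture_change_planned
        and w.environment not in {"dev", "test"}
        and w.estimated_savings_pct > 30.0
    )


api = Workload("orders-api", 120, True, False, "prod", 38.0)
sandbox = Workload("ml-sandbox", 90, False, False, "dev", 40.0)
print(is_commitment_candidate(api), is_commitment_candidate(sandbox))
```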
Step 3: Choose Commitment Type
Reserved Instances:
- Standard RI: Highest discount (up to 72%), no flexibility
- Convertible RI: Lower discount (up to 66%), can change instance type
- Best for: Specific instance types, stable workloads
Savings Plans:
- Compute SP: Flexible across instance families, sizes, and regions (up to 66% savings)
- EC2 Instance SP: Flexible across sizes within one family and region (up to 72% savings)
- Best for: Variable workloads within constraints
Decision Matrix:
Known instance type, won't change → Standard RI
May need to change types → Convertible RI or Compute SP
Variable workloads → Compute Savings Plan
Maximum flexibility → Compute Savings Plan
Step 4: Purchase and Track
- Purchase through AWS Console or CLI
- Tag commitments with purchase date and owner
- Monitor utilization monthly
- Aim for >90% utilization
Reference: See references/best_practices.md for detailed commitment strategies
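The utilization target in Step 4 is a simple ratio of committed hours actually used to committed hours purchased. The sketch below works through that arithmetic; the function names are hypothetical, and in practice the hour counts would come from the utilization reports in Cost Explorer.

```python
def commitment_utilization(used_hours, purchased_hours):
    """Return utilization of a commitment as a percentage."""
    if purchased_hours <= 0:
        raise ValueError("purchased_hours must be positive")
    return 100.0 * used_hours / purchased_hours


def below_target(used_hours, purchased_hours, target_pct=90.0):
    """True when utilization falls short of the target (90% by default)."""
    return commitment_utilization(used_hours, purchased_hours) < target_pct


# A 30-day month has 720 hours; 684 of them were covered by running usage.
print(commitment_utilization(684, 720), below_target(684, 720))
```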
Workflow 3: Instance Generation Migration
When: During architecture reviews or optimization sprints
Step 1: Detect Old Instances
# Find outdated instance generations
python3 scripts/detect_old_generations.py
# Identifies:
# - t2 → t3 migrations (10% savings)
# - m4 → m5 → m6i migrations
# - Intel → Graviton opportunities (20% savings)
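The same-architecture migrations listed above amount to a family-to-family lookup. The sketch below illustrates that with a small table. The table contents and function name are assumptions for illustration rather than the logic of `detect_old_generations.py`, and a suggested type should be checked against what AWS actually offers, since not every size exists in every family.

```python
# Assumed mapping based on the migrations named above; extend as needed.
UPGRADE_PATHS = {"t2": "t3", "m4": "m5", "m5": "m6i"}


def suggest_upgrade(instance_type):
    """Return the same size in a newer family, or None if no path is known."""
    family, _, size = instance_type.partition(".")
    newer = UPGRADE_PATHS.get(family)
    if newer is None or not size:
        return None
    return f"{newer}.{size}"


for itype in ["t2.micro", "m4.large", "c7g.large"]:
    print(itype, "->", suggest_upgrade(itype))
```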
Step 2: Prioritize Migrations
Quick Wins (Low Risk):
t2 → t3: Drop-in replacement, 10% savings
m4 → m5: Better performance, 5% savings
gp2 → gp3: No downtime, 20% savings
Medium Effort (Test Required):
x86 → Graviton (ARM64): 20% savings
- Requires ARM64 compatibility testing
- Most modern frameworks support ARM64
- Test in staging first
Step 3: Execute Migration
For EC2 (x86 to x86):
- Stop instance
- Change instance type
- Start instance
- Verify application
For Graviton Migration:
- Create ARM64 AMI or Docker image
- Launch new Graviton instance
- Test thoroughly
- Cut over traffic
- Terminate old instance
Step 4: Validate Savings
- Monitor new costs in Cost Explorer
- Verify performance is acceptable
- Document migration for other teams
Reference: See references/best_practices.md → Compute Optimization
Workflow 4: Spot Instance Evaluation
When: For fault-tolerant workloads or Auto Scaling Groups
Step 1: Identify Candidates
# Analyze workloads for Spot suitability
python3 scripts/spot_recommendations.py
# Evaluates:
# - Instances in Auto Scaling Groups (good candidates)
# - Dev/test/staging environments
# - Batch processing workloads
# - CI/CD and build servers
Step 2: Assess Suitability
Excellent for Spot:
- Stateless applications
- Batch jobs
- CI/CD pipelines
- Data processing
- Auto Scaling Groups
NOT suitable for Spot:
- Databases (without replicas)
- Stateful applications
- Real-time services
- Mission-critical workloads
Step 3: Implementation Strategy
Option 1: Fargate Spot (Easiest)
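# requiresCompatibilities goes in the ECS task definition
requiresCompatibilities:
  - FARGATE
# capacityProviderStrategy is set on the ECS service (or the run-task call)
capacityProviderStrategy:
  - capacityProvider: FARGATE_SPOT
    weight: 70 # 70% Spot
  - capacityProvider: FARGATE
    weight: 30 # 30% On-Demand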
Option 2: EC2 Auto Scaling with Spot
# Mixed instances policy
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 2
    OnDemandPercentageAboveBaseCapacity: 30
    SpotAllocationStrategy: capacity-optimized
  LaunchTemplate:
    Overrides:
      - InstanceType: m5.large
      - InstanceType: m5a.large
      - InstanceType: m5n.large
Option 3: EC2 Spot Fleet
# Create Spot Fleet with diverse instance types
aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet.json
Step 4: Implement Interruption Handling
# Handle the 2-minute Spot interruption notice
# Instance metadata: http://169.254.169.254/latest/meta-data/spot/instance-action
# In the application:
1. Poll for the termination notice
2. Shut down gracefully (save state)
3. Drain connections
4. Exit
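The polling step can be sketched with the standard library alone. The code below assumes IMDSv2 (a session token is fetched first) and treats a 404 from the `instance-action` path as "no interruption scheduled". The function names, the 5-second interval, and the injectable `fetch` parameter are illustrative choices, and the shutdown work itself is left to the caller's `on_notice` callback.

```python
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=2):
    """Return the raw instance-action JSON, or None if no notice is pending."""
    # IMDSv2: request a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption is scheduled
            return None
        raise


def wait_for_interruption(on_notice, fetch=fetch_instance_action,
                          interval=5, max_polls=None):
    """Poll until a notice appears, then call on_notice(action, time).

    Returns True if a notice was seen, False if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch()
        if body:
            notice = json.loads(body)
            on_notice(notice["action"], notice["time"])
            return True
        polls += 1
        time.sleep(interval)
    return False
```

On an instance, `wait_for_interruption(handler)` would typically run in a background thread, with `handler` saving state, draining connections, and exiting. Passing a fake `fetch` makes the loop testable off EC2.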
Reference: See references/best_practices.md → Compute Optimization → Spot Instances
Quick Reference: Cost Optimization Scripts
All Scripts Location
ls scripts/
# find_unused_resources.py
# analyze_ri_recommendations.py
# detect_old_generations.py
# spot_recommendations.py
# rightsizing_analyzer.py
# cost_anomaly_detector.py
Script Usage Patterns
Monthly Review (Run all):
python3 scripts/find_unused_resources.py
python3 scripts/cost_anomaly_detector.py --days 30
python3 scripts/rightsizing_analyzer.py --days 30
Quarterly Optimization:
python3 scripts/analyze_ri_recommendations.py --days 60
python3 scripts/detect_old_generations.py
python3 scripts/spot_recommendations.py
Specific Region Only:
python3 scripts/find_unused_resources.py --region us-east-1
python3 scripts/rightsizing_analyzer.py --region us-west-2
Named AWS Profile:
python3 scripts/find_unused_resources.py --profile production
python3 scripts/cost_anomaly_detector.py --profile production --days 60
Script Requirements
# Install dependencies
pip install boto3 tabulate
# AWS credentials required
# Configure via: aws configure
# Or use: --profile PROFILE_NAME
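For readers extending the scripts, the sketch below shows one way the `--region`, `--profile`, and `--days` flags could map onto a boto3 session. It is an assumption about structure, not the scripts' actual code: the helper names and the default of 30 days are made up for illustration. The keyword names `profile_name` and `region_name` are the ones `boto3.Session` accepts.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in the examples above."""
    parser = argparse.ArgumentParser(description="Cost script options (sketch)")
    parser.add_argument("--region", default=None,
                        help="limit the scan to one region")
    parser.add_argument("--profile", default=None,
                        help="named AWS profile to use")
    parser.add_argument("--days", type=int, default=30,
                        help="lookback window in days (assumed default)")
    return parser


def session_kwargs(args):
    """Translate parsed flags into keyword arguments for boto3.Session."""
    kwargs = {}
    if args.profile:
        kwargs["profile_name"] = args.profile
    if args.region:
        kwargs["region_name"] = args.region
    return kwargs


args = build_parser().parse_args(["--profile", "production", "--days", "60"])
print(session_kwargs(args), args.days)
```

A script would then create its session with `boto3.Session(**session_kwargs(args))`, falling back to the default credential chain when neither flag is given.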
Service-Specific Optimization
Compute Optimization
Key Actions:
- Migrate to Graviton (20% savings)
- Use Spot for fault-tolerant workloads (70% savings)
- Purchase RIs for stable workloads (40-65% savings)
- Right-size oversized instances
Reference: references/best_practices.md → Compute Optimization
Storage Optimization
Key Actions:
- Convert gp2 → gp3 (20% savings)
- Implement S3 lifecycle policies (50-95% savings)
- Delete old snapshots
- Use S3 Intelligent-Tiering
Reference: references/best_practices.md → Storage Optimization
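As an illustration of the S3 lifecycle action, the sketch below builds a lifecycle configuration that moves objects to cheaper storage classes over time and eventually expires them. The prefix, day counts, rule ID format, and function name are assumptions; the dictionary layout follows the `LifecycleConfiguration` structure that boto3's `put_bucket_lifecycle_configuration` expects.

```python
def build_lifecycle_config(prefix="logs/", ia_days=30,
                           glacier_days=90, expire_days=365):
    """Return a lifecycle configuration dict with one tiering rule."""
    rule_name = prefix.strip("/") or "all"
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{rule_name}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    # Infrequent access first, then archival storage.
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


config = build_lifecycle_config()
print(config["Rules"][0]["ID"])
```

Applying it would look like `s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=config)`, where `s3` is a boto3 S3 client and the bucket name is a placeholder. Review retention requirements before enabling expiration.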
Network Optimization
Key Actions:
- Replace NAT Gateways with VPC Endpoints (save $25-30/month each)
- Use CloudFront to reduce data transfer costs
- Colocate resources in same AZ when possible
Reference: references/best_practices.md → Network Optimization
Database Optimization
Key Actions:
- Right-size RDS instances
- Use gp3 storage (20% cheaper than gp2)
- Evaluate Aurora Serverless for variable workloads
- Purchase RDS Reserved Instances
Reference: references/best_practices.md → Database Optimization
Service Alternatives Decision Guide
Need help choosing between services?
Question: “Should I use EC2, Lambda, or Fargate?”
Answer: See references/service_alternatives.md → Compute Alternatives
Question: “Which S3 storage class should I use?”
Answer: See references/service_alternatives.md → Storage Alternatives
Question: “Should I use RDS or Aurora?”
Answer: See references/service_alternatives.md → Database Alternatives
Question: “NAT Gateway vs VPC Endpoint vs NAT Instance?”
Answer: See references/service_alternatives.md → Networking Alternatives
FinOps Governance & Process
Setting Up FinOps
Phase 1: Foundation (Month 1)
- Enable Cost Explorer
- Set up AWS Budgets
- Define tagging strategy
- Activate cost allocation tags
Phase 2: Visibility (Months 2-3)
- Implement tagging enforcement
- Run optimization scripts
- Set up monthly reviews
- Create team cost reports
Phase 3: Culture (Ongoing)
- Cost metrics in engineering KPIs
- Cost review in architecture decisions
- Regular optimization sprints
- FinOps champions in each team
Full Guide: See references/finops_governance.md
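Tagging enforcement in Phase 2 starts with a compliance check. The sketch below reports which required tag keys a resource lacks, given a tag list in the `[{"Key": ..., "Value": ...}]` shape that most boto3 describe calls return. The required keys shown are an assumed example policy, not one defined by this skill.

```python
# Assumed example policy; substitute the keys from your tagging strategy.
REQUIRED_TAGS = frozenset({"Owner", "Environment", "CostCenter"})


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the sorted required tag keys that are absent or empty."""
    present = {t["Key"] for t in resource_tags if t.get("Value")}
    return sorted(set(required) - present)


tags = [
    {"Key": "Owner", "Value": "team-a"},
    {"Key": "Environment", "Value": ""},  # empty values count as missing
]
print(missing_tags(tags))
```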
Monthly Review Process
Week 1: Data Collection
- Run all optimization scripts
- Export Cost & Usage Reports
- Compile findings
Week 2: Analysis
- Identify trends
- Find opportunities
- Prioritize actions
Week 3: Team Reviews
- Present to engineering teams
- Discuss optimizations
- Assign action items
Week 4: Executive Reporting
- Create executive summary
- Forecast next quarter
- Report optimization wins
Template: See assets/templates/monthly_cost_report.md
Detailed Process: See references/finops_governance.md → Monthly Review Process
Cost Optimization Checklist
Quick Wins (Do First)
- Delete unattached EBS volumes
- Delete old EBS snapshots (>90 days)
- Release unused Elastic IPs
- Convert gp2 → gp3 volumes
- Stop/terminate idle EC2 instances
- Enable S3 Intelligent-Tiering
- Set up AWS Budgets and alerts
Medium Effort (This Quarter)
- Right-size oversized instances
- Migrate to newer instance generations
- Purchase Reserved Instances for stable workloads
- Implement S3 lifecycle policies
- Replace NAT Gateways with VPC Endpoints (where applicable)
- Enable automated resource scheduling (dev/test)
- Implement tagging strategy and enforcement
Strategic Initiatives (Ongoing)
- Migrate to Graviton instances
- Implement Spot for fault-tolerant workloads
- Establish monthly cost review process
- Set up cost allocation by team
- Implement chargeback/showback model
- Create FinOps culture and practices
Troubleshooting Cost Issues
“My bill suddenly increased”
- Run cost anomaly detection: python3 scripts/cost_anomaly_detector.py --days 30
- Check Cost Explorer for service breakdown
- Review CloudTrail for resource creation events
- Check for Auto Scaling events
- Verify no Reserved Instances expired
“I need to reduce costs by X%”
Follow the optimization workflow:
- Run all discovery scripts
- Calculate total potential savings
- Prioritize by: Savings Amount × (1 / Effort)
- Focus on quick wins first
- Implement strategic changes for long-term
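The prioritization rule in the list above, savings multiplied by the inverse of effort, can be applied as a sort key. In the sketch below, the effort scale (arbitrary points, higher means harder), the field names, and the sample figures are all assumptions made for illustration.

```python
def priority_score(opportunity):
    """Score = savings amount x (1 / effort)."""
    return opportunity["monthly_savings"] * (1 / opportunity["effort"])


def prioritize(opportunities):
    """Return opportunities ordered from highest to lowest score."""
    return sorted(opportunities, key=priority_score, reverse=True)


sample_opportunities = [
    {"name": "Graviton migration", "monthly_savings": 2000, "effort": 8},
    {"name": "Delete unattached volumes", "monthly_savings": 400, "effort": 1},
    {"name": "gp2 to gp3", "monthly_savings": 300, "effort": 2},
]
for item in prioritize(sample_opportunities):
    print(item["name"], round(priority_score(item), 1))
```

A small, easy cleanup can outrank a much larger migration under this score, which matches the advice to take quick wins first.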
“How do I know if Reserved Instances make sense?”
Run RI analysis:
python3 scripts/analyze_ri_recommendations.py --days 60
Look for:
- Instances running 60+ days consistently
- Workloads that won’t change
- Savings > 30%
“Which resources can I safely delete?”
Run unused resource finder:
python3 scripts/find_unused_resources.py
Safe to delete (usually):
- Unattached EBS volumes (after verifying)
- Snapshots > 90 days (if backups exist elsewhere)
- Unused Elastic IPs (after verifying not in DNS)
- Stopped EC2 instances > 30 days (after confirming abandoned)
Always verify with resource owner before deletion!
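The 90-day snapshot rule can be checked with a date comparison. The sketch below assumes snapshot records shaped like the `Snapshots` entries from boto3's `describe_snapshots`, where `StartTime` is a timezone-aware datetime. The function name is hypothetical, and the result is a list of candidates for review, not a list to delete automatically.

```python
from datetime import datetime, timedelta, timezone


def old_snapshots(snapshots, max_age_days=90, now=None):
    """Return IDs of snapshots started more than `max_age_days` ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


reference = datetime(2024, 6, 1, tzinfo=timezone.utc)
sample_snapshots = [
    {"SnapshotId": "snap-old",
     "StartTime": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new",
     "StartTime": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
print(old_snapshots(sample_snapshots, now=reference))
```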
Best Practices Summary
- Tag Everything: Consistent tagging enables cost allocation and accountability
- Monitor Continuously: Weekly script runs catch waste early
- Review Monthly: Regular reviews prevent cost drift
- Right-size Proactively: Don’t wait for cost issues to optimize
- Use Commitments Wisely: RIs/SPs for stable workloads only
- Test Before Migrating: Especially for Graviton or Spot
- Automate Cleanup: Scheduled shutdown of dev/test resources
- Share Wins: Celebrate cost savings to build FinOps culture
Additional Resources
Detailed References:
- references/best_practices.md: Comprehensive optimization strategies
- references/service_alternatives.md: Cost-effective service selection
- references/finops_governance.md: Organizational FinOps practices
Templates:
- assets/templates/monthly_cost_report.md: Monthly reporting template
Scripts:
- All scripts are in the scripts/ directory; run any script with --help for usage
AWS Documentation:
- AWS Cost Explorer: https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
- AWS Budgets: https://aws.amazon.com/aws-cost-management/aws-budgets/
- FinOps Foundation: https://www.finops.org