performance-analysis
Install command
npx skills add https://github.com/rsmdt/the-startup --skill performance-analysis
Skill Documentation
Performance Profiling
When to Use
- Establishing performance baselines before optimization
- Diagnosing slow response times, high CPU, or memory issues
- Identifying bottlenecks in application, database, or infrastructure
- Planning capacity for expected load increases
- Validating performance improvements after optimization
- Creating performance budgets for new features
Core Methodology
The Golden Rule: Measure First
Never optimize based on assumptions. Follow this order:
- Measure – Establish baseline metrics
- Identify – Find the actual bottleneck
- Hypothesize – Form a theory about the cause
- Fix – Implement targeted optimization
- Validate – Measure again to confirm improvement
- Document – Record findings and decisions
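The measure and validate steps can be sketched with nothing but the standard library. This is a minimal illustration, not a prescribed harness: the helper names (`measure`, `summarize`, `improved`) and the 5% minimum-gain threshold are assumptions made for the example.

```python
import statistics
import time


def measure(fn, *, repeats=30, warmup=3):
    """Return wall-clock timings in seconds for repeated calls of fn."""
    for _ in range(warmup):
        fn()  # warm caches so the first timed call is not an outlier
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Summarize timings as median and nearest-rank p95, in seconds."""
    ordered = sorted(timings)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {"median": statistics.median(ordered), "p95": ordered[p95_index]}


def improved(baseline, candidate, min_gain=0.05):
    """True if the candidate median beats the baseline median by min_gain (a fraction)."""
    return candidate["median"] <= baseline["median"] * (1 - min_gain)
```

Record `summarize(measure(old_fn))` before the change, apply the fix, then compare against `summarize(measure(new_fn))` with `improved` so the validation step rests on numbers rather than impressions.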
Profiling Hierarchy
Profile at the right level to find the actual bottleneck:
Application Level
|-- Request/Response timing
|-- Function/Method profiling
|-- Memory allocation tracking
|
System Level
|-- CPU utilization per process
|-- Memory usage patterns
|-- I/O wait times
|-- Network latency
|
Infrastructure Level
|-- Database query performance
|-- Cache hit rates
|-- External service latency
|-- Resource saturation
Profiling Patterns
CPU Profiling
Identify what code consumes CPU time:
- Sampling profilers – Low overhead; results are statistical estimates
- Instrumentation profilers – Exact counts, higher overhead
- Flame graphs – Visual representation of call stacks
Key metrics:
- Self time (time in function itself)
- Total time (self time + time in called functions)
- Call count and frequency
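As one possible illustration of these metrics, Python's built-in `cProfile` (a deterministic, instrumentation-style profiler) reports them directly: `tottime` is self time, `cumtime` is total time, and `ncalls` is the call count. The workload functions `slow_sum` and `handler` are hypothetical stand-ins.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately CPU-heavy so it stands out in the profile.
    return sum(i * i for i in range(n))


def handler():
    # Hypothetical request handler that calls the hot function repeatedly.
    return [slow_sum(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# Sort by cumulative (total) time and print the top entries to a buffer.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

In the report, a function with high `cumtime` but low `tottime` is mostly waiting on its callees, so the callees are the place to look.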
Memory Profiling
Track allocation patterns and detect leaks:
- Heap snapshots – Point-in-time memory state
- Allocation tracking – What allocates memory and when
- Garbage collection analysis – GC frequency and duration
Key metrics:
- Heap size over time
- Object retention
- Allocation rate
- GC pause times
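A rough sketch of allocation tracking with Python's standard `tracemalloc` module: take a snapshot before and after a workload and diff them by source line. The helper name `allocation_growth` and the sample workload are assumptions for illustration.

```python
import tracemalloc


def allocation_growth(workload):
    """Run workload while tracing allocations; return (result, per-line diff)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    kept = workload()  # keep the result alive so its memory counts as retained
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Each entry carries size_diff (bytes) and count_diff (blocks) per source line.
    return kept, after.compare_to(before, "lineno")


kept, diff = allocation_growth(lambda: [bytes(1024) for _ in range(2000)])
for stat in diff[:3]:
    print(stat)  # largest changes first
```

Repeating the comparison across several iterations of the same workload helps separate a one-time allocation from steady growth, which is the usual sign of a leak.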
I/O Profiling
Measure disk and network operations:
- Disk I/O – Read/write latency, throughput, IOPS
- Network I/O – Latency, bandwidth, connection count
- Database I/O – Query time, connection pool usage
Key metrics:
- Latency percentiles (p50, p95, p99)
- Throughput (ops/sec, MB/sec)
- Queue depth and wait times
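Latency percentiles can be computed with the standard `statistics` module. The sketch below uses a made-up sample (98 fast requests, 2 slow ones) and the hypothetical helper `latency_summary` to show how the mean hides the tail.

```python
import statistics


def latency_summary(latencies_ms):
    """Return mean, p50, p95 and p99 for a list of latencies in milliseconds."""
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


samples = [10.0] * 98 + [1000.0] * 2  # illustrative data, not a real measurement
summary = latency_summary(samples)
print(summary)
```

For this sample the mean is 29.8 ms, a value no request actually experienced, while p50 is 10 ms and p99 is 1000 ms.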
Bottleneck Identification
The USE Method
For each resource, check:
- Utilization – Percentage of time resource is busy
- Saturation – Degree of queued work
- Errors – Error count for the resource
The RED Method
For services, measure:
- Rate – Requests per second
- Errors – Failed requests per second
- Duration – Distribution of request latencies
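As a small sketch, the three RED numbers can be derived from a window of request records. The `Request` record and the `red_metrics` helper are hypothetical; a real service would normally pull these values from its metrics system.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    ok: bool


def red_metrics(requests, window_seconds):
    """Compute Rate, Errors and Duration for requests seen in one time window."""
    failed = sum(1 for r in requests if not r.ok)
    durations = sorted(r.duration_ms for r in requests)
    return {
        "rate_per_sec": len(requests) / window_seconds,
        "errors_per_sec": failed / window_seconds,
        "duration_p50_ms": statistics.median(durations) if durations else None,
        "duration_max_ms": durations[-1] if durations else None,
    }
```

Duration is summarized here by median and maximum only to keep the sketch short; in practice the full distribution (p95, p99) is what matters.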
Common Bottleneck Patterns
| Pattern | Symptoms | Typical Causes |
|---|---|---|
| CPU-bound | High CPU, low I/O wait | Inefficient algorithms, tight loops |
| Memory-bound | High memory, GC pressure | Memory leaks, large allocations |
| I/O-bound | Low CPU, high I/O wait | Slow queries, network latency |
| Lock contention | Low CPU, high wait time | Synchronization, connection pools |
| N+1 queries | Many small DB queries | Missing joins, lazy loading |
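The N+1 pattern in the last row is easy to demonstrate with an in-memory SQLite database. The schema, data, and the `CountingDB` wrapper below are invented for illustration; the wrapper simply counts how many statements each approach issues.

```python
import sqlite3


class CountingDB:
    """In-memory SQLite database that counts the queries sent through run()."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
            INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
            INSERT INTO books VALUES (1, 1, 'Notes'), (2, 2, 'Compilers'), (3, 3, 'Discipline');
        """)
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def titles_n_plus_one(db):
    # One query for the authors, then one more query per author.
    out = {}
    for author_id, name in db.run("SELECT id, name FROM authors"):
        rows = db.run("SELECT title FROM books WHERE author_id = ?", (author_id,))
        out[name] = [title for (title,) in rows]
    return out


def titles_joined(db):
    # A single JOIN returns the same data in one query.
    out = {}
    sql = "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
    for name, title in db.run(sql):
        out.setdefault(name, []).append(title)
    return out
```

With three authors the first version issues four queries and the second issues one; the gap grows linearly with the number of parent rows.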
Amdahl’s Law
Optimization impact is limited by the fraction of time affected:
If 90% of time is in function A and 10% in function B:
- Cutting A's time by 50% reduces total time by 45% (about a 1.82x overall speedup)
- Cutting B's time by 50% reduces total time by only 5% (about a 1.05x overall speedup)
Focus on the biggest contributors first.
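The arithmetic behind the example can be checked with a short sketch. The function names are chosen for illustration; `fraction` is the share of total time spent in the optimized code and `local_reduction` is how much of that code's own time is removed.

```python
def total_time_reduction(fraction, local_reduction):
    """Fraction of total runtime saved when part of the program gets faster."""
    return fraction * local_reduction


def overall_speedup(fraction, local_reduction):
    """Amdahl's Law: old total time divided by new total time."""
    return 1 / (1 - fraction * local_reduction)


for name, fraction in (("A", 0.9), ("B", 0.1)):
    saved = total_time_reduction(fraction, 0.5)
    speedup = overall_speedup(fraction, 0.5)
    print(f"{name}: {saved:.0%} of total time saved, {speedup:.2f}x overall")
```

Even eliminating function B entirely (`local_reduction=1.0`) saves at most 10% of total time, which is why the largest contributor comes first.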
Capacity Planning
Baseline Establishment
Measure current capacity under production load:
- Peak load metrics – Maximum concurrent users, requests/sec
- Resource headroom – How close to limits at peak
- Scaling patterns – Linear, sub-linear, or super-linear
Load Testing Approach
- Establish baseline – Current performance at normal load
- Ramp testing – Gradually increase load to find limits
- Stress testing – Push beyond limits to understand failure modes
- Soak testing – Sustained load to find memory leaks, degradation
Capacity Metrics
| Metric | What It Tells You |
|---|---|
| Throughput at saturation | Maximum system capacity |
| Latency at 80% load | Performance before degradation |
| Error rate under stress | Failure patterns |
| Recovery time | How quickly system returns to normal |
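One way to turn ramp-test results into a capacity figure is to take the highest load step that still meets the latency and error targets. The sketch below assumes each step is recorded as a `(load, p95_ms, error_rate)` tuple; the function name, thresholds, and sample data are all illustrative.

```python
def max_sustainable_load(steps, latency_limit_ms, max_error_rate):
    """Return the highest load that stays within limits, or None if none does.

    steps: iterable of (load_req_per_sec, p95_latency_ms, error_rate) tuples.
    Stops at the first violating step, since later steps ran on a degraded system.
    """
    best = None
    for load, p95_ms, error_rate in sorted(steps):
        if p95_ms > latency_limit_ms or error_rate > max_error_rate:
            break
        best = load
    return best


ramp = [(100, 40.0, 0.0), (200, 45.0, 0.0), (400, 80.0, 0.001), (800, 350.0, 0.02)]
print(max_sustainable_load(ramp, latency_limit_ms=200.0, max_error_rate=0.01))
```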
Growth Planning
Required Capacity = Current Load x Growth Factor x (1 + Safety Margin)
Example:
- Current: 1000 req/sec
- Expected growth: 50% per year
- Safety margin: 30%
Year 1 need = (1000 x 1.5) x 1.3 = 1950 req/sec
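The worked example applies the safety margin as a multiplier on the grown load, and the sketch below follows that reading. Compounding growth across multiple years is an added assumption for illustration, as are the function and parameter names.

```python
def required_capacity(current_load, annual_growth, safety_margin, years=1):
    """Projected capacity need: grown load scaled up by a safety margin.

    annual_growth and safety_margin are fractions (0.5 means 50%).
    Growth is assumed to compound yearly.
    """
    grown_load = current_load * (1 + annual_growth) ** years
    return grown_load * (1 + safety_margin)


print(round(required_capacity(1000, 0.5, 0.3)))           # year 1
print(round(required_capacity(1000, 0.5, 0.3, years=2)))  # year 2, compounded
```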
Optimization Patterns
Quick Wins
- Enable caching – Application, CDN, database query cache
- Add indexes – For slow queries identified in profiling
- Compression – Gzip/Brotli for responses
- Connection pooling – Reduce connection overhead
- Batch operations – Reduce round-trips
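Batching is the easiest of these to sketch generically. The example below groups records into fixed-size chunks before sending; `RecordingClient` is a hypothetical stand-in that counts round-trips instead of performing network I/O.

```python
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most size items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


class RecordingClient:
    """Stand-in for a real client; each send() counts as one round-trip."""

    def __init__(self):
        self.round_trips = 0

    def send(self, records):
        self.round_trips += 1
        return len(records)


def send_all(client, records, batch_size):
    """Send all records in batches and return how many were sent."""
    return sum(client.send(chunk) for chunk in chunks(records, batch_size))
```

Sending 1000 records with `batch_size=100` takes 10 round-trips instead of 1000; the right batch size depends on payload limits and latency, so it is worth measuring.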
Algorithmic Improvements
- Reduce complexity – O(n^2) to O(n log n)
- Lazy evaluation – Defer work until needed
- Memoization – Cache computed results
- Pagination – Limit data processed at once
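Memoization is available in Python's standard library through `functools.lru_cache`. The Fibonacci function below is a textbook illustration rather than a realistic workload: without the cache the recursion is exponential, with it each value is computed once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache; set a limit for long-running services
def fib(n):
    """Naive recursive Fibonacci, made linear by caching earlier results."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))
print(fib.cache_info())  # hits, misses and current cache size
```

An unbounded cache trades memory for time, so for inputs with many distinct values a `maxsize` bound is usually the safer default.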
Architectural Changes
- Horizontal scaling – Add more instances
- Async processing – Queue background work
- Read replicas – Distribute read load
- Caching layers – Redis, Memcached
- CDN – Edge caching for static content
Best Practices
- Profile in production-like environments; development machines often differ in data volume, hardware, and concurrency
- Use percentiles (p95, p99) not averages for latency
- Monitor continuously, not just during incidents
- Set performance budgets and enforce them in CI
- Document baseline metrics before making changes
- Keep profiling overhead low in production
- Correlate metrics across layers (application, database, infrastructure)
- Understand the difference between latency and throughput
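Enforcing a performance budget in CI can be as simple as comparing measured percentiles against limits and failing the job on a violation. The budget values, metric names, and function names below are placeholders, not recommendations.

```python
import sys

# Hypothetical budget: latency limits in milliseconds per percentile.
BUDGETS_MS = {"p95": 200.0, "p99": 500.0}


def budget_violations(measured_ms, budgets_ms):
    """Return a human-readable message for every budget that is missed."""
    violations = []
    for name, limit in budgets_ms.items():
        value = measured_ms.get(name)
        if value is None:
            violations.append(f"{name}: no measurement recorded")
        elif value > limit:
            violations.append(f"{name}: {value:.1f} ms exceeds budget of {limit:.1f} ms")
    return violations


def check(measured_ms):
    """Print violations to stderr and return a process exit code."""
    problems = budget_violations(measured_ms, BUDGETS_MS)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

A CI step would call `sys.exit(check(results))` with the results of a benchmark run, so a non-zero exit code blocks the merge.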
Anti-Patterns
- Optimizing without measurement
- Using averages for latency metrics
- Profiling only in development
- Ignoring tail latencies (p99, p99.9)
- Premature optimization of non-bottleneck code
- Over-engineering for hypothetical scale
- Caching without invalidation strategy
References
- Profiling Tools Reference – Tools by language and platform