performance-analysis

📁 rsmdt/the-startup 📅 Jan 24, 2026

Install command

npx skills add https://github.com/rsmdt/the-startup --skill performance-analysis

Agent install distribution

claude-code 4

windsurf 2

opencode 2

gemini-cli 2

trae 1

codex 1

Skill documentation

Performance Profiling

When to Use

Establishing performance baselines before optimization
Diagnosing slow response times, high CPU, or memory issues
Identifying bottlenecks in application, database, or infrastructure
Planning capacity for expected load increases
Validating performance improvements after optimization
Creating performance budgets for new features

Core Methodology

The Golden Rule: Measure First

Never optimize based on assumptions. Follow this order:

1. Measure – Establish baseline metrics
2. Identify – Find the actual bottleneck
3. Hypothesize – Form a theory about the cause
4. Fix – Implement targeted optimization
5. Validate – Measure again to confirm improvement
6. Document – Record findings and decisions
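
The measure and validate steps can be sketched with the standard library alone. This is a minimal illustration, not a full benchmarking harness: `slow_sum` and `fast_sum` are hypothetical stand-ins for the code before and after a fix, and the run count of 30 is an arbitrary assumption.

```python
import statistics
import time


def measure(func, *args, runs=30):
    """Return per-call wall-clock timings (seconds) for func(*args)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Summarize timings with the median and the worst observed run."""
    return {"median": statistics.median(timings), "max": max(timings)}


def slow_sum(n):
    # Hypothetical workload standing in for the code under investigation.
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    # Hypothetical optimized version: closed form for the same result.
    return n * (n - 1) // 2


# Measure: record the baseline before changing anything.
baseline = summarize(measure(slow_sum, 200_000))
# Validate: measure the candidate with the same inputs and run count.
candidate = summarize(measure(fast_sum, 200_000))

print(f"baseline median:  {baseline['median']:.6f}s")
print(f"candidate median: {candidate['median']:.6f}s")
```

Keeping the baseline numbers next to the candidate numbers also covers the last step: the pair is the record of what changed and by how much.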

Profiling Hierarchy

Profile at the right level to find the actual bottleneck:

Application Level
    |-- Request/Response timing
    |-- Function/Method profiling
    |-- Memory allocation tracking
    |
System Level
    |-- CPU utilization per process
    |-- Memory usage patterns
    |-- I/O wait times
    |-- Network latency
    |
Infrastructure Level
    |-- Database query performance
    |-- Cache hit rates
    |-- External service latency
    |-- Resource saturation

Profiling Patterns

CPU Profiling

Identify what code consumes CPU time:

Sampling profilers – Low overhead, statistically approximate results
Instrumentation profilers – Exact counts, higher overhead
Flame graphs – Visual representation of call stacks

Key metrics:

Self time (time in function itself)
Total time (self time + time in called functions)
Call count and frequency
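
As one way to see these metrics, Python's built-in `cProfile` (an instrumentation profiler) reports them directly: `tottime` is self time, `cumtime` is total time, and `ncalls` is the call count. The functions `parse` and `handle_request` below are hypothetical examples, not part of any real application.

```python
import cProfile
import io
import pstats


def parse(n):
    # Hypothetical leaf function: its time is almost entirely self time.
    return [str(i) for i in range(n)]


def handle_request(n):
    # Hypothetical caller: little self time, most total time is spent in parse().
    return len(parse(n))


profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    handle_request(10_000)
profiler.disable()

# Sort by cumulative (total) time and print the top five entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
print(buffer.getvalue())
```

In the report, `handle_request` should show a high `cumtime` but a low `tottime`, which points the investigation at what it calls rather than at the function itself.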

Memory Profiling

Track allocation patterns and detect leaks:

Heap snapshots – Point-in-time memory state
Allocation tracking – What allocates memory and when
Garbage collection analysis – GC frequency and duration

Key metrics:

Heap size over time
Object retention
Allocation rate
GC pause times

I/O Profiling

Measure disk and network operations:

Disk I/O – Read/write latency, throughput, IOPS
Network I/O – Latency, bandwidth, connection count
Database I/O – Query time, connection pool usage

Key metrics:

Latency percentiles (p50, p95, p99)
Throughput (ops/sec, MB/sec)
Queue depth and wait times
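
To illustrate why percentiles matter, the sketch below computes p50, p95, and p99 with `statistics.quantiles` on synthetic data. The sample values are invented for the example.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95 and p99 for a list of latency samples (milliseconds)."""
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Synthetic data: 98 fast requests and 2 very slow ones.
samples = [10.0] * 98 + [2000.0] * 2
result = latency_percentiles(samples)

print("mean:", statistics.fmean(samples))
print(result)
```

Here the mean lands near 50 ms, a latency no request actually experienced, while p50 and p95 stay at 10 ms and only p99 exposes the 2000 ms outliers.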

Bottleneck Identification

The USE Method

For each resource, check:

Utilization – Percentage of time resource is busy
Saturation – Degree of queued work
Errors – Error count for the resource
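
The three checks can be expressed as a small function over per-resource samples. This is a sketch under stated assumptions: the `ResourceSample` fields are hypothetical inputs you would fill from your own monitoring, and the 80% utilization threshold is illustrative rather than a recommended value.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    # All fields are hypothetical inputs collected from your own monitoring.
    name: str
    busy_seconds: float      # time the resource was busy during the interval
    interval_seconds: float  # length of the observation interval
    queued: int              # work items waiting for the resource
    errors: int              # error events reported for the resource


def use_check(sample, max_utilization=0.8):
    """Return a list of USE findings; the 0.8 threshold is an assumption."""
    findings = []
    utilization = sample.busy_seconds / sample.interval_seconds
    if utilization > max_utilization:
        findings.append(f"{sample.name}: utilization {utilization:.0%}")
    if sample.queued > 0:
        findings.append(f"{sample.name}: saturation, {sample.queued} queued")
    if sample.errors > 0:
        findings.append(f"{sample.name}: {sample.errors} errors")
    return findings


disk = ResourceSample("disk", busy_seconds=57.0, interval_seconds=60.0, queued=12, errors=0)
cpu = ResourceSample("cpu", busy_seconds=21.0, interval_seconds=60.0, queued=0, errors=0)

for resource in (disk, cpu):
    print(resource.name, use_check(resource) or "ok")
```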

The RED Method

For services, measure:

Rate – Requests per second
Errors – Failed requests per second
Duration – Distribution of request latencies
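
A minimal sketch of deriving the three RED numbers from raw request records. The record shape `(duration_ms, succeeded)` and the 60-second window are assumptions made for the example; a real service would usually get these values from its metrics system.

```python
import statistics


def red_metrics(requests, window_seconds):
    """requests: list of (duration_ms, succeeded) tuples seen in the window."""
    durations = [duration for duration, _ in requests]
    failures = sum(1 for _, succeeded in requests if not succeeded)
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {
        "rate_per_s": len(requests) / window_seconds,
        "errors_per_s": failures / window_seconds,
        "duration_p50_ms": cuts[49],
        "duration_p99_ms": cuts[98],
    }


# Synthetic one-minute window: 570 successful requests and 30 failures.
window = [(12.0, True)] * 570 + [(15.0, False)] * 30
metrics = red_metrics(window, window_seconds=60)
print(metrics)
```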

Common Bottleneck Patterns

Pattern	Symptoms	Typical Causes
CPU-bound	High CPU, low I/O wait	Inefficient algorithms, tight loops
Memory-bound	High memory, GC pressure	Memory leaks, large allocations
I/O-bound	Low CPU, high I/O wait	Slow queries, network latency
Lock contention	Low CPU, high wait time	Contended locks, exhausted connection pools
N+1 queries	Many small DB queries	Missing joins, lazy loading
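
The N+1 pattern from the last row is easy to reproduce with an in-memory SQLite database. The sketch below counts executed statements with `set_trace_callback`; the `authors` and `posts` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author{i}") for i in range(100)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i % 100, f"post{i}") for i in range(500)])
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed

# N+1: one query for the posts, then one more query per post for its author.
n_plus_one = []
for title, author_id in conn.execute("SELECT title, author_id FROM posts").fetchall():
    row = conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
    n_plus_one.append((title, row[0]))
n_plus_one_count = len(statements)

# Fix: a single JOIN returns the same data in one round-trip.
statements.clear()
joined = conn.execute(
    "SELECT p.title, a.name FROM posts p JOIN authors a ON a.id = p.author_id"
).fetchall()
join_count = len(statements)

print(n_plus_one_count, "statements with N+1 vs", join_count, "with a JOIN")
```

Against a networked database, each of those extra statements also pays a round-trip, which is why the symptom is many small queries rather than one slow one.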

Amdahl’s Law

Optimization impact is limited by the fraction of time affected:

If 90% of time is in function A and 10% in function B:
- Cutting A's time by 50% = 45% less total time
- Cutting B's time by 50% = 5% less total time

Focus on the biggest contributors first.
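
The same arithmetic as a short sketch, reading "by 50%" as removing half of that function's running time. The function names are illustrative.

```python
def remaining_time_fraction(fraction_affected, time_reduction):
    """Fraction of the original runtime left after optimizing part of it.

    fraction_affected: share of total time spent in the optimized code (0..1)
    time_reduction:    share of that code's time that is removed (0..1)
    """
    return 1.0 - fraction_affected * time_reduction


def overall_speedup(fraction_affected, time_reduction):
    """Amdahl's Law expressed as a speedup factor (old time / new time)."""
    return 1.0 / remaining_time_fraction(fraction_affected, time_reduction)


for name, fraction in (("A", 0.9), ("B", 0.1)):
    saved = 1.0 - remaining_time_fraction(fraction, 0.5)
    print(f"{name}: {saved:.0%} less total time, "
          f"{overall_speedup(fraction, 0.5):.2f}x speedup")
```

Even if B were made infinitely fast (`time_reduction=1.0`), the total would shrink by only 10%, which is the limit the law describes.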

Capacity Planning

Baseline Establishment

Measure current capacity under production load:

Peak load metrics – Maximum concurrent users, requests/sec
Resource headroom – How close to limits at peak
Scaling patterns – Linear, sub-linear, or super-linear

Load Testing Approach

Establish baseline – Current performance at normal load
Ramp testing – Gradually increase load to find limits
Stress testing – Push beyond limits to understand failure modes
Soak testing – Sustained load to find memory leaks, degradation

Capacity Metrics

Metric	What It Tells You
Throughput at saturation	Maximum system capacity
Latency at 80% load	Performance before degradation
Error rate under stress	Failure patterns
Recovery time	How quickly system returns to normal

Growth Planning

Required Capacity = Current Load x Growth Factor x (1 + Safety Margin)

Example:
- Current: 1000 req/sec
- Expected growth: 50% per year
- Safety margin: 30%

Year 1 need = (1000 x 1.5) x 1.3 = 1950 req/sec
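
The worked example can be wrapped in a small helper. It follows the example in applying the safety margin as a multiplier; compounding growth over several years is an added assumption, as is the function name.

```python
def required_capacity(current_load, annual_growth, safety_margin, years=1):
    """Capacity needed after `years` of compound growth plus a safety margin.

    annual_growth and safety_margin are fractions, e.g. 0.5 for 50%.
    Compounding across years is an assumption of this sketch.
    """
    grown_load = current_load * (1 + annual_growth) ** years
    return grown_load * (1 + safety_margin)


year_1 = required_capacity(1000, annual_growth=0.5, safety_margin=0.3)
year_2 = required_capacity(1000, annual_growth=0.5, safety_margin=0.3, years=2)
print(f"Year 1: {year_1:.0f} req/sec")
print(f"Year 2: {year_2:.0f} req/sec")
```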

Optimization Patterns

Quick Wins

Enable caching – Application, CDN, database query cache
Add indexes – For slow queries identified in profiling
Compression – Gzip/Brotli for responses
Connection pooling – Reduce connection overhead
Batch operations – Reduce round-trips
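
As an example of the indexing quick win, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` to confirm that a query switches from a full table scan to an index lookup. The `orders` table, its data, and the index name are hypothetical; other databases expose the same idea through their own `EXPLAIN` output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

query = "SELECT total FROM orders WHERE customer_id = ?"


def plan(sql):
    """Return SQLite's query plan for sql as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = plan(query)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

Checking the plan before and after keeps the change tied to measurement: the index is only a win if the planner actually uses it for the slow query.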

Algorithmic Improvements

Reduce complexity – O(n^2) to O(n log n)
Lazy evaluation – Defer work until needed
Memoization – Cache computed results
Pagination – Limit data processed at once
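
Memoization is often a one-line change in Python via `functools.lru_cache`. The `shipping_cost` function and its formula are hypothetical; the technique only applies to pure functions whose result depends solely on hashable arguments.

```python
from functools import lru_cache

calls = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def shipping_cost(weight_kg, zone):
    # Hypothetical pure function standing in for an expensive computation.
    global calls
    calls += 1
    return 4.0 + 1.25 * weight_kg + 2.0 * zone


for _ in range(1000):
    shipping_cost(2, 3)

print("function body executed:", calls)
print(shipping_cost.cache_info())
```

An unbounded cache (`maxsize=None`) trades memory for speed, so a bounded size is usually the safer default when the argument space is large.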

Architectural Changes

Horizontal scaling – Add more instances
Async processing – Queue background work
Read replicas – Distribute read load
Caching layers – Redis, Memcached
CDN – Edge caching for static content

Best Practices

Profile in production-like environments; development can have different characteristics
Use percentiles (p95, p99) not averages for latency
Monitor continuously, not just during incidents
Set performance budgets and enforce them in CI
Document baseline metrics before making changes
Keep profiling overhead low in production
Correlate metrics across layers (application, database, infrastructure)
Understand the difference between latency and throughput
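
One way to enforce a performance budget in CI is a check that fails the build when p95 latency exceeds the budget. This is a sketch: the 250 ms budget and the sample data are invented, and a real job would read samples from a load-test run and pass the exit code to `sys.exit`.

```python
import statistics

BUDGET_P95_MS = 250.0  # hypothetical budget; choose one from your own baseline


def check_budget(samples_ms, budget_p95_ms=BUDGET_P95_MS):
    """Return (within_budget, p95) for a list of latency samples in milliseconds."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]
    return p95 <= budget_p95_ms, p95


# Synthetic samples standing in for a load-test result.
samples = [120.0] * 90 + [400.0] * 10
within_budget, p95 = check_budget(samples)
exit_code = 0 if within_budget else 1

print(f"p95 = {p95:.1f} ms, budget = {BUDGET_P95_MS:.1f} ms, exit code = {exit_code}")
```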

Anti-Patterns

Optimizing without measurement
Using averages for latency metrics
Profiling only in development
Ignoring tail latencies (p99, p99.9)
Premature optimization of non-bottleneck code
Over-engineering for hypothetical scale
Caching without invalidation strategy

References

Profiling Tools Reference – Tools by language and platform
