performance-profiling
```bash
npx skills add https://github.com/lv416e/dotfiles --skill performance-profiling
```
Performance Profiling
Overview
Optimizing without measuring is guessing. Guessing wastes time and often makes things worse.
Core principle: ALWAYS measure before optimizing. Intuition about performance is wrong more often than right.
Violating the letter of this process is violating the spirit of performance engineering.
The Iron Law
MEASURE FIRST, OPTIMIZE SECOND - NO PREMATURE OPTIMIZATION WITHOUT PROFILING DATA
If you haven’t profiled it, you cannot optimize it. Hunches are not data.
When to Use
Use for ANY performance concern:
- Slow API responses
- High memory usage
- Slow page loads
- Database query timeouts
- Build/CI pipeline slowdowns
- Memory leaks
- Throughput degradation under load
Use this ESPECIALLY when:
- Someone says “this feels slow”
- Under pressure to “just make it faster”
- A “quick optimization” seems obvious
- You want to add caching
- You want to rewrite in a “faster” language
- You’re about to refactor for performance
Don’t skip when:
- The fix seems obvious (obvious fixes are often wrong)
- It’s “just” adding an index (measure the impact)
- You’re an expert (experts still measure)
The Four Phases
You MUST complete each phase before proceeding to the next.
Phase 1: Establish Baseline
BEFORE changing ANY code:
- **Define What “Fast Enough” Means**
- Set concrete targets: “API responds in < 200ms at p95”
- No vague goals like “make it faster”
- Agree on metrics with stakeholders
- If you can’t define the target, you can’t know when you’re done
- **Measure Current Performance**

```bash
# HTTP endpoints
hey -n 1000 -c 50 https://api.example.com/endpoint

# Application benchmarks
hyperfine --warmup 3 'command-to-benchmark'

# Database queries
EXPLAIN ANALYZE SELECT ...;
```
- **Record the Baseline**
- Response time: p50, p95, p99
- Throughput: requests/second
- Resource usage: CPU%, memory, I/O
- Error rate under load
- Write these numbers down. You will compare against them.
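A minimal sketch (stdlib Python; the latency samples are illustrative) of turning raw measurements into the percentile numbers above:

```python
import statistics

# Response times collected during the baseline run (illustrative data)
latencies_ms = [112.0, 98.5, 130.2, 101.7, 455.0, 120.3, 99.1, 140.8, 108.6, 760.2]

# statistics.quantiles with n=100 returns the 99 percentile cut points
cuts = statistics.quantiles(latencies_ms, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms")
```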
- **Reproduce Under Realistic Conditions**
- Production-like data volume
- Concurrent users matching real traffic
- Warm caches if production has warm caches
- Cold start if measuring cold start
No baseline? No optimization. Period.
Phase 2: Profile to Find Bottlenecks
Find where time is actually spent:
- **CPU Profiling**
```bash
# Node.js
node --prof app.js
node --prof-process isolate-*.log

# Python
python -m cProfile -o output.prof script.py

# Go
go tool pprof http://localhost:6060/debug/pprof/profile
```
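To read a profile recorded with cProfile as above, a short stdlib sketch (assumes the `output.prof` file from the command above):

```python
import pstats

# Load the profile written by `python -m cProfile -o output.prof script.py`
stats = pstats.Stats("output.prof")

# Top ten functions by cumulative time: this is the hot path
stats.sort_stats("cumulative").print_stats(10)
```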
- **Generate Flame Graphs**
- Flame graphs show exactly where CPU time goes
- Wide bars = time spent there
- Look for unexpected wide bars
- Don’t guess – read the graph
- **Memory Profiling**
```bash
# Node.js
node --inspect app.js
# Chrome DevTools → Memory → Heap Snapshot
```

```python
# Python
import tracemalloc

tracemalloc.start()
# ... run code ...
snapshot = tracemalloc.take_snapshot()
```
- **I/O and Network Profiling**
- Database query logging with timing
- Network request tracing (OpenTelemetry)
- File system I/O monitoring
- External API call latency
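Where full tracing isn’t wired up yet, a minimal Python sketch can log slow external calls; the `timed` decorator, threshold, and `fetch_user` below are illustrative, not from any particular library:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("latency")

def timed(threshold_ms: float = 100.0):
    """Log any call that exceeds threshold_ms (illustrative helper)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                if elapsed_ms > threshold_ms:
                    log.info("%s took %.1fms", fn.__name__, elapsed_ms)
        return wrapper
    return decorator

@timed(threshold_ms=50.0)
def fetch_user(user_id: int):
    time.sleep(0.08)  # stand-in for a database or network call
    return {"id": user_id}

fetch_user(42)
```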
- **Identify the Actual Bottleneck**
| Bottleneck Type | Symptoms | Tools |
|---|---|---|
| CPU-bound | High CPU%, slow computation | Profiler, flame graph |
| Memory-bound | High memory, GC pauses, swapping | Heap profiler, memory tracker |
| I/O-bound | Low CPU%, waiting on disk/network | Tracing, query logs |
| Concurrency | Lock contention, thread starvation | Thread profiler, lock analysis |

Key insight: Most “slow” applications are I/O-bound, not CPU-bound. Don’t optimize CPU when you’re waiting on the database.
Phase 3: Algorithmic Complexity Analysis
Before micro-optimizing, check the algorithm:
- **Big-O Review**
- What is the time complexity of the hot path?
- Is there an O(n^2) hiding in a loop?
- Are you doing O(n) lookups where O(1) is possible (array vs. hash map)?
- Nested database queries (N+1 problem)?
- **Common Hidden Complexity**
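A classic instance, as a runnable sketch: membership tests against a list inside a loop make an apparently linear pass O(n^2); a set restores O(n).

```python
import time

n = 10_000
items = list(range(n))
allowed_list = list(range(n))
allowed_set = set(allowed_list)

# O(n^2): every `in` test scans the whole list
start = time.perf_counter()
hits = sum(1 for x in items if x in allowed_list)
print(f"list membership: {time.perf_counter() - start:.3f}s ({hits} hits)")

# O(n): every `in` test is a constant-time hash lookup
start = time.perf_counter()
hits = sum(1 for x in items if x in allowed_set)
print(f"set membership:  {time.perf_counter() - start:.3f}s ({hits} hits)")
```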
- **Database Query Analysis**
```sql
-- Always EXPLAIN before optimizing
EXPLAIN ANALYZE
SELECT u.*, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;
```
- Check for sequential scans on large tables
- Verify indexes are being used
- Look for unnecessary joins
- Detect N+1 queries in ORM-generated SQL
- **N+1 Query Detection**
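A self-contained sketch of the pattern using Python’s stdlib sqlite3 (schema and data are illustrative): the first version issues one query per user, the second collapses N+1 round trips into a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    INSERT INTO users  VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# N+1: one query for the users, then one more query PER user
users = conn.execute("SELECT id, name FROM users").fetchall()
for user_id, name in users:
    count = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    print(name, count)

# Fixed: a single JOIN does the same work in one round trip
rows = conn.execute("""
    SELECT u.name, COUNT(o.id)
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
""").fetchall()
print(rows)
```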
Phase 4: Targeted Optimization with Benchmarks
Optimize only what profiling identified:
- **Write a Benchmark First**
```js
// Before optimizing, capture current performance
bench('current implementation', () => {
  currentFunction(testData);
});

bench('optimized implementation', () => {
  optimizedFunction(testData);
});
```
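The same discipline in Python with the stdlib `timeit` module (a minimal sketch; the two functions are placeholders for the code under test):

```python
import timeit

def current_function(data):
    return sorted(data)  # placeholder for the existing hot path

def optimized_function(data):
    return sorted(data)  # placeholder for the candidate change

test_data = list(range(10_000, 0, -1))

# repeat() returns one total per run; take the minimum as the least-noisy estimate
for name, fn in [("current", current_function), ("optimized", optimized_function)]:
    best = min(timeit.repeat(lambda: fn(test_data), number=100, repeat=5))
    print(f"{name}: {best / 100 * 1000:.3f} ms/call")
```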
- **Make ONE Change at a Time**
- Single optimization per iteration
- Measure after each change
- If no measurable improvement, revert
- Don’t stack optimizations without measuring each
- **Verify Against Baseline**
- Does it meet the target defined in Phase 1?
- Yes? Stop optimizing. You’re done.
- No? Return to Phase 2, find next bottleneck.
- Getting diminishing returns? Stop. Good enough is good enough.
- **Frontend-Specific Optimization**
- Core Web Vitals: LCP, FID/INP, CLS
- Bundle size analysis (webpack-bundle-analyzer, source-map-explorer)
- Lazy loading for below-fold content
- Image optimization (WebP/AVIF, responsive images, lazy load)
- Code splitting at route boundaries
- **Memory Leak Resolution**
- Take heap snapshots at intervals
- Compare snapshots: what grows?
- Common causes: event listeners not removed, closures holding references, growing caches without eviction
- Fix the leak, verify with another snapshot series
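A snapshot-comparison sketch with Python’s stdlib tracemalloc (the unbounded cache is an illustrative leak): whatever keeps growing between snapshots is the leak candidate.

```python
import tracemalloc

tracemalloc.start()

cache = {}  # illustrative leak: a cache with no eviction

snapshot1 = tracemalloc.take_snapshot()
for i in range(100_000):
    cache[i] = "x" * 100  # keeps growing between snapshots
snapshot2 = tracemalloc.take_snapshot()

# Allocation sites that grew between snapshots are the leak candidates
for stat in snapshot2.compare_to(snapshot1, "lineno")[:5]:
    print(stat)
```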
Load Testing
Before any production deployment of performance-sensitive changes:
```bash
# k6 example
k6 run --vus 50 --duration 5m load-test.js

# Artillery example
artillery run load-test.yml
```
- Test at expected load AND 2-3x expected load
- Monitor error rates under load
- Check for degradation patterns (gradual vs. cliff)
- Verify resource consumption stays bounded
Red Flags – STOP and Follow Process
If you catch yourself thinking:
- “This is obviously the slow part”
- “Let me just add some caching”
- “Rewriting in Rust/Go will fix it”
- “Let me optimize this loop”
- “Add an index, that’ll fix it”
- “Just increase the timeout”
- “Premature optimization is bad, so I won’t measure”
- “The profiler is too hard to set up”
- “I’ll measure after I optimize”
ALL of these mean: STOP. Return to Phase 1.
Common Rationalizations
| Excuse | Reality |
|---|---|
| “Obviously slow, don’t need to measure” | Obvious intuitions are wrong 70% of the time. Measure. |
| “Just add caching” | Caching without understanding causes stale data bugs and hides real issues. |
| “Rewrite in a faster language” | Algorithm matters more than language. O(n^2) in Rust is still O(n^2). |
| “Micro-benchmarks show improvement” | Micro-benchmarks don’t reflect real workload. Measure end-to-end. |
| “Profiler is too hard to set up” | 10 minutes to set up profiler vs. hours guessing. Set it up. |
| “It’s just one query” | One query called 10,000 times is 10,000 queries. Measure frequency. |
| “Increase the timeout” | Timeouts mask real problems. Find why it’s slow. |
| “Premature optimization, skip it” | Measuring is not optimizing. Always measure. Decide after. |
| “Add an index, can’t hurt” | Wrong indexes slow writes. EXPLAIN first. |
| “Optimize later when it matters” | Launching without baseline means you can’t measure regression. |
Anti-Patterns
| Anti-Pattern | Consequence | Correct Approach |
|---|---|---|
| Optimizing without measuring | Wrong target, wasted effort, new bugs | Profile first, optimize the actual bottleneck |
| Micro-optimizing cold paths | No user-visible improvement | Focus on hot paths identified by profiling |
| Premature caching | Stale data, cache invalidation bugs, complexity | Fix the root cause; cache only after measuring |
| Ignoring algorithmic complexity | Linear “optimization” on exponential problem | Fix the algorithm before tuning constants |
| Optimizing in isolation | Benchmark looks good, production doesn’t improve | Test under realistic load and data |
| Stacking multiple optimizations | Can’t tell which one helped (or hurt) | One change at a time, measure each |
Quick Reference
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Baseline | Define targets, measure current state, record numbers | Concrete metrics documented |
| 2. Profile | CPU/memory/I/O profiling, flame graphs, identify bottleneck | Know WHERE time is spent |
| 3. Complexity | Big-O review, N+1 detection, EXPLAIN queries | Algorithmic issues found or ruled out |
| 4. Optimize | Benchmark, single change, measure, compare to baseline | Meets target or next bottleneck identified |
Verification Checklist
Before marking performance work complete:
- Performance target defined with concrete numbers
- Baseline measured and recorded before any changes
- Profiling data collected (not just intuition)
- Bottleneck identified from profiling data
- Algorithmic complexity reviewed
- Database queries analyzed with EXPLAIN
- Optimization benchmarked against baseline
- Performance target met (or documented why not)
- Load tested under realistic conditions
- No regressions in correctness (all tests pass)
- Memory stable under sustained load (no leaks)
Can’t check all boxes? You’re not done.
Integration with Other Skills
This skill requires using:
- systematic-debugging – REQUIRED when a performance issue has an unclear cause (use its Phase 1 root-cause investigation)
- test-driven-development – REQUIRED for writing benchmarks and ensuring optimizations don’t break correctness
Complementary skills:
- documentation-generation – Document baseline metrics, optimization decisions, and load test results
Final Rule
No profiling data → no optimization
No baseline → no comparison
No target → no "done"
Measure. Profile. Optimize. Verify. In that order. Always.