performance
npx skills add https://github.com/outfitter-dev/agents --skill performance
Performance Engineering
Evidence-based performance optimization: measure → profile → optimize → validate.
<when_to_use>
- Profiling slow code paths or bottlenecks
- Identifying memory leaks or excessive allocations
- Optimizing latency-critical operations (P95, P99)
- Benchmarking competing implementations
- Database query optimization
- Reducing CPU usage in hot paths
- Improving throughput (RPS, ops/sec)
NOT for: premature optimization, optimization without measurement, guessing at bottlenecks
</when_to_use>
<iron_law>
NO OPTIMIZATION WITHOUT MEASUREMENT
Required workflow:
- Measure baseline performance with realistic workload
- Profile to identify actual bottleneck
- Optimize the bottleneck (not what you think is slow)
- Measure again to verify improvement
- Document gains and tradeoffs
Optimizing unmeasured code wastes time and introduces bugs.
</iron_law>
Load the maintain-tasks skill for stage tracking:
Stage 1: Establishing baseline
- content: “Establish performance baseline with realistic workload”
- activeForm: “Establishing performance baseline”
Stage 2: Profiling bottlenecks
- content: “Profile code to identify actual bottlenecks”
- activeForm: “Profiling code to identify bottlenecks”
Stage 3: Analyzing root cause
- content: “Analyze profiling data to determine root cause”
- activeForm: “Analyzing profiling data”
Stage 4: Implementing optimization
- content: “Implement targeted optimization for identified bottleneck”
- activeForm: “Implementing optimization”
Stage 5: Validating improvement
- content: “Measure performance gains and verify no regressions”
- activeForm: “Validating performance improvement”
Key Performance Indicators
Latency (response time):
- P50 (median) – typical case
- P95 – most users
- P99 – tail latency
- P99.9 – outliers
- TTFB – time to first byte
- TTLB – time to last byte
Throughput:
- RPS – requests per second
- ops/sec – operations per second
- bytes/sec – data transfer rate
- queries/sec – database throughput
Memory:
- Heap usage – allocated memory
- GC frequency – garbage collection pauses
- GC duration – stop-the-world time
- Allocation rate – memory churn
- Resident set size (RSS) – total memory
CPU:
- CPU time – total compute
- Wall time – elapsed time
- Hot paths – frequently executed code
- Time complexity – algorithmic efficiency
- CPU utilization – percentage used
Always measure:
- Before optimization (baseline)
- After optimization (improvement)
- Under realistic load (not toy data)
- Multiple runs (account for variance)
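The percentile metrics above can be computed from raw latency samples with a nearest-rank calculation. A minimal TypeScript sketch (the sample values are illustrative, not real measurements):

```typescript
// Nearest-rank percentile over collected latency samples (ms).
// Real benchmarks need many samples; this only shows the math.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

const latencies = [12, 15, 11, 90, 14, 13, 16, 12, 250, 14];
console.log(`P50: ${percentile(latencies, 50)}ms`); // 14ms
console.log(`P95: ${percentile(latencies, 95)}ms`); // 250ms
```

Note how one outlier (250ms) dominates P95/P99 while leaving P50 untouched, which is why tail percentiles matter more than the median for user-facing latency.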
<profiling_tools>
TypeScript/Bun
Built-in timing:
console.time('operation')
// ... code to measure
console.timeEnd('operation')
// High precision
const start = Bun.nanoseconds()
// ... code to measure
const elapsed = Bun.nanoseconds() - start
console.log(`Took ${elapsed / 1_000_000}ms`)
Performance API:
performance.mark('start')
// ... code to measure
performance.mark('end')
performance.measure('operation', 'start', 'end')
const measure = performance.getEntriesByName('operation')[0]
console.log(`Duration: ${measure.duration}ms`)
Memory profiling:
- Chrome DevTools → Memory tab → heap snapshots
- Node.js --inspect flag + Chrome DevTools
- process.memoryUsage() for RSS/heap tracking
CPU profiling:
- Chrome DevTools → Performance tab → record session
- Node.js --prof flag + node --prof-process
- Flamegraphs for visualization
Rust
Benchmarking:
// benches/my_benchmark.rs
// Criterion benchmarks live in benches/, not in a #[cfg(test)] module:
// criterion_main! generates the benchmark binary's main().
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| {
        b.iter(|| my_function(black_box(42)))
    });
}

criterion_group!(benches, benchmark_function);
criterion_main!(benches);
Profiling:
- cargo bench – criterion benchmarks
- perf record + perf report – Linux profiling
- cargo flamegraph – visual flamegraphs
- cargo bloat – binary size analysis
- valgrind --tool=callgrind – detailed profiling
- heaptrack – memory profiling
Instrumentation:
use std::time::Instant;
let start = Instant::now();
// ... code to measure
let duration = start.elapsed();
println!("Took: {:?}", duration);
</profiling_tools>
<optimization_patterns>
Algorithm Improvements
Time complexity:
- O(n²) → O(n log n) – sorting, searching
- O(n) → O(log n) – binary search, trees
- O(n) → O(1) – hash maps, memoization
Space-time tradeoffs:
- Cache computed results (memoization)
- Precompute expensive operations
- Index data for faster lookup
- Use hash maps for O(1) access
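Memoization is the simplest space-time tradeoff: trade a Map's memory for O(1) repeat lookups. A minimal generic sketch:

```typescript
// Memoization: cache results of a pure, expensive function so repeated
// calls with the same argument become O(1) map lookups.
function memoize<T, R>(fn: (arg: T) => R): (arg: T) => R {
  const cache = new Map<T, R>();
  return (arg: T) => {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg)!;
  };
}

let calls = 0;
const slowSquare = (n: number) => { calls++; return n * n; };
const fastSquare = memoize(slowSquare);

fastSquare(9); // computes
fastSquare(9); // cache hit, no recomputation
console.log(calls); // 1
```

This only works for pure functions; memoizing anything with side effects or time-dependent results changes behavior, not just speed.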
Memory Optimization
Reduce allocations:
// Bad: creates new array each iteration
for (const item of items) {
const results = []
results.push(process(item))
}
// Good: reuse array
const results = []
for (const item of items) {
results.push(process(item))
}
// Bad: allocates String every time
fn format_user(name: &str) -> String {
format!("User: {}", name)
}
// Good: reuses buffer
fn format_user(name: &str, buf: &mut String) {
buf.clear();
buf.push_str("User: ");
buf.push_str(name);
}
Memory pooling:
- Reuse expensive objects (connections, buffers)
- Object pools for frequently allocated types
- Arena allocators for batch allocations
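An object pool can be sketched in a few lines; the `BufferPool` name and shape here are illustrative, not a specific library's API:

```typescript
// Minimal object pool: reuse fixed-size buffers instead of allocating
// a fresh one per call, reducing allocation rate and GC pressure.
class BufferPool {
  private free: Uint8Array[] = [];
  constructor(private size: number) {}

  acquire(): Uint8Array {
    return this.free.pop() ?? new Uint8Array(this.size);
  }

  release(buf: Uint8Array): void {
    buf.fill(0); // clear before reuse so stale data never leaks
    this.free.push(buf);
  }
}

const pool = new BufferPool(1024);
const a = pool.acquire(); // allocates
pool.release(a);
const b = pool.acquire(); // reuses the same buffer
console.log(a === b); // true
```

Pools pay off only for objects that are expensive to create or allocated in hot paths; measure before adding one.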
Lazy evaluation:
- Compute only when needed
- Stream processing vs loading all data
- Iterators over materialized collections
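Generators make lazy evaluation concrete: values are produced on demand, so consuming the first few items of even an infinite stream does no wasted work. A small sketch:

```typescript
// Lazy evaluation with a generator: naturals() is an infinite stream,
// but take() only forces the values it actually needs.
function* naturals(): Generator<number> {
  for (let n = 1; ; n++) yield n;
}

function take<T>(iter: Iterable<T>, count: number): T[] {
  const out: T[] = [];
  for (const item of iter) {
    if (out.length >= count) break;
    out.push(item);
  }
  return out;
}

console.log(take(naturals(), 3)); // [1, 2, 3]
```

The eager equivalent (materializing a full array, then slicing) would allocate everything up front; the lazy version's cost scales with what is consumed.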
I/O Optimization
Batching:
- Batch API calls (1 request vs 100)
- Batch database writes (bulk insert)
- Batch file operations (single write vs many)
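The batching idea reduces to chunking N items so one bulk call replaces N round trips. A sketch where `sendBatch` is a hypothetical stand-in for a real bulk API or bulk-insert call:

```typescript
// Split items into fixed-size batches: 100 ids become 4 bulk requests
// instead of 100 individual ones.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function sendBatch(batch: number[]): Promise<void> {
  // hypothetical: one network round trip for the whole batch
}

const ids = Array.from({ length: 100 }, (_, i) => i);
const batches = chunk(ids, 25);
console.log(batches.length); // 4
```

Batch size is itself a tunable: too small wastes round trips, too large risks timeouts and memory spikes, so measure at realistic sizes.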
Caching:
- Cache expensive computations
- Cache database queries (Redis, in-memory)
- Cache API responses (HTTP caching)
- Invalidate stale cache entries
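Cache invalidation can be as simple as expiring entries on read. A minimal in-memory TTL cache sketch (production systems typically use Redis or a size-bounded LRU instead):

```typescript
// In-memory cache where entries expire after ttlMs; stale entries are
// invalidated lazily, on the read that discovers them.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // drop the stale entry
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

const cache = new TtlCache<string>(60_000);
cache.set("user:1", "Ada");
console.log(cache.get("user:1")); // "Ada"
```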
Async I/O:
- Non-blocking operations (async/await)
- Concurrent requests (Promise.all, tokio::spawn)
- Connection pooling (reuse connections)
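The concurrency win is easy to demonstrate: independent awaits run in parallel under Promise.all instead of serially. Here `fetchUser` is a hypothetical stand-in for a real I/O call:

```typescript
// Simulated I/O call with ~50ms latency.
async function fetchUser(id: number): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 50));
  return `user-${id}`;
}

async function main() {
  // Awaiting these one by one would take ~150ms (3 × 50ms);
  // Promise.all overlaps the waits, so the whole batch takes ~50ms.
  const users = await Promise.all([1, 2, 3].map(fetchUser));
  console.log(users); // ["user-1", "user-2", "user-3"]
}

main();
```

This only applies to independent operations; when each call depends on the previous result, the work is inherently sequential and the fix must come from elsewhere (caching, batching, or restructuring).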
Database Optimization
Query optimization:
- Add indexes for common queries
- Use EXPLAIN/EXPLAIN ANALYZE
- Avoid N+1 queries (use joins or batch loading)
- Select only needed columns
- Filter at database level (WHERE vs client filter)
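The N+1 problem can be shown without a real database: below, a Map stands in for a table, and a query counter makes the difference visible. The SQL equivalent of the batched path is a JOIN or a single `WHERE id IN (...)` query:

```typescript
// In-memory stand-in for an authors table; `queries` counts round trips.
const authorTable = new Map<number, string>([
  [1, "Ada"],
  [2, "Grace"],
]);
let queries = 0;

// N+1 pattern: one query per row.
function getAuthorById(id: number): string | undefined {
  queries++;
  return authorTable.get(id);
}

// Batched: one query for all needed ids.
function getAuthorsByIds(ids: number[]): Map<number, string> {
  queries++;
  return new Map(ids.map((id) => [id, authorTable.get(id)!]));
}

const posts = [{ authorId: 1 }, { authorId: 2 }, { authorId: 1 }];

queries = 0;
posts.map((p) => getAuthorById(p.authorId));
console.log(queries); // 3 queries for 3 posts

queries = 0;
const authors = getAuthorsByIds([...new Set(posts.map((p) => p.authorId))]);
console.log(queries); // 1 query
```

Deduplicating ids before the batch query (the Set above) is the same trick DataLoader-style libraries apply automatically.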
Schema design:
- Normalize to reduce duplication
- Denormalize for read-heavy workloads
- Partition large tables
- Use appropriate data types
Connection management:
- Connection pooling (don’t create per request)
- Prepared statements (avoid SQL parsing)
- Transaction batching (reduce round trips)
</optimization_patterns>
Loop: Measure → Profile → Analyze → Optimize → Validate
- Define performance goal – target metric (e.g., P95 < 100ms)
- Establish baseline – measure current performance under realistic load
- Profile systematically – identify the actual bottleneck (not guesses)
- Analyze root cause – understand why the code is slow
- Design optimization – plan a targeted improvement
- Implement optimization – make a focused change
- Measure improvement – verify gains, check for regressions
- Document results – record baseline, optimization, gains, tradeoffs
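The measure steps above should never trust a single run. A sketch of a multi-run harness that reports mean and standard deviation, so baseline and improvement can be compared against the noise:

```typescript
// Time fn over several runs and report mean/stddev (ms).
// Single-run timings are noisy; variance tells you whether an
// observed "gain" is real or within measurement error.
function measure(fn: () => void, runs: number): { mean: number; stddev: number } {
  const times: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    times.push(performance.now() - start);
  }
  const mean = times.reduce((a, b) => a + b, 0) / runs;
  const variance = times.reduce((a, b) => a + (b - mean) ** 2, 0) / runs;
  return { mean, stddev: Math.sqrt(variance) };
}

const { mean, stddev } = measure(() => {
  let sum = 0;
  for (let i = 0; i < 100_000; i++) sum += i; // workload under test
}, 20);
console.log(`mean=${mean.toFixed(3)}ms stddev=${stddev.toFixed(3)}ms`);
```

If the improvement between baseline and optimized runs is smaller than a couple of standard deviations, it is not yet a result worth documenting.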
At each step:
- Document measurements with methodology
- Note profiling tool output
- Track optimization attempts (what worked/failed)
- Update performance documentation
Before declaring optimization complete:
Check gains:
- Measured improvement meets target?
- Improvement statistically significant?
- Tested under realistic load?
- Multiple runs confirm consistency?
Check regressions:
- No degradation in other metrics?
- Memory usage still acceptable?
- Code complexity still manageable?
- Tests still pass?
Check documentation:
- Baseline measurements recorded?
- Optimization approach explained?
- Gains quantified with numbers?
- Tradeoffs documented?
ALWAYS:
- Measure before optimizing (baseline)
- Profile to find actual bottleneck
- Use realistic workload (not toy data)
- Measure multiple runs (account for variance)
- Document baseline and improvements
- Check for regressions in other metrics
- Consider readability vs performance tradeoff
- Verify statistical significance
NEVER:
- Optimize without measuring first
- Guess at bottleneck without profiling
- Benchmark with unrealistic data
- Trust single-run measurements
- Skip documentation of results
- Sacrifice correctness for speed
- Optimize without clear performance goal
- Ignore algorithmic improvements
Methodology:
- benchmarking.md – rigorous benchmarking methodology
Related skills:
- codebase-recon – evidence-based investigation (foundation)
- debugging – structured bug investigation
- typescript-dev – correctness before performance