performance-profiling
2
总安装量
2
周安装量
#67796
全站排名
安装命令
npx skills add https://github.com/wojons/skills --skill performance-profiling
Agent 安装分布
amp
2
cline
2
opencode
2
cursor
2
kimi-cli
2
codex
2
Skill 文档
Performance Profiling
Analyze system and application performance using comprehensive profiling techniques including Linux kernel-level tools (perf, ftrace, eBPF, SystemTap), application-level profiling, bottleneck identification, and optimization recommendations to improve system responsiveness, throughput, and resource efficiency.
When to use me
Use this skill when:
- Application performance is slow or degrading
- System resource utilization is high
- Identifying CPU, memory, I/O, or network bottlenecks
- Optimizing application response times
- Debugging performance regressions
- Capacity planning and resource sizing
- Comparing performance before/after changes
- Analyzing production performance issues
- Creating performance baselines
- Tuning system and application parameters
What I do
1. System-Level Profiling
- CPU profiling: Analyze CPU usage, context switches, interrupts, scheduler latency
- Memory profiling: Analyze memory usage, page faults, swapping, memory leaks
- I/O profiling: Analyze disk I/O, file system performance, storage latency
- Network profiling: Analyze network throughput, latency, packet loss, connections
- Kernel profiling: Analyze kernel functions, system calls, interrupt handlers
2. Application-Level Profiling
- Application CPU usage: Profile application-specific CPU consumption
- Memory allocation: Track heap allocations, garbage collection, memory leaks
- Function timing: Measure function execution times and call frequencies
- Database query profiling: Analyze SQL query performance and optimization
- API endpoint profiling: Measure API response times and throughput
3. Tool Integration
- Linux perf: CPU profiling, hardware performance counters, tracepoints
- eBPF/BCC: Dynamic tracing, custom performance instrumentation
- Ftrace: Kernel function tracing, event tracing, latency measurements
- SystemTap: System-wide tracing and profiling
- Application profilers: Language-specific profiling tools
- Container profiling: Docker, Kubernetes performance analysis
4. Bottleneck Identification
- Hot spot detection: Identify frequently executed code paths
- Resource contention: Detect lock contention, CPU starvation, I/O wait
- Latency analysis: Measure and analyze latency distributions
- Scalability analysis: Identify scalability limits and bottlenecks
- Anomaly detection: Detect performance anomalies and regressions
5. Optimization Recommendations
- Code optimizations: Suggest algorithmic improvements, caching strategies
- Configuration tuning: Recommend system and application tuning parameters
- Architecture improvements: Suggest architectural changes for performance
- Resource allocation: Recommend optimal resource allocation strategies
- Monitoring setup: Recommend performance monitoring configurations
6. Visualization & Reporting
- Flame graphs: Generate CPU and memory flame graphs for visualization
- Heat maps: Create latency heat maps for time-series analysis
- Performance dashboards: Create real-time performance dashboards
- Trend analysis: Analyze performance trends over time
- Comparison reports: Compare performance across versions/environments
Profiling Tools Covered
Linux Kernel-Level Tools
- perf: Linux performance events for CPU profiling, hardware counters
- eBPF/BCC: Extended Berkeley Packet Filter for dynamic tracing
- bpftrace: High-level tracing language for eBPF
- Ftrace: Linux kernel internal tracer for function tracing
- SystemTap: System-wide tracing and profiling framework
- LTTng: Linux Trace Toolkit next generation
- ktap: Lightweight kernel tracing
Application-Level Tools
- Java: JProfiler, YourKit, VisualVM, Async Profiler
- Python: cProfile, py-spy, Scalene, line_profiler
- Node.js: clinic.js, 0x, node –prof, v8-profiler
- Go: pprof, trace, delve, gops
- Ruby: ruby-prof, stackprof, rbspy
- .NET: dotnet-counters, dotnet-trace, PerfView
- PHP: Xdebug, Blackfire, Tideways
- C/C++: gprof, Valgrind, Intel VTune, perf
System Monitoring Tools
- top/htop: Process monitoring
- vmstat: Virtual memory statistics
- iostat: I/O statistics
- netstat/ss: Network statistics
- sar: System activity reporter
- dstat: Versatile resource statistics
- nmon: Nigel’s performance monitor
Visualization Tools
- FlameGraph: CPU and memory flame graphs
- perfetto: System tracing and performance visualization
- grafana: Performance dashboard visualization
- prometheus: Time-series monitoring and alerting
- jaeger: Distributed tracing visualization
Analysis Techniques
CPU Profiling with perf
# Sample CPU usage for 30 seconds
perf record -F 99 -ag -- sleep 30
# Generate flame graph
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
# Analyze hardware performance counters
perf stat -e cycles,instructions,cache-misses,branch-misses ./application
# Trace system calls
perf trace -e syscalls:sys_enter_* ./application
eBPF Tracing with BCC
from bcc import BPF
# eBPF program to trace function calls
bpf_text = """
#include <uapi/linux/ptrace.h>
struct data_t {
u64 timestamp;
u32 pid;
char comm[TASK_COMM_LEN];
u64 duration_ns;
};
BPF_HASH(start, u32);
BPF_PERF_OUTPUT(events);
int trace_entry(struct pt_regs *ctx) {
u32 pid = bpf_get_current_pid_tgid();
u64 ts = bpf_ktime_get_ns();
start.update(&pid, &ts);
return 0;
}
int trace_return(struct pt_regs *ctx) {
u32 pid = bpf_get_current_pid_tgid();
u64 *tsp = start.lookup(&pid);
if (tsp == 0) {
return 0;
}
u64 duration = bpf_ktime_get_ns() - *tsp;
struct data_t data = {};
data.timestamp = bpf_ktime_get_ns();
data.pid = pid;
data.duration_ns = duration;
bpf_get_current_comm(&data.comm, sizeof(data.comm));
events.perf_submit(ctx, &data, sizeof(data));
start.delete(&pid);
return 0;
}
"""
# Attach to function entry and return
bpf = BPF(text=bpf_text)
bpf.attach_uprobe(name="application", sym="function_name", fn_name="trace_entry")
bpf.attach_uretprobe(name="application", sym="function_name", fn_name="trace_return")
Memory Leak Detection
# Monitor memory allocations
valgrind --leak-check=full --show-leak-kinds=all ./application
# Track heap allocations with eBPF
/usr/share/bcc/tools/memleak -p $(pidof application)
# Analyze memory usage over time
cat /proc/$(pidof application)/smaps | grep -i pss | awk '{total+=$2} END {print total}'
# Monitor garbage collection (Java)
jstat -gc $(pidof java) 1s
Latency Analysis
def analyze_latency_distribution(latency_samples):
"""
Analyze latency distribution and identify outliers.
"""
import numpy as np
from scipy import stats
latencies = np.array(latency_samples)
analysis = {
'count': len(latencies),
'mean': np.mean(latencies),
'median': np.median(latencies),
'p90': np.percentile(latencies, 90),
'p95': np.percentile(latencies, 95),
'p99': np.percentile(latencies, 99),
'std_dev': np.std(latencies),
'min': np.min(latencies),
'max': np.max(latencies),
'outliers': []
}
# Identify outliers using IQR method
q1 = np.percentile(latencies, 25)
q3 = np.percentile(latencies, 75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = latencies[(latencies < lower_bound) | (latencies > upper_bound)]
analysis['outliers'] = outliers.tolist()
analysis['outlier_percentage'] = len(outliers) / len(latencies) * 100
return analysis
Examples
# Profile CPU usage for 60 seconds
npm run performance-profiling:cpu -- --duration 60 --output cpu-profile.json
# Generate flame graph
npm run performance-profiling:flamegraph -- --pid $(pidof application) --output flamegraph.svg
# Analyze memory leaks
npm run performance-profiling:memory -- --application myapp --leak-check
# Trace database queries
npm run performance-profiling:database -- --database postgresql --duration 300
# Profile API endpoints
npm run performance-profiling:api -- --endpoints "/api/*" --duration 60 --output api-performance.json
# Compare performance before/after changes
npm run performance-profiling:compare -- --before baseline.json --after new-version.json --output comparison.json
# Analyze system resource usage
npm run performance-profiling:system -- --metrics cpu,memory,disk,network --duration 300
# Create performance dashboard
npm run performance-profiling:dashboard -- --metrics all --interval 1s --duration 3600
# Detect bottlenecks in microservices
npm run performance-profiling:microservices -- --services auth,payment,notification --duration 600
# Optimize configuration based on profiling
npm run performance-profiling:optimize -- --profile profile.json --output optimizations.md
# Monitor production performance
npm run performance-profiling:monitor -- --production --alert-threshold p95:200ms
Output format
Performance Profiling Report:
Performance Profiling Report
ââââââââââââââââââââââââââââ
System: payment-processing-service
Analysis Date: 2026-02-26
Duration: 300 seconds
Profiling Tools: perf, eBPF, Application Profiler
Executive Summary:
â ï¸ Performance issues detected: 3 critical, 2 warnings
â
System resources: Within normal limits
ð Overall performance score: 72/100
Critical Issues:
1. â Database query bottleneck (Severity: Critical)
⢠Query: SELECT * FROM transactions WHERE user_id = ?
⢠Average latency: 450ms (p95: 1200ms)
⢠Frequency: 1200 executions/minute
⢠Root cause: Missing index on user_id column
⢠Impact: 40% of API latency
⢠Recommendation: Add index on transactions.user_id
2. â Memory leak in cache service (Severity: Critical)
⢠Service: redis-cache-service
⢠Memory growth: 2MB/minute
⢠Total leaked: 120MB over 1 hour
⢠Pattern: Cache entries not expired properly
⢠Recommendation: Implement TTL and LRU eviction
3. â CPU contention in payment processor (Severity: Critical)
⢠Function: processPayment() in payment-service
⢠CPU usage: 85% during peak
⢠Bottleneck: Cryptographic operations
⢠Recommendation: Implement caching or hardware acceleration
Warnings:
1. â ï¸ API endpoint latency degradation (Severity: Warning)
⢠Endpoint: POST /api/v1/payments
⢠p95 latency increase: 150ms â 320ms (+113%)
⢠Timeframe: Last 7 days
⢠Recommendation: Profile endpoint and optimize
2. â ï¸ Garbage collection pauses (Severity: Warning)
⢠Application: notification-service (Java)
⢠GC pauses: 45ms average, 120ms max
⢠Frequency: Every 30 seconds
⢠Recommendation: Tune JVM garbage collector
System Resource Analysis:
ââââââââââââââââââââââ¬âââââââââââââ¬âââââââââââââ¬âââââââââââââ
â Resource â Usage â Threshold â Status â
ââââââââââââââââââââââ¼âââââââââââââ¼âââââââââââââ¼âââââââââââââ¤
â CPU â 65% â 80% â
Normal â
â Memory â 72% â 85% â
Normal â
â Disk I/O â 45% â 70% â
Normal â
â Network â 38% â 60% â
Normal â
â Database Connectionsâ 85% â 90% â ï¸ Warning â
ââââââââââââââââââââââ´âââââââââââââ´âââââââââââââ´âââââââââââââ
Application Performance:
⢠API Response Times:
- p50: 85ms â
- p95: 320ms â ï¸
- p99: 1200ms â
- Success Rate: 99.8% â
⢠Database Performance:
- Query Cache Hit Rate: 65% â ï¸
- Average Query Time: 85ms â
- Slow Queries (>100ms): 12% â ï¸
- Connection Pool Usage: 85% â ï¸
⢠Cache Performance:
- Redis Hit Rate: 92% â
- Cache Latency: 3ms â
- Memory Usage: 78% â ï¸
- Eviction Rate: 5% â
Flame Graph Analysis:
⢠Hot Functions:
1. processPayment() - 35% CPU time
2. validateTransaction() - 22% CPU time
3. updateDatabase() - 18% CPU time
4. sendNotification() - 8% CPU time
5. logActivity() - 5% CPU time
⢠Optimization Opportunities:
1. Cache validation results (potential 15% improvement)
2. Batch database updates (potential 10% improvement)
3. Async notifications (potential 8% improvement)
Memory Analysis:
⢠Heap Usage: 2.4GB
⢠Stack Usage: 320MB
⢠Native Memory: 450MB
⢠Garbage Collection:
- Young GC: 45ms every 30s
- Full GC: 120ms every 5min
- Throughput: 98.5%
I/O Analysis:
⢠Disk Read: 45MB/s (average)
⢠Disk Write: 28MB/s (average)
⢠File Descriptors: 1250/4096 (31%)
⢠Network Throughput:
- Inbound: 85Mbps
- Outbound: 120Mbps
- Connections: 850 active
Bottleneck Timeline:
âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
â Bottleneck Timeline (Last 60 minutes) â
â â
â 00:00 â¼ââââââââ¬âââââââââââââââ¬ââââââââââââââ¬ââââââââââââââ â
â â CPU â Database â Memory â Network â
â 15:00 â¼ââââââââ¼âââââââââââââââ¼ââââââââââââââ¼ââââââââââââââ â
â â âââ â âââââââââ â âââ â ââ â
â 30:00 â¼ââââââââ¼âââââââââââââââ¼ââââââââââââââ¼ââââââââââââââ â
â â âââââ â ââââââââââââ â âââââ â âââ â
â 45:00 â¼ââââââââ¼âââââââââââââââ¼ââââââââââââââ¼ââââââââââââââ â
â â âââââââ ââââââââââââââ âââââââ â ââââ â
â 60:00 â¼ââââââââ´âââââââââââââââ´ââââââââââââââ´ââââââââââââââ â
â 0% 50% 100% â
âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
Optimization Recommendations:
1. Immediate (High Impact):
⢠Add database index on transactions.user_id
⢠Implement cache TTL for redis-cache-service
⢠Optimize processPayment() cryptographic operations
2. Short-term (Medium Impact):
⢠Implement connection pooling for database
⢠Add query caching for frequent queries
⢠Batch database writes where possible
3. Long-term (Architectural):
⢠Implement read replicas for database
⢠Add CDN for static assets
⢠Implement circuit breakers for external services
Performance Metrics Baseline:
⢠CPU Usage: < 70% target
⢠Memory Usage: < 80% target
⢠API p95 Latency: < 200ms target
⢠Database Query Time: < 100ms target
⢠Cache Hit Rate: > 90% target
Monitoring Configuration:
⢠Alert on: p95 latency > 200ms
⢠Alert on: CPU usage > 80% for 5 minutes
⢠Alert on: Memory usage > 85%
⢠Alert on: Error rate > 1%
⢠Dashboard: Real-time performance metrics
Next Steps:
1. Implement database index (estimate: 2 hours)
2. Fix memory leak in cache service (estimate: 4 hours)
3. Optimize payment processor CPU usage (estimate: 8 hours)
4. Deploy optimizations with feature flags
5. Monitor performance for 24 hours
6. Schedule performance regression tests
JSON Output Format:
{
"analysis": {
"system": "payment-processing-service",
"analysis_date": "2026-02-26",
"duration_seconds": 300,
"profiling_tools": ["perf", "ebpf", "application_profiler"],
"overall_score": 72
},
"critical_issues": [
{
"id": "issue-db-001",
"description": "Database query bottleneck",
"severity": "critical",
"component": "database",
"metric": "query_latency",
"average_value": 450,
"p95_value": 1200,
"unit": "ms",
"frequency": "1200 executions/minute",
"root_cause": "Missing index on user_id column",
"impact": "40% of API latency",
"recommendation": "Add index on transactions.user_id",
"estimated_effort_hours": 2,
"priority": "high"
},
{
"id": "issue-memory-001",
"description": "Memory leak in cache service",
"severity": "critical",
"component": "cache",
"metric": "memory_growth",
"average_value": 2,
"unit": "MB/minute",
"total_leaked": 120,
"total_leaked_unit": "MB",
"timeframe": "1 hour",
"pattern": "Cache entries not expired properly",
"recommendation": "Implement TTL and LRU eviction",
"estimated_effort_hours": 4,
"priority": "high"
}
],
"system_resources": {
"cpu": {
"usage_percentage": 65,
"threshold": 80,
"status": "normal",
"breakdown": {
"user": 45,
"system": 20,
"iowait": 8,
"steal": 2
}
},
"memory": {
"usage_percentage": 72,
"threshold": 85,
"status": "normal",
"breakdown": {
"heap": 2400,
"stack": 320,
"native": 450,
"cached": 1200
}
},
"disk_io": {
"usage_percentage": 45,
"threshold": 70,
"status": "normal",
"read_mbps": 45,
"write_mbps": 28
},
"network": {
"usage_percentage": 38,
"threshold": 60,
"status": "normal",
"inbound_mbps": 85,
"outbound_mbps": 120,
"connections": 850
}
},
"application_performance": {
"api_response_times": {
"p50_ms": 85,
"p95_ms": 320,
"p99_ms": 1200,
"success_rate": 99.8
},
"database_performance": {
"query_cache_hit_rate": 65,
"average_query_time_ms": 85,
"slow_queries_percentage": 12,
"connection_pool_usage": 85
},
"cache_performance": {
"hit_rate": 92,
"latency_ms": 3,
"memory_usage_percentage": 78,
"eviction_rate": 5
}
},
"flame_graph_analysis": {
"hot_functions": [
{
"function": "processPayment",
"cpu_percentage": 35,
"optimization_opportunity": "Cache validation results"
},
{
"function": "validateTransaction",
"cpu_percentage": 22,
"optimization_opportunity": "Batch validation"
}
],
"optimization_opportunities": [
{
"description": "Cache validation results",
"estimated_improvement": 15,
"effort_hours": 8
},
{
"description": "Batch database updates",
"estimated_improvement": 10,
"effort_hours": 6
}
]
},
"optimization_recommendations": {
"immediate": [
"Add database index on transactions.user_id",
"Implement cache TTL for redis-cache-service",
"Optimize processPayment() cryptographic operations"
],
"short_term": [
"Implement connection pooling for database",
"Add query caching for frequent queries",
"Batch database writes where possible"
],
"long_term": [
"Implement read replicas for database",
"Add CDN for static assets",
"Implement circuit breakers for external services"
]
},
"performance_baseline": {
"cpu_usage_target": 70,
"memory_usage_target": 80,
"api_p95_latency_target": 200,
"database_query_time_target": 100,
"cache_hit_rate_target": 90
},
"next_steps": [
{
"action": "Implement database index",
"estimate_hours": 2,
"priority": "high"
},
{
"action": "Fix memory leak in cache service",
"estimate_hours": 4,
"priority": "high"
},
{
"action": "Optimize payment processor CPU usage",
"estimate_hours": 8,
"priority": "medium"
}
]
}
Performance Dashboard:
Performance Dashboard
ââââââââââââââââââââ
Status: ACTIVE
Last Update: 2026-02-26 19:45:00
Update Interval: 1 second
Real-time Metrics:
ââââââââââââââââââââââ¬âââââââââââââ¬âââââââââââââ¬âââââââââââââ
â Metric â Current â 1min Avg â Trend â
ââââââââââââââââââââââ¼âââââââââââââ¼âââââââââââââ¼âââââââââââââ¤
â CPU Usage â 65% â 62% â âï¸ Rising â
â Memory Usage â 72% â 71% â â Stable â
â API Latency (p95) â 320ms â 310ms âï¸ Rising â
â Database Latency â 85ms â 82ms â Stable â
â Cache Hit Rate â 92% â 91% âï¸ Falling â
â Error Rate â 0.2% â 0.3% âï¸ Falling â
ââââââââââââââââââââââ´âââââââââââââ´âââââââââââââ´âââââââââââââ
Alerts:
⢠â ï¸ API p95 latency above threshold (200ms): 320ms
⢠â
CPU usage within limits
⢠â
Memory usage within limits
⢠â ï¸ Database connections approaching limit (85%)
Hotspots:
1. processPayment(): 35% CPU (ð¥ Hot)
2. validateTransaction(): 22% CPU (â ï¸ Warm)
3. updateDatabase(): 18% CPU (â ï¸ Warm)
Resource Utilization Trend:
CPU: ââââââââââââââââââââââââââââââââââââââââ 65%
Memory: ââââââââââââââââââââââââââââââââââââââââ 72%
Disk: ââââââââââââââââââââââââââââââââââââââââ 45%
Network:ââââââââââââââââââââââââââââââââââââââââ 38%
Recent Events:
⢠19:40: Database query slowdown detected
⢠19:35: Cache miss rate increased by 15%
⢠19:30: API latency spike (p95: 450ms)
⢠19:25: Memory usage increased by 2%
Recommendations:
1. Add index on transactions.user_id (pending)
2. Implement cache TTL (in progress)
3. Optimize payment processor (planned)
Performance Score: 72/100
Status: Needs Improvement
Notes
- Profile in production-like environments for accurate results
- Use appropriate sampling rates to balance overhead and accuracy
- Compare against baselines to identify regressions
- Monitor profiling overhead to avoid affecting production performance
- Use flame graphs for visual bottleneck identification
- Combine multiple tools for comprehensive analysis
- Profile representative workloads that match production usage
- Consider security implications of profiling in production
- Document profiling methodology for reproducibility
- Automate performance regression testing in CI/CD pipelines