performance-profiling
Install command
npx skills add https://github.com/heshamfs/materials-simulation-skills --skill performance-profiling
Skill Documentation
Performance Profiling
Goal
Provide tools to analyze simulation performance, identify bottlenecks, and recommend optimization strategies for computational materials science simulations.
Requirements
- Python 3.8+
- No external dependencies (uses Python standard library only)
- Works on Linux, macOS, and Windows
Inputs to Gather
Before running profiling scripts, collect from the user:
| Input | Description | Example |
|---|---|---|
| Simulation log | Log file with timing information | simulation.log |
| Scaling data | JSON with multi-run performance data | scaling_data.json |
| Simulation parameters | JSON with mesh, fields, solver config | params.json |
| Available memory | System memory in GB (optional) | 16.0 |
Decision Guidance
When to Use Each Script
Need to identify slow phases?
├── YES → Use timing_analyzer.py
│   └── Parse simulation logs for timing data
│
Need to understand parallel performance?
├── YES → Use scaling_analyzer.py
│   └── Analyze strong or weak scaling efficiency
│
Need to estimate memory requirements?
├── YES → Use memory_profiler.py
│   └── Estimate memory from problem parameters
│
Need optimization recommendations?
└── YES → Use bottleneck_detector.py
    └── Combine analyses and get actionable advice
Choosing Analysis Thresholds
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Phase dominance | <30% | 30-50% | >50% |
| Parallel efficiency | >0.80 | 0.70-0.80 | <0.70 |
| Memory usage | <60% | 60-80% | >80% |
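The thresholds above can be applied mechanically once the metrics are known. The following is a minimal sketch, not part of the skill's scripts: the function names are hypothetical, and the handling of values that fall exactly on a boundary (for example 0.30 or 0.80) is an assumption, since the table does not say which side they belong to.

```python
from typing import Literal

Rating = Literal["good", "acceptable", "poor"]


def rate_phase_dominance(fraction: float) -> Rating:
    """Rate the runtime share (0.0-1.0) of the largest phase."""
    if fraction < 0.30:
        return "good"
    if fraction <= 0.50:  # boundary handling is an assumption
        return "acceptable"
    return "poor"


def rate_parallel_efficiency(efficiency: float) -> Rating:
    """Rate parallel efficiency, where 1.0 is ideal scaling."""
    if efficiency > 0.80:
        return "good"
    if efficiency >= 0.70:  # boundary handling is an assumption
        return "acceptable"
    return "poor"


def rate_memory_usage(fraction: float) -> Rating:
    """Rate estimated memory as a fraction (0.0-1.0+) of available memory."""
    if fraction < 0.60:
        return "good"
    if fraction <= 0.80:  # boundary handling is an assumption
        return "acceptable"
    return "poor"


if __name__ == "__main__":
    print(rate_phase_dominance(0.45))      # one phase takes 45% of runtime
    print(rate_parallel_efficiency(0.65))  # efficiency of 0.65
    print(rate_memory_usage(0.50))         # 8 GB estimated of 16 GB available
```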
Script Outputs (JSON Fields)
| Script | Key Outputs |
|---|---|
| timing_analyzer.py | timing_data.phases, timing_data.slowest_phase, timing_data.total_time |
| scaling_analyzer.py | scaling_analysis.results, scaling_analysis.efficiency_threshold_processors |
| memory_profiler.py | memory_profile.total_memory_gb, memory_profile.per_process_gb, memory_profile.warnings |
| bottleneck_detector.py | bottlenecks, recommendations |
Workflow
Complete Profiling Workflow
- Analyze timing from simulation logs
- Analyze scaling from multi-run data (if available)
- Profile memory from simulation parameters
- Detect bottlenecks and get recommendations
- Implement optimizations based on recommendations
- Re-profile to verify improvements
Quick Profiling (Timing Only)
- Run timing analyzer on simulation log
- Identify dominant phases (>50% of runtime)
- Apply targeted optimizations to dominant phases
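The "identify dominant phases" step can be illustrated with a short sketch. It assumes the per-phase timings are available as a plain mapping from phase name to seconds; the actual layout of `timing_data.phases` is not documented here, so the input shape and the `dominant_phases` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def dominant_phases(
    phases: Dict[str, float], threshold: float = 0.50
) -> List[Tuple[str, float]]:
    """Return (phase, share) pairs whose share of total runtime exceeds threshold.

    `phases` maps phase name -> seconds (an assumed shape, not the
    documented output format of timing_analyzer.py).
    """
    total = sum(phases.values())
    if total <= 0:
        return []
    shares = [(name, seconds / total) for name, seconds in phases.items()]
    dominant = [item for item in shares if item[1] > threshold]
    # Largest share first, so the best optimization target leads the list.
    return sorted(dominant, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example = {"assembly": 12.0, "solve": 70.0, "io": 18.0}
    for name, share in dominant_phases(example):
        print(f"{name}: {share:.0%} of runtime")
```

With the example timings, only the solve phase crosses the 50% threshold, so it would be the first optimization target.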
CLI Examples
Timing Analysis
# Basic timing analysis
python3 scripts/timing_analyzer.py \
--log simulation.log \
--json
# Custom timing pattern
python3 scripts/timing_analyzer.py \
--log simulation.log \
--pattern 'Step\s+(\w+)\s+took\s+([\d.]+)s' \
--json
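To check a custom pattern before passing it to `--pattern`, it can help to try it against a few log lines in plain Python. The sketch below uses the same regular expression as the command above; the sample log lines are invented for illustration, and summing repeated phases is an assumption about how the analyzer aggregates, not documented behavior.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Same pattern as in the CLI example: group 1 = phase name, group 2 = seconds.
PATTERN = re.compile(r"Step\s+(\w+)\s+took\s+([\d.]+)s")


def parse_timings(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the seconds reported for each phase across all matching lines."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical log lines; a real log format may differ.
    sample_log = [
        "Step assembly took 1.25s",
        "Step solve took 4.50s",
        "INFO: checkpoint written",
        "Step solve took 3.50s",
    ]
    print(parse_timings(sample_log))
```

Lines that do not match the pattern are skipped silently. If a real log yields an empty result, the pattern does not match its format, which corresponds to the "No timing data found" case.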
Scaling Analysis
# Strong scaling (fixed problem size)
python3 scripts/scaling_analyzer.py \
--data scaling_data.json \
--type strong \
--json
# Weak scaling (constant work per processor)
python3 scripts/scaling_analyzer.py \
--data scaling_data.json \
--type weak \
--json
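For reference, the two efficiency measures can be computed by hand using the standard textbook definitions. Whether `scaling_analyzer.py` uses exactly these formulas is an assumption; the function names and the `(processors, seconds)` input shape are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, float]  # (processor count, wall time in seconds)


def _check(runs: List[Run]) -> List[Run]:
    """Sort runs by processor count; the smallest run is the baseline."""
    if len(runs) < 2:
        raise ValueError("At least 2 runs required")
    return sorted(runs)


def strong_scaling_efficiency(runs: List[Run]) -> List[Tuple[int, float]]:
    """Fixed problem size: efficiency = speedup / (p / p_base)."""
    ordered = _check(runs)
    base_p, base_t = ordered[0]
    return [(p, (base_t / t) / (p / base_p)) for p, t in ordered]


def weak_scaling_efficiency(runs: List[Run]) -> List[Tuple[int, float]]:
    """Constant work per processor: efficiency = t_base / t_p."""
    ordered = _check(runs)
    _, base_t = ordered[0]
    return [(p, base_t / t) for p, t in ordered]


if __name__ == "__main__":
    # Invented timings for illustration only.
    strong_runs = [(1, 100.0), (2, 55.0), (4, 32.0)]
    for p, eff in strong_scaling_efficiency(strong_runs):
        print(f"strong: p={p} efficiency={eff:.3f}")
    weak_runs = [(1, 10.0), (4, 12.5)]
    for p, eff in weak_scaling_efficiency(weak_runs):
        print(f"weak:   p={p} efficiency={eff:.3f}")
```

In the strong-scaling example, efficiency at 4 processors is (100 / 32) / 4 ≈ 0.78, which lands in the 0.70-0.80 band of the thresholds table.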
Memory Profiling
# Estimate memory requirements
python3 scripts/memory_profiler.py \
--params simulation_params.json \
--available-gb 16.0 \
--json
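As a rough back-of-the-envelope check on what such an estimate involves, field storage on a structured mesh can be approximated as cells × fields × bytes per value. The sketch below is an illustrative assumption, not the formula used by `memory_profiler.py`: the parameter names, the `copies` factor, and the use of 1024³ bytes per GB are all hypothetical choices, and solver workspace is ignored.

```python
def estimate_memory_gb(
    nx: int,
    ny: int,
    nz: int,
    n_fields: int,
    bytes_per_value: int = 8,  # double precision
    copies: int = 2,           # e.g. old and new time level (assumption)
) -> float:
    """Estimate field storage only; solver matrices and buffers are ignored."""
    total_bytes = nx * ny * nz * n_fields * bytes_per_value * copies
    return total_bytes / 1024**3


def rate_usage(estimated_gb: float, available_gb: float) -> str:
    """Map estimated/available memory onto the categories used in this guide."""
    fraction = estimated_gb / available_gb
    if fraction < 0.60:
        return "safe"
    if fraction <= 0.80:
        return "moderate"
    if fraction <= 1.00:
        return "high"
    return "exceeds capacity"


if __name__ == "__main__":
    needed = estimate_memory_gb(512, 512, 512, n_fields=4)
    print(f"estimated: {needed:.1f} GB -> {rate_usage(needed, 16.0)}")
```

Switching `bytes_per_value` from 8 to 4 halves the estimate, which is the effect of the "use single precision where appropriate" strategy.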
Bottleneck Detection
# Detect bottlenecks from timing only
python3 scripts/bottleneck_detector.py \
--timing timing_results.json \
--json
# Comprehensive analysis with all inputs
python3 scripts/bottleneck_detector.py \
--timing timing_results.json \
--scaling scaling_results.json \
--memory memory_results.json \
--json
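The `timing_results.json`, `scaling_results.json`, and `memory_results.json` files used above have to come from earlier runs of the other scripts. One way to chain the steps is sketched below. It assumes that each script prints its JSON report to standard output when given `--json` and that `bottleneck_detector.py` accepts those reports unchanged; neither is confirmed here. The helper names and the `bottleneck_report.json` file name are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

Step = Tuple[List[str], str]  # (command argv, file that receives stdout)


def build_pipeline(
    log: str,
    scaling_data: Optional[str] = None,
    params: Optional[str] = None,
) -> List[Step]:
    """Build the command sequence; optional inputs add optional steps."""
    steps: List[Step] = [
        (["python3", "scripts/timing_analyzer.py", "--log", log, "--json"],
         "timing_results.json"),
    ]
    detector = ["python3", "scripts/bottleneck_detector.py",
                "--timing", "timing_results.json"]
    if scaling_data:
        steps.append((["python3", "scripts/scaling_analyzer.py",
                       "--data", scaling_data, "--type", "strong", "--json"],
                      "scaling_results.json"))
        detector += ["--scaling", "scaling_results.json"]
    if params:
        steps.append((["python3", "scripts/memory_profiler.py",
                       "--params", params, "--json"],
                      "memory_results.json"))
        detector += ["--memory", "memory_results.json"]
    steps.append((detector + ["--json"], "bottleneck_report.json"))
    return steps


def run_pipeline(steps: List[Step]) -> None:
    """Run each step and save its stdout (assumed to be JSON) to a file."""
    for argv, out_file in steps:
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
        Path(out_file).write_text(result.stdout)


if __name__ == "__main__":
    # Print the planned commands without executing them.
    for argv, out_file in build_pipeline("simulation.log", params="params.json"):
        print(" ".join(argv), ">", out_file)
```

Keeping command construction separate from execution makes it possible to review the planned commands before anything runs.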
Conversational Workflow Example
User: My simulation is taking too long. Can you help me identify what’s slow?
Agent workflow:
- Ask for simulation log file
- Run timing analyzer:
  python3 scripts/timing_analyzer.py --log simulation.log --json
- Interpret results:
  - If solver dominates (>50%): Recommend preconditioner tuning
  - If assembly dominates: Recommend caching or vectorization
  - If I/O dominates: Recommend reducing output frequency
- If user has multi-run data, analyze scaling:
  python3 scripts/scaling_analyzer.py --data scaling.json --type strong --json
- Generate comprehensive recommendations:
  python3 scripts/bottleneck_detector.py --timing timing.json --scaling scaling.json --json
Interpretation Guidance
Timing Analysis
| Scenario | Meaning | Action |
|---|---|---|
| Solver >70% | Solver-dominated | Tune preconditioner, check tolerance |
| Assembly >50% | Assembly-dominated | Cache matrices, vectorize, parallelize |
| I/O >30% | I/O-dominated | Reduce frequency, use parallel I/O |
| Balanced (<30% each) | Well-balanced | Look for algorithmic improvements |
Scaling Analysis
| Efficiency | Meaning | Action |
|---|---|---|
| >0.80 | Excellent scaling | Continue scaling up |
| 0.70-0.80 | Good scaling | Monitor at larger scales |
| 0.50-0.70 | Poor scaling | Investigate communication/load balance |
| <0.50 | Very poor scaling | Reduce processor count or redesign |
Memory Profile
| Usage | Meaning | Action |
|---|---|---|
| <60% available | Safe | No action needed |
| 60-80% available | Moderate | Monitor, consider optimization |
| >80% available | High | Reduce resolution or increase processors |
| >100% available | Exceeds capacity | Must reduce problem size |
Error Handling
| Error | Cause | Resolution |
|---|---|---|
| Log file not found | Invalid path | Verify log file path |
| No timing data found | Pattern mismatch | Provide custom pattern with --pattern |
| At least 2 runs required | Insufficient data | Provide more scaling runs |
| Missing required parameters | Incomplete params | Add mesh and fields to params file |
Optimization Strategies by Bottleneck Type
Solver Bottlenecks
- Use algebraic multigrid (AMG) preconditioner
- Loosen solver tolerance if over-solving (converging further than the required accuracy)
- Consider direct solver for small problems
- Profile matrix assembly vs solve time
Assembly Bottlenecks
- Cache element matrices if geometry is static
- Use vectorized assembly routines
- Consider matrix-free methods
- Parallelize assembly with coloring
I/O Bottlenecks
- Reduce output frequency
- Use parallel I/O (HDF5, MPI-IO)
- Write to fast scratch storage
- Compress output data
Scaling Bottlenecks
- Investigate communication overhead
- Check for load imbalance
- Reduce synchronization points
- Use asynchronous communication
- Consider hybrid MPI+OpenMP
Memory Bottlenecks
- Reduce mesh resolution
- Use iterative solver (lower memory than direct)
- Enable out-of-core computation
- Increase number of processors
- Use single precision where appropriate
Limitations
- Log parsing: Depends on pattern matching; may miss unusual formats
- Scaling analysis: Requires at least 2 runs for meaningful results
- Memory estimation: Approximate; actual usage may vary
- Recommendations: General guidance; may need domain-specific tuning
References
- references/profiling_guide.md: Profiling concepts and interpretation
- references/optimization_strategies.md: Detailed optimization approaches
Version History
- v1.0.0 (2025-01-22): Initial release with 4 profiling scripts