python-performance-optimization

📁 nickcrew/claude-ctx-plugin 📅 Jan 19, 2026
Install command
npx skills add https://github.com/nickcrew/claude-ctx-plugin --skill python-performance-optimization


Skill Documentation

Python Performance Optimization

Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.

When to Use This Skill

  • Code runs too slowly for production requirements
  • High CPU usage or memory consumption issues
  • Need to reduce API response times or batch processing duration
  • Application fails to scale under load
  • Optimizing data processing pipelines or scientific computing
  • Reducing cloud infrastructure costs through efficiency gains
  • Profile-guided optimization after measuring performance bottlenecks

Core Concepts

The Golden Rule: Never optimize without profiling first. 80% of execution time is spent in 20% of code.

Optimization Hierarchy (in priority order):

  1. Algorithm complexity – O(n²) → O(n log n) yields dramatic gains as input size grows
  2. Data structure choice – List → Set for lookups (orders of magnitude faster on large collections)
  3. Language features – Comprehensions, built-ins, generators
  4. Caching – Memoization for repeated calculations
  5. Compiled extensions – NumPy, Numba, Cython for hot paths
  6. Parallelism – Multiprocessing for CPU-bound work

Key Principle: Algorithmic improvements beat micro-optimizations every time.

Quick Reference

Load detailed guides for specific optimization areas:

Task → Reference to load

  • Profile code and find bottlenecks → skills/python-performance-optimization/references/profiling.md
  • Algorithm and data structure optimization → skills/python-performance-optimization/references/algorithms.md
  • Memory optimization and generators → skills/python-performance-optimization/references/memory.md
  • String concatenation and file I/O → skills/python-performance-optimization/references/string-io.md
  • NumPy, Numba, Cython, multiprocessing → skills/python-performance-optimization/references/acceleration.md

Optimization Workflow

Phase 1: Measure

  1. Profile with cProfile – Identify slow functions
  2. Line profile hot paths – Find exact slow lines
  3. Memory profile – Check for memory bottlenecks
  4. Benchmark baseline – Record current performance
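The measurement phase can be sketched with the standard library alone; `slow_sum` here is a hypothetical placeholder workload, not part of the skill itself:

```python
import cProfile
import io
import pstats
import time

def slow_sum(n):
    # Deliberately naive placeholder workload
    total = 0
    for i in range(n):
        total += i * i
    return total

# 1. Profile with cProfile to identify slow functions
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Report the top functions sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())

# 4. Record a baseline with time.perf_counter for later comparison
start = time.perf_counter()
slow_sum(100_000)
baseline = time.perf_counter() - start
print(f"baseline: {baseline:.4f}s")
```

Line-level profiling (step 2) needs the third-party line_profiler package; cProfile only attributes time to whole functions.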

Phase 2: Analyze

  1. Check algorithm complexity – Is it O(n²) or worse?
  2. Evaluate data structures – Are you using lists for lookups?
  3. Identify repeated work – Can results be cached?
  4. Find I/O bottlenecks – Database queries, file operations

Phase 3: Optimize

  1. Improve algorithms first – Biggest impact
  2. Use appropriate data structures – Set/dict for O(1) lookups
  3. Apply caching – @lru_cache for expensive functions
  4. Use generators – For large datasets
  5. Leverage NumPy/Numba – For numerical code
  6. Parallelize – Multiprocessing for CPU-bound tasks

Phase 4: Validate

  1. Re-profile – Verify improvements
  2. Benchmark – Measure speedup quantitatively
  3. Test correctness – Ensure optimizations didn’t break functionality
  4. Document – Explain why optimization was needed

Common Optimization Patterns

Pattern 1: Replace List with Set for Lookups

# Slow: O(n) lookup – scans the whole list
found = item in large_list  # Bad

# Fast: O(1) average-case lookup
found = item in large_set   # Good
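The gap is easy to demonstrate with timeit; absolute numbers vary by machine, so only the ratio matters:

```python
import timeit

n = 100_000
large_list = list(range(n))
large_set = set(large_list)
missing = -1  # Worst case for the list: every element is scanned

# Membership test repeated 200 times for each container
list_time = timeit.timeit(lambda: missing in large_list, number=200)
set_time = timeit.timeit(lambda: missing in large_set, number=200)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```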

Pattern 2: Use Comprehensions

# Slower
result = []
for i in range(n):
    result.append(i * 2)

# Faster (often ~30% for simple loops like this)
result = [i * 2 for i in range(n)]

Pattern 3: Cache Expensive Calculations

from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_function(n):
    # Result cached automatically
    return complex_calculation(n)
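As a concrete usage example (a hypothetical recursive workload), caching turns an exponential recursion into a linear one:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without caching this recursion is O(2^n); with it, each n is computed once
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))          # Returns instantly thanks to memoization
print(fib.cache_info())  # hits/misses counters, useful for tuning maxsize
```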

Pattern 4: Use Generators for Large Data

# Memory inefficient
def read_file(path):
    return [line for line in open(path)]  # Loads entire file (and leaks the handle)

# Memory efficient
def read_file(path):
    with open(path) as f:  # File is closed when iteration finishes
        for line in f:     # Streams line by line
            yield line.strip()

Pattern 5: Vectorize with NumPy

# Pure Python: hundreds of milliseconds
result = sum(i**2 for i in range(1000000))

# NumPy: a few milliseconds (roughly 100x faster)
import numpy as np
result = np.sum(np.arange(1000000, dtype=np.int64)**2)

Common Mistakes to Avoid

  1. Optimizing before profiling – You’ll optimize the wrong code
  2. Using lists for membership tests – Use sets/dicts instead
  3. String concatenation in loops – Use "".join() or StringIO
  4. Loading entire files into memory – Use generators
  5. N+1 database queries – Use JOINs or batch queries
  6. Ignoring built-in functions – They’re C-optimized and fast
  7. Premature optimization – Focus on algorithmic improvements first
  8. Not benchmarking – Always measure improvements quantitatively
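Mistake 3 is worth showing concretely: repeated += may copy the accumulated string on every iteration, while "".join() builds the result in one pass:

```python
parts = [str(i) for i in range(1000)]

# Potentially quadratic: each += can copy the whole accumulated string
result = ""
for p in parts:
    result += p  # Avoid in hot loops

# Linear: collect the pieces, join once
result_joined = "".join(parts)

assert result == result_joined  # Same output, very different scaling
```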

Decision Tree

Start here: Profile with cProfile to find bottlenecks

Hot path is algorithm?

  • Yes → Check complexity, improve algorithm, use better data structures
  • No → Continue

Hot path is computation?

  • Numerical loops → NumPy or Numba
  • CPU-bound → Multiprocessing
  • Already fast enough → Done

Hot path is memory?

  • Large data → Generators, streaming
  • Many objects → __slots__, object pooling
  • Caching needed → @lru_cache or custom cache
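The __slots__ suggestion can be sketched as follows; a slotted class drops the per-instance __dict__, which is where most of the saving comes from (the class names are illustrative):

```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ("x", "y")  # Fixed attribute layout, no per-instance __dict__
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PointDict(1, 2)
q = PointSlots(1, 2)

# The dict adds overhead to every instance; the slotted object avoids it
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
print(sys.getsizeof(q))
```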

Hot path is I/O?

  • Database → Batch queries, indexes, connection pooling
  • Files → Buffering, streaming
  • Network → Async I/O, request batching
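For the database row, batching can be illustrated with the stdlib sqlite3 module (an in-memory table, purely for demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"user{i}") for i in range(1000)]

# One batched call with executemany instead of 1000 individual INSERTs
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
conn.commit()

# One query with IN (...) instead of N+1 per-id lookups
ids = (1, 2, 3)
placeholders = ",".join("?" * len(ids))
found = conn.execute(
    f"SELECT name FROM users WHERE id IN ({placeholders})", ids
).fetchall()
print(found)  # [('user1',), ('user2',), ('user3',)]
```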

Best Practices

  1. Profile before optimizing – Measure to find real bottlenecks
  2. Optimize algorithms first – O(n²) → O(n) beats micro-optimizations
  3. Use appropriate data structures – Set/dict for lookups, not lists
  4. Leverage built-ins – C-implemented built-ins are faster than pure Python
  5. Avoid premature optimization – Optimize hot paths identified by profiling
  6. Use generators for large data – Reduce memory usage with lazy evaluation
  7. Batch operations – Minimize overhead from syscalls and network requests
  8. Cache expensive computations – Use @lru_cache or custom caching
  9. Consider NumPy/Numba – Vectorization and JIT for numerical code
  10. Parallelize CPU-bound work – Use multiprocessing to utilize all cores

Resources