latency-optimization

📁 melodic-software/claude-code-plugins 📅 Jan 24, 2026

Install command

npx skills add https://github.com/melodic-software/claude-code-plugins --skill latency-optimization

Latency Optimization

Comprehensive guide to reducing end-to-end latency in distributed systems – from network to application to database layers.

When to Use This Skill

Optimizing response times for user-facing applications
Creating latency budgets for distributed systems
Implementing geographic routing strategies
Reducing database query latency
Optimizing API response times
Understanding and measuring latency components

Latency Fundamentals

Understanding Latency

Latency Components:

Total Latency = Network + Processing + Queue + Serialization

┌──────────────────────────────────────────────────────────────┐
│                     Request Journey                          │
│                                                              │
│  Client ──► DNS ──► TCP ──► TLS ──► Server ──► DB ──► Back   │
│                                                              │
│  Components:                                                 │
│  ├── DNS Resolution: 0-100ms (cached: 0ms)                   │
│  ├── TCP Handshake: 1 RTT (~10-200ms)                        │
│  ├── TLS Handshake: 1-2 RTT (~20-400ms)                      │
│  ├── Request Transfer: depends on size                       │
│  ├── Server Processing: application-specific                 │
│  ├── Database Query: 1-1000ms typical                        │
│  └── Response Transfer: depends on size                      │
└──────────────────────────────────────────────────────────────┘

Key Metrics:
- P50: Median latency (50th percentile)
- P95: 95th percentile (tail latency starts)
- P99: 99th percentile (important for SLOs)
- P99.9: Three nines (critical systems)
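
The percentile metrics above can be computed from raw samples with a few lines of code. The following is a minimal sketch using the nearest-rank method; the `percentile` helper and the sample data are illustrative assumptions, not part of any monitoring library.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request latencies in ms: mostly fast, with two slow outliers.
latencies_ms = [20] * 50 + [40] * 45 + [90] * 3 + [800, 1500]

summary = {p: percentile(latencies_ms, p) for p in (50, 95, 99, 99.9)}
mean_ms = sum(latencies_ms) / len(latencies_ms)

print(summary)   # P50 and P95 look healthy; P99 and P99.9 expose the outliers
print(mean_ms)   # the mean alone hides how slow the tail is
```

In this made-up data set the median is 20 ms while P99 is 800 ms, which is why the tail percentiles, not the average, are the usual basis for SLOs.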

Latency Numbers Every Developer Should Know

Latency Reference (2024 estimates):

Operation                              Time
────────────────────────────────────────────────────────────
L1 cache reference                     1 ns
L2 cache reference                     4 ns
Branch mispredict                      5 ns
L3 cache reference                     10 ns
Mutex lock/unlock                      25 ns
Main memory reference                  100 ns
Compress 1KB with Snappy               2,000 ns (2 μs)
SSD random read                        16,000 ns (16 μs)
Read 1 MB from memory                  50,000 ns (50 μs)
Read 1 MB from SSD                     200,000 ns (200 μs)
Round trip same datacenter             500,000 ns (500 μs)
Read 1 MB from network (1Gbps)         10,000,000 ns (10 ms)
HDD random read                        10,000,000 ns (10 ms)
Round trip US East to US West          40,000,000 ns (40 ms)
Round trip US to Europe                80,000,000 ns (80 ms)
Round trip US to Asia                  150,000,000 ns (150 ms)

Key Insights:
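- A main memory reference (100 ns) is roughly 160x faster than an SSD random read (16 μs)
- A same-datacenter round trip (500 μs) is 80x faster than a US coast-to-coast round trip (40 ms)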
- Caching at any level provides huge wins

Latency Budget

Latency Budget Example (200ms target):

┌────────────────────────────────────────────────────────────┐
│                     200ms Total Budget                     │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────┬──────────┬──────────┬──────────┬──────────┐  │
│  │ Network  │   Auth   │  Service │    DB    │ Response │  │
│  │   50ms   │   20ms   │   50ms   │   60ms   │   20ms   │  │
│  └──────────┴──────────┴──────────┴──────────┴──────────┘  │
│                                                            │
│  Breakdown:                                                │
│  ├── Network (client → edge → origin): 50ms                │
│  ├── Authentication/Authorization: 20ms                    │
│  ├── Service Processing: 50ms                              │
│  ├── Database Queries: 60ms                                │
│  └── Response Serialization + Transfer: 20ms               │
└────────────────────────────────────────────────────────────┘

Budget Rules:
1. Allocate budgets based on criticality
2. Leave 10-20% headroom for variance
3. Monitor P99 against budget
4. Alert when consistently over budget
5. Renegotiate budgets as system evolves
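
The budget rules can be checked mechanically. The sketch below compares measured P99 values per component against the 200ms example allocation with 10% headroom; `BUDGET_MS`, `check_budget`, and the measured numbers are hypothetical names and data chosen for illustration.

```python
# Per-component allocations from the 200ms example budget (milliseconds).
BUDGET_MS = {"network": 50, "auth": 20, "service": 50, "database": 60, "response": 20}

def check_budget(measured_p99_ms, budget_ms=BUDGET_MS, headroom=0.10):
    """Return {component: ms over} for components whose P99 exceeds budget minus headroom."""
    over = {}
    for name, limit in budget_ms.items():
        effective = limit * (1 - headroom)          # reserve headroom for variance
        observed = measured_p99_ms.get(name, 0.0)
        if observed > effective:
            over[name] = round(observed - effective, 1)
    return over

# Hypothetical P99 measurements per component, in milliseconds.
measured = {"network": 42, "auth": 12, "service": 48, "database": 71, "response": 9}
violations = check_budget(measured)
print(violations)
```

A check like this could run against monitoring data on a schedule, so that a component drifting over its share is flagged before the end-to-end target is missed.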

Network Latency Optimization

Geographic Routing

Geographic Routing Strategies:

1. GeoDNS Routing
   User IP ──► DNS Resolver ──► Nearest Server IP

   Pros: Simple, works everywhere
   Cons: DNS caching, IP geolocation inaccuracy

2. Anycast Routing
   Same IP advertised from multiple locations
   BGP routes to nearest (network topology)

   Pros: Instant failover, no DNS delay
   Cons: Requires BGP expertise, stateful sessions tricky

3. Load Balancer Geo-routing
   Global LB ──► Regional LB ──► Servers

   Pros: Fine-grained control, health checking
   Cons: Adds latency hop, more complex

Selection Guide:
┌──────────────────┬─────────────────────────────────────┐
│ Use Case         │ Recommended Approach                │
├──────────────────┼─────────────────────────────────────┤
│ Static content   │ Anycast CDN                         │
│ API services     │ GeoDNS + Regional deployments       │
│ Real-time apps   │ Anycast + Connection persistence    │
│ Stateful apps    │ GeoDNS with session affinity        │
└──────────────────┴─────────────────────────────────────┘
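
To make the GeoDNS idea concrete, here is a rough sketch of "pick the nearest healthy region" using great-circle distance as a stand-in for network proximity. The region names, coordinates, and the `nearest_region` helper are assumptions for illustration; real GeoDNS services work from IP geolocation databases and measured latency rather than raw distance.

```python
import math

# Hypothetical regions with approximate (latitude, longitude) coordinates.
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-northeast": (35.7, 139.7),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearest_region(client_location, healthy=None):
    """Return the closest region, optionally restricted to a set of healthy regions."""
    candidates = [r for r in REGIONS if healthy is None or r in healthy]
    if not candidates:
        raise RuntimeError("no healthy regions")
    return min(candidates, key=lambda r: haversine_km(client_location, REGIONS[r]))

paris = (48.9, 2.35)
print(nearest_region(paris))                                      # closest region overall
print(nearest_region(paris, healthy={"us-east", "ap-northeast"})) # failover when eu-west is down
```

Geographic distance is only a proxy: as noted above, IP geolocation can be inaccurate, so measured round-trip times are a better input when they are available.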

Protocol Optimization

Protocol-Level Optimizations:

1. HTTP/2 Benefits
   ├── Multiplexing (no HTTP-level head-of-line blocking; TCP-level blocking remains)
   ├── Header compression (HPACK)
   ├── Server push (preemptive responses; largely deprecated, removed from Chrome)
   └── Single connection (reduced handshakes)

   Latency Impact: 20-50% improvement typical (workload-dependent)

2. HTTP/3 (QUIC) Benefits
   ├── 0-RTT connection resumption
   ├── No TCP head-of-line blocking
   ├── Built-in encryption
   └── Connection migration (IP changes)

   Latency Impact: 10-30% over HTTP/2 (largest gains on lossy or mobile networks)

3. TLS Optimization
   ├── TLS 1.3 (1-RTT handshake)
   ├── Session resumption (0-RTT early data; replayable, so limit to idempotent requests)
   ├── OCSP stapling (no CA roundtrip)
   └── Certificate chain optimization

   Latency Impact: 50-200ms saved per connection

4. TCP Optimization
   ├── TCP Fast Open (TFO)
   ├── Increased initial congestion window
   ├── BBR congestion control
   └── Keep-alive for connection reuse
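
The handshake savings listed above follow from counting round trips before the first request can be sent. The sketch below is a simplified model under stated assumptions: `HANDSHAKE_RTTS` and `time_to_first_byte_ms` are hypothetical names, and the model ignores DNS, packet loss, and TCP Fast Open.

```python
# Round trips needed before the first request byte can be sent (simplified model).
HANDSHAKE_RTTS = {
    "tcp+tls1.2": 1 + 2,        # TCP handshake + full TLS 1.2 handshake
    "tcp+tls1.3": 1 + 1,        # TCP handshake + TLS 1.3 handshake
    "tcp+tls1.3-0rtt": 1 + 0,   # TCP handshake, request sent as TLS 1.3 early data
    "quic-new": 1,              # QUIC combines transport and TLS 1.3 setup
    "quic-0rtt": 0,             # resumed QUIC connection with 0-RTT data
    "reused-connection": 0,     # keep-alive or pooled connection, no handshake
}

def time_to_first_byte_ms(stack, rtt_ms, server_ms=0.0):
    """Handshake round trips plus one round trip for the request and response."""
    return (HANDSHAKE_RTTS[stack] + 1) * rtt_ms + server_ms

# Example: an 80 ms round trip (roughly US to Europe in the reference table).
for stack in HANDSHAKE_RTTS:
    print(f"{stack:20s} {time_to_first_byte_ms(stack, rtt_ms=80):6.0f} ms")
```

On an 80 ms path, moving a fresh connection from TLS 1.2 to TLS 1.3 saves one round trip in this model, and reusing a connection removes the handshake entirely.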

Connection Optimization

Connection Strategies:

1. Connection Pooling
   ┌─────────────────────────────────────────┐
   │           Connection Pool               │
   │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐        │
   │  │Conn1│ │Conn2│ │Conn3│ │Conn4│        │
   │  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘        │
   └─────┼───────┼───────┼───────┼───────────┘
         │       │       │       │
      Reuse connections, avoid handshake cost

2. Preconnect/Prefetch
   <link rel="preconnect" href="https://api.example.com">
   <link rel="dns-prefetch" href="https://cdn.example.com">

   Triggers early connection establishment

3. Connection Coalescing (HTTP/2)
   Multiple domains → single connection
   (When sharing same IP and certificate)
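
Connection pooling can be sketched in a few lines. The `ConnectionPool` class below is a hypothetical, simplified illustration: it caps the number of idle connections kept for reuse, not the total number open, and it does no health checking, which a production pool would need.

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Keeps up to max_idle connections open so callers can skip the handshake."""

    def __init__(self, factory, max_idle=4):
        self._factory = factory                    # callable that opens a new connection
        self._idle = queue.LifoQueue(maxsize=max_idle)
        self.created = 0                           # how many handshakes were paid

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()         # reuse: no handshake cost
        except queue.Empty:
            conn = self._factory()                 # new connection: pays the handshake
            self.created += 1
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)        # hand back for the next caller
            except queue.Full:
                close = getattr(conn, "close", None)
                if close is not None:
                    close()                        # pool is full, drop the extra connection

# `object` stands in for a real connection factory in this demo.
pool = ConnectionPool(factory=object, max_idle=2)
for _ in range(10):
    with pool.connection() as conn:
        pass                                       # send a request over conn here
print(pool.created)                                # connections opened for 10 requests
```

Ten sequential requests share a single connection here; with a real socket factory, each avoided `created` increment is one TCP and TLS handshake saved.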

Application Latency Optimization

Caching Strategies

Caching Layers:

┌──────────────────────────────────────────────────────────────┐
│                    Caching Hierarchy                         │
│                                                              │
│  Browser ──► CDN Edge ──► App Cache ──► DB Cache ──► DB      │
│    1ms         10ms         20ms          50ms      100ms    │
│                                                              │
│  Each layer should catch most requests before next layer     │
└──────────────────────────────────────────────────────────────┘

Cache Type Selection:
┌──────────────────┬──────────────────┬──────────────────────────┐
│ Data Type        │ Cache Location   │ TTL Strategy             │
├──────────────────┼──────────────────┼──────────────────────────┤
│ Static assets    │ CDN + Browser    │ Long (1 year), hashed    │
│ API responses    │ CDN + App        │ Short (seconds-mins)     │
│ Session data     │ App (Redis)      │ Session duration         │
│ DB query results │ App (local/dist) │ Varies by query          │
│ Computed results │ App              │ Based on input staleness │
└──────────────────┴──────────────────┴──────────────────────────┘
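
An application-level cache with a TTL strategy might look like the following sketch. `TTLCache`, its parameters, and `load_profile` are hypothetical; the clock is injectable only so the example can simulate time passing without sleeping.

```python
import random
import time

class TTLCache:
    """In-process cache whose entries expire after ttl_s, plus or minus jitter."""

    def __init__(self, ttl_s, jitter=0.1, clock=time.monotonic, rng=random.random):
        self._ttl_s = ttl_s
        self._jitter = jitter
        self._clock = clock
        self._rng = rng
        self._entries = {}          # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = loader(key)         # slow path: go to the next layer
        # Spread expiry over ttl*(1-jitter)..ttl*(1+jitter) so keys do not expire together.
        ttl = self._ttl_s * (1 + self._jitter * (2 * self._rng() - 1))
        self._entries[key] = (value, now + ttl)
        return value

# Demo with a fake clock (seconds) and a hypothetical loader.
fake_now = [0.0]
loads = []

def load_profile(user_id):
    loads.append(user_id)
    return {"id": user_id}

cache = TTLCache(ttl_s=60, jitter=0.0, clock=lambda: fake_now[0])
cache.get_or_load(7, load_profile)   # miss: loads from the origin
cache.get_or_load(7, load_profile)   # hit: served from cache
fake_now[0] = 61.0
cache.get_or_load(7, load_profile)   # expired: loads again
print(cache.hits, cache.misses, loads)
```

The jitter parameter is one way to implement staggered TTLs, so that entries written at the same moment do not all expire at once.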

Async Processing

Async Patterns for Latency:

1. Background Processing
   Request ──► Validate ──► Queue ──► Response (fast)
                             │
                             └──► Worker (async processing)

   User sees fast response, heavy work happens later

2. Parallel Requests
   Sequential:
   A(100ms) → B(100ms) → C(100ms) = 300ms

   Parallel:
   A(100ms) ─┐
   B(100ms) ─┼──► 100ms total
   C(100ms) ─┘

3. Speculative Execution
   Start likely-needed work before confirmed
   Cancel if not needed
   Risk: Wasted resources if prediction wrong

4. Read-Your-Writes with Async
   Write ──► Queue ──► Response + Local Cache Update
                         │
         User sees their write immediately
         Backend processes asynchronously
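
The sequential versus parallel comparison can be reproduced with `asyncio`. In this sketch the three calls are simulated with `asyncio.sleep`; `call`, `sequential`, and `parallel` are illustrative names, and the approach only applies when the calls are independent of each other.

```python
import asyncio
import time

async def call(name, delay_s=0.1):
    """Stand-in for a downstream request that takes about 100 ms."""
    await asyncio.sleep(delay_s)
    return name

async def sequential():
    # Each await finishes before the next call starts: roughly 300 ms total.
    return [await call("A"), await call("B"), await call("C")]

async def parallel():
    # All three calls are in flight at once: roughly 100 ms total.
    return await asyncio.gather(call("A"), call("B"), call("C"))

def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start

seq_result, seq_s = timed(sequential)
par_result, par_s = timed(parallel)
print(f"sequential: {seq_s * 1000:.0f} ms, parallel: {par_s * 1000:.0f} ms")
```

`asyncio.gather` returns results in the order the awaitables were passed, so the caller does not need to re-sort them.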

Serialization Optimization

Serialization Format Comparison:

Format        Encode    Decode     Size        Human
              Speed     Speed      (relative)  Readable
───────────────────────────────────────────────────────
JSON          Fast      Fast       Large       Yes
MessagePack   V.Fast    V.Fast     Small       No
Protocol Buf  Fast      V.Fast     V.Small     No
FlatBuffers   Fast      Zero-copy  Small       No
Avro          Fast      Fast       Small       Schema only

Recommendations:
- Internal services: Protocol Buffers or MessagePack
- Public APIs: JSON (compatibility) or gRPC (performance)
- High-throughput: FlatBuffers (zero-copy)
- Schema evolution: Avro or Protocol Buffers

Optimization Tips:
1. Avoid serializing unnecessary fields
2. Use streaming for large payloads
3. Compress large responses (gzip/brotli)
4. Consider binary formats for internal traffic
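
The optimization tips can be illustrated with the standard library alone. The sketch below trims unneeded fields, compares compact JSON with a fixed binary layout built with `struct` (a stand-in for formats such as Protocol Buffers, which need third-party packages), and compresses a large response with gzip. The record and its schema are made up for the example.

```python
import gzip
import json
import struct

# Hypothetical record; debug_trace is a field the client never reads.
record = {"user_id": 123456, "score": 98.5, "active": True, "debug_trace": "x" * 200}

full_json = json.dumps(record).encode()

# Tip 1: drop fields the consumer does not need.
trimmed = {k: record[k] for k in ("user_id", "score", "active")}
compact_json = json.dumps(trimmed, separators=(",", ":")).encode()

# Tip 4: a fixed binary layout (little-endian uint32, float64, bool) for internal traffic.
binary = struct.pack("<Id?", trimmed["user_id"], trimmed["score"], trimmed["active"])
decoded = struct.unpack("<Id?", binary)

# Tip 3: compress large responses.
large_response = json.dumps([trimmed] * 500).encode()
compressed = gzip.compress(large_response)

print(len(full_json), len(compact_json), len(binary))
print(len(large_response), len(compressed))
```

The binary form is smallest but gives up self-description, which is why schema-based formats are usually preferred over hand-rolled layouts once more than one service reads the data.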

Database Latency Optimization

Query Optimization

Database Latency Patterns:

1. Index Optimization
   ❌ Full table scan: O(n) - slow
   ✅ Index lookup: O(log n) - fast
   ✅ Covering index: No table lookup needed

   Monitor: Slow query logs, EXPLAIN plans

2. Query Patterns
   ❌ N+1 queries: 1 + N roundtrips
   ✅ Batch queries: 1 roundtrip
   ✅ JOINs (when appropriate): 1 roundtrip

   Example:
   ❌ for user in users: get_orders(user.id)  # N queries
   ✅ get_orders_for_users(user_ids)           # 1 query

3. Connection Management
   ├── Connection pooling (avoid connection overhead)
   ├── Prepared statements (avoid parsing overhead)
   └── Connection proximity (same region as app)

4. Read Replicas
   ┌─────────────────────────────────────────┐
   │  Writes ──► Primary                     │
   │  Reads  ──► Read Replica (lower latency)│
   └─────────────────────────────────────────┘
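
The N+1 example above can be made runnable with an in-memory SQLite database. This is a sketch under stated assumptions: the `orders` table, its columns, and both helper functions are hypothetical, and an in-memory database has no network round trips, so the example shows the query shapes rather than the latency difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    CREATE INDEX idx_orders_user ON orders(user_id);
    INSERT INTO orders (user_id, total) VALUES (1, 10.0), (1, 5.5), (2, 7.0), (3, 1.0);
""")

def get_orders_n_plus_one(user_ids):
    """Anti-pattern: one query (one round trip) per user."""
    result = {}
    for uid in user_ids:
        rows = conn.execute(
            "SELECT total FROM orders WHERE user_id = ? ORDER BY id", (uid,)
        ).fetchall()
        result[uid] = [total for (total,) in rows]
    return result

def get_orders_for_users(user_ids):
    """Batched: a single query with an IN list, grouped in application code."""
    grouped = {uid: [] for uid in user_ids}
    if not user_ids:
        return grouped
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT user_id, total FROM orders WHERE user_id IN ({placeholders}) ORDER BY id",
        list(user_ids),
    )
    for uid, total in rows:
        grouped[uid].append(total)
    return grouped

print(get_orders_for_users([1, 2]))
```

Both functions return the same data; against a remote database the batched version replaces N round trips with one, which dominates once the per-query network latency is higher than the query execution time.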

Database Proximity

Database Placement Strategies:

1. Co-located Database
   App and DB in same availability zone
   Latency: <1ms
   Best for: Primary workloads

2. Same-Region Replica
   Read replica in same region
   Latency: 1-5ms
   Best for: Read scaling

3. Cross-Region Replica
   Replica in user's region
   Latency: Local (~5ms) vs cross-region (~100ms)
   Best for: Global read-heavy apps

4. Globally Distributed
   Database spans regions (CockroachDB, Spanner)
   Write latency: Higher (consensus)
   Read latency: Local
   Best for: Global consistency requirements

Measurement and Monitoring

Latency Measurement

Measurement Points:

1. Client-Side (Real User Monitoring)
   ├── Measures actual user experience
   ├── Includes network variability
   └── Tools: Browser timing API, RUM services

2. Edge/CDN Metrics
   ├── Time to first byte (TTFB)
   ├── Cache hit ratio
   └── Origin fetch time

3. Server-Side (APM)
   ├── Request processing time
   ├── Downstream service calls
   ├── Database query time
   └── Tools: OpenTelemetry, APM vendors

4. Synthetic Monitoring
   ├── Consistent measurement conditions
   ├── Multiple geographic locations
   └── Baseline for comparison

Distributed Tracing:
┌──────────────────────────────────────────────────────────────┐
│  Request ──► Gateway ──► Service A ──► Service B ──► DB      │
│    │           │            │            │           │       │
│    └───────────┴────────────┴────────────┴───────────┘       │
│              Trace ID links all spans together               │
│              Each span has start time + duration             │
└──────────────────────────────────────────────────────────────┘
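
The span model in the tracing diagram can be sketched without any tracing library. The `Tracer` class below is a hand-rolled illustration, not the OpenTelemetry API: it shares one trace ID across nested spans and records a start time and duration for each.

```python
import time
import uuid
from contextlib import contextmanager

class Tracer:
    """Minimal single-threaded tracer: one trace ID, nested spans with durations."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []      # finished spans, in completion order
        self._stack = []     # names of currently open spans

    @contextmanager
    def span(self, name):
        record = {
            "trace_id": self.trace_id,
            "name": name,
            "parent": self._stack[-1] if self._stack else None,
            "start": time.perf_counter(),
            "duration_ms": None,
        }
        self._stack.append(name)
        try:
            yield record
        finally:
            self._stack.pop()
            record["duration_ms"] = (time.perf_counter() - record["start"]) * 1000
            self.spans.append(record)

tracer = Tracer()
with tracer.span("gateway"):
    with tracer.span("service-a"):
        with tracer.span("db"):
            time.sleep(0.01)     # stand-in for a database query

for s in tracer.spans:
    print(f'{s["name"]:10s} parent={s["parent"]} {s["duration_ms"]:.1f} ms')
```

Because inner spans finish first, the output lists `db` before `gateway`; subtracting a child's duration from its parent's shows how much time each layer added on its own.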

Latency SLOs

Setting Latency SLOs:

1. Define Meaningful Metrics
   - P50: Typical experience
   - P95: Most users' worst case
   - P99: Tail latency for critical paths

2. Set Realistic Targets
   P50: 50ms    (snappy feel)
   P95: 200ms   (acceptable)
   P99: 500ms   (degraded but functional)

3. Error Budget Approach
   If target is P99 < 500ms with 99.9% SLO:
   - P99 < 500ms already allows 1% of requests to exceed 500ms
   - Budget: the P99 target may be missed in 0.1% of measurement windows
   - ~43 minutes per 30-day month of violations allowed

4. Alert Thresholds
   ├── Warning: P99 > 400ms (80% of target)
   ├── Critical: P99 > 500ms (at target)
   └── Page: P99 > 600ms for 5 minutes (over target)
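
The "~43 minutes per month" figure comes from treating the 99.9% SLO as the share of time in which the P99 target must hold. A small sketch of that arithmetic follows; both function names are hypothetical.

```python
def allowed_violation_minutes(slo, days=30):
    """Minutes per period in which the latency target may be missed."""
    return (1 - slo) * days * 24 * 60

def budget_remaining(slo, bad_minutes, days=30):
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed = allowed_violation_minutes(slo, days)
    return (allowed - bad_minutes) / allowed

print(round(allowed_violation_minutes(0.999), 1))        # minutes allowed in 30 days
print(round(budget_remaining(0.999, bad_minutes=21.6), 2))  # share of budget left
```

Tracking the remaining fraction, rather than only the instantaneous P99, gives a basis for deciding when to slow down releases or prioritize latency work.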

Common Anti-Patterns

Latency Anti-Patterns:

1. "Chattiness"
   ❌ Many small requests instead of batched
   ✅ Batch requests, use GraphQL, aggregate APIs

2. "Synchronous Chains"
   ❌ A → B → C → D (sequential)
   ✅ Parallelize independent calls, use async

3. "Unbounded Queries"
   ❌ SELECT * without limits or pagination
   ✅ Always paginate, limit result sets

4. "Cache Miss Storms"
   ❌ Cache expires, all requests hit origin
   ✅ Staggered TTLs, request coalescing, warm cache

5. "Logging in Hot Path"
   ❌ Synchronous logging on every request
   ✅ Async logging, sampling for high volume

6. "Premature Serialization"
   ❌ Serialize before knowing if needed
   ✅ Lazy serialization, stream when possible

7. "Ignoring Tail Latency"
   ❌ Only monitoring averages
   ✅ Track P95, P99, P99.9 for user experience
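
Request coalescing, named above as a remedy for cache miss storms, can be sketched with threads: concurrent callers asking for the same key share one origin fetch. The `SingleFlight` class and the demo names are hypothetical, and the `followers` counter exists only so the demo can wait deterministically.

```python
import threading
import time

class SingleFlight:
    """Coalesces concurrent loads of the same key into a single call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}     # key -> shared call state
        self.followers = 0      # callers that waited instead of loading

    def do(self, key, loader):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = call
            else:
                self.followers += 1
        if leader:
            try:
                call["value"] = loader()        # only the leader hits the origin
            except Exception as exc:
                call["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call["done"].set()
        else:
            call["done"].wait()                 # followers reuse the leader's result
        if call["error"] is not None:
            raise call["error"]
        return call["value"]

# Demo: 8 threads miss the cache at once; the origin is called a single time.
sf = SingleFlight()
release = threading.Event()
origin_calls = []
results = []

def slow_origin_fetch():
    origin_calls.append(1)
    release.wait(timeout=5)                     # simulate a slow origin
    return "fresh-value"

threads = [threading.Thread(target=lambda: results.append(sf.do("homepage", slow_origin_fetch)))
           for _ in range(8)]
for t in threads:
    t.start()
deadline = time.monotonic() + 5
while sf.followers < 7 and time.monotonic() < deadline:
    time.sleep(0.001)                           # wait until the other 7 have joined
release.set()
for t in threads:
    t.join()
print(len(origin_calls), len(results))
```

In practice this sits in front of the cache loader, so an expired hot key triggers one origin request instead of one per waiting client.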

Best Practices

Latency Optimization Best Practices:

1. Measure First
   □ Establish baseline measurements
   □ Identify bottlenecks before optimizing
   □ Use distributed tracing
   □ Monitor percentiles, not just averages

2. Optimize Strategically
   □ Start with biggest bottlenecks
   □ Apply latency budgets
   □ Consider cost vs benefit
   □ Test optimizations under load

3. Network Layer
   □ Deploy close to users (CDN, edge)
   □ Use modern protocols (HTTP/2, HTTP/3)
   □ Optimize TLS (1.3, session resumption)
   □ Connection pooling and keep-alive

4. Application Layer
   □ Cache aggressively and appropriately
   □ Parallelize independent operations
   □ Use async processing for non-critical work
   □ Optimize serialization formats

5. Data Layer
   □ Index frequently queried columns
   □ Use read replicas for read-heavy loads
   □ Connection pooling
   □ Query optimization (avoid N+1)

6. Continuous Improvement
   □ Regular latency reviews
   □ Load testing with latency assertions
   □ Automated regression detection
   □ User experience correlation

Related Skills

caching-strategies – Application-level caching patterns
multi-region-deployment – Geographic distribution
cdn-architecture – Edge caching and delivery
distributed-tracing – End-to-end latency visibility
