websocket-engineer

📁 404kidwiz/claude-supercode-skills 📅 Jan 24, 2026
Total installs: 37
Weekly installs: 37
Site rank: #5614
Install command
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill websocket-engineer

Installs by agent

opencode 26
claude-code 25
codex 25
cursor 17
github-copilot 17

Skill documentation

WebSocket & Real-Time Engineer

Purpose

Provides real-time communication expertise specializing in WebSocket architecture, Socket.IO, and event-driven systems. Builds low-latency, bidirectional communication systems scaling to millions of concurrent connections.

When to Use

  • Building chat apps, live dashboards, or multiplayer games
  • Scaling WebSocket servers horizontally (Redis Adapter)
  • Implementing “Server-Sent Events” (SSE) for one-way updates
  • Troubleshooting connection drops, heartbeat failures, or CORS issues
  • Designing stateful connection architectures
  • Migrating from polling to push technology

Examples

Example 1: Real-Time Chat Application

Scenario: Building a scalable chat platform for enterprise use.

Implementation:

  1. Designed WebSocket architecture with Socket.IO
  2. Implemented Redis Adapter for horizontal scaling
  3. Created room-based message routing
  4. Added message persistence and history
  5. Implemented presence system (online/offline)

Results:

  • Supports 100,000+ concurrent connections
  • 50ms average message delivery
  • 99.99% connection stability
  • Seamless horizontal scaling

Example 2: Live Dashboard System

Scenario: Real-time analytics dashboard with sub-second updates.

Implementation:

  1. Implemented WebSocket server with low latency
  2. Created efficient message batching strategy
  3. Added Redis pub/sub for multi-server support
  4. Implemented client-side update coalescing
  5. Added compression for large payloads

Results:

  • Dashboard updates in under 100ms
  • Handles 10,000 concurrent dashboard views
  • 80% reduction in server load vs polling
  • Zero data loss during reconnections

Example 3: Multiplayer Game Backend

Scenario: Low-latency multiplayer game server.

Implementation:

  1. Implemented WebSocket server with binary protocols
  2. Created authoritative server architecture
  3. Added client-side prediction and reconciliation
  4. Implemented lag compensation algorithms
  5. Set up server-side physics and collision detection

Results:

  • 30ms end-to-end latency
  • Supports 1000 concurrent players per server
  • Smooth gameplay despite network variations
  • Cheat-resistant server authority

Best Practices

Connection Management

  • Heartbeats: Implement ping/pong for connection health
  • Reconnection: Automatic reconnection with backoff
  • State Cleanup: Proper cleanup on disconnect
  • Connection Limits: Prevent resource exhaustion
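The heartbeat and cleanup items above can be sketched as a periodic sweep over open connections. This is a minimal sketch, not a library API: `sweepConnections` is a hypothetical helper, and each connection is assumed to expose `ping()` and `terminate()` plus an `isAlive` flag that the application sets to `true` whenever a pong arrives.

```javascript
// Hypothetical helper: call once per heartbeat interval (e.g. from setInterval).
// `connections` is assumed to be a Set of objects with ping(), terminate(),
// and an isAlive flag that a pong handler sets back to true.
function sweepConnections(connections) {
  const terminated = [];
  for (const conn of connections) {
    if (conn.isAlive === false) {
      // No pong since the previous sweep: treat the peer as dead and clean up.
      conn.terminate();
      connections.delete(conn);
      terminated.push(conn);
      continue;
    }
    // Reset the flag; it stays false unless a pong arrives before the next sweep.
    conn.isAlive = false;
    conn.ping();
  }
  return terminated;
}
```

A connection is only dropped after missing a full interval, so the sweep period should be chosen well below any load balancer idle timeout.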

Scaling

  • Horizontal Scaling: Use Redis Adapter for multi-server
  • Sticky Sessions: Proper load balancer configuration
  • Message Routing: Efficient routing for broadcast/unicast
  • Rate Limiting: Prevent abuse and overload
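For the rate-limiting item, one common approach is a token bucket kept per connection. The sketch below is library-agnostic; the class name, capacity, and refill rate are illustrative assumptions, not values from this skill.

```javascript
// Token bucket: allows short bursts up to `capacity`, then a steady
// `refillPerSecond` rate. Timestamps are injectable so the logic is testable.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the message may proceed, false if it should be rejected.
  tryRemove(now = Date.now()) {
    const elapsedSeconds = Math.max(0, (now - this.lastRefill) / 1000);
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A server would typically create one bucket when a socket connects and call `tryRemove()` at the top of each message handler, dropping the message or disconnecting the client when it returns `false`.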

Performance

  • Message Batching: Batch messages where appropriate
  • Compression: Compress messages (permessage-deflate)
  • Binary Protocols: Use binary for performance-critical data
  • Connection Reuse: Share one connection per client and multiplex channels over it instead of opening a socket per feature
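Message batching can be sketched as a small buffer that flushes when it is full or when a short timer fires. `MessageBatcher` and its default limits are hypothetical names and values chosen for illustration; `send` stands in for whatever actually writes to the socket.

```javascript
// Collects messages and hands them to `send` as one array, either when
// `maxBatchSize` is reached or `maxDelayMs` after the first buffered message.
class MessageBatcher {
  constructor(send, { maxBatchSize = 50, maxDelayMs = 20 } = {}) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
    this.maxDelayMs = maxDelayMs;
    this.buffer = [];
    this.timer = null;
  }

  add(message) {
    this.buffer.push(message);
    if (this.buffer.length >= this.maxBatchSize) {
      this.flush();
    } else if (this.timer === null) {
      // Bound the extra latency added by waiting for more messages.
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}
```

The trade-off is latency against per-message overhead: a larger `maxDelayMs` produces fewer frames but delays the first message in each batch.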

Security

  • Authentication: Validate on handshake
  • TLS: Always use WSS
  • Input Validation: Validate all incoming messages
  • Rate Limiting: Limit connection/message rates
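As an illustration of validating credentials on the handshake, the sketch below signs and verifies a simple HMAC token using Node's built-in crypto module. The token format (`userId.expiry.signature`) and the function names are assumptions made for this example; a real deployment would more likely use an established JWT library.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Hypothetical token format: "<userId>.<expiresAtMs>.<hex hmac>".
// userId must not contain "." in this simplified scheme.
function signToken(userId, expiresAtMs, secret) {
  const payload = `${userId}.${expiresAtMs}`;
  const signature = createHmac("sha256", secret).update(payload).digest("hex");
  return `${payload}.${signature}`;
}

// Returns { userId } for a valid, unexpired token, otherwise null.
function verifyToken(token, secret, now = Date.now()) {
  if (typeof token !== "string") return null;
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [userId, expiresAt, signature] = parts;

  const expected = createHmac("sha256", secret)
    .update(`${userId}.${expiresAt}`)
    .digest("hex");
  const given = Buffer.from(signature, "utf8");
  const wanted = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (given.length !== wanted.length || !timingSafeEqual(given, wanted)) {
    return null;
  }

  const expiresAtMs = Number(expiresAt);
  if (!Number.isFinite(expiresAtMs) || expiresAtMs <= now) return null;
  return { userId };
}
```

With Socket.IO, a check like this would typically run in a connection middleware against the token the client supplies in its handshake `auth` payload, rejecting the connection when `verifyToken` returns `null`.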


2. Decision Framework

Protocol Selection

What is the communication pattern?
│
├─ **Bi-directional (Chat/Game)**
│  ├─ Low Latency needed? → **WebSockets (Raw)**
│  ├─ Fallbacks/Auto-reconnect needed? → **Socket.IO**
│  └─ P2P Video/Audio? → **WebRTC**
│
├─ **One-way (Server → Client)**
│  ├─ Stock Ticker / Notifications? → **Server-Sent Events (SSE)**
│  └─ Large File Download? → **HTTP Stream**
│
└─ **High Frequency (IoT)**
   └─ Constrained device? → **MQTT** (over TCP/WS)
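For the one-way branch of the tree, Server-Sent Events need only a plain HTTP response with the `text/event-stream` content type. The sketch below uses Node's built-in http module; `formatSseEvent` and `createTickerServer` are hypothetical names, and the one-second tick is an arbitrary example payload.

```javascript
const http = require("node:http");

// Serializes one SSE frame. Multi-line data is split into several
// "data:" lines, and a blank line terminates the event.
function formatSseEvent({ event, data, id }) {
  let frame = "";
  if (id !== undefined) frame += `id: ${id}\n`;
  if (event !== undefined) frame += `event: ${event}\n`;
  const text = typeof data === "string" ? data : JSON.stringify(data);
  for (const line of text.split("\n")) frame += `data: ${line}\n`;
  return frame + "\n";
}

// Returns an unstarted server that pushes a "tick" event every second.
function createTickerServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    const timer = setInterval(() => {
      res.write(formatSseEvent({ event: "tick", data: { at: Date.now() } }));
    }, 1000);
    // Stop producing events once the client goes away.
    req.on("close", () => clearInterval(timer));
  });
}
```

Calling `createTickerServer().listen(3000)` would start it; in the browser, the built-in `EventSource` reconnects automatically, which is part of why SSE suits tickers and notifications.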

Scaling Strategy

| Scale | Architecture | Backend |
| --- | --- | --- |
| < 10k Users | Monolith | Node.js Single Instance |
| 10k – 100k | Clustering | Node.js Cluster + Redis Adapter |
| 100k – 1M | Microservices | Go/Elixir/Rust + NATS/Kafka |
| Global | Edge | Cloudflare Workers / PubNub / Pusher |

Load Balancer Config

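  • Sticky Sessions: REQUIRED for Socket.IO when HTTP long-polling is enabled (the default), because every request of a session must reach the same node. Not needed for WebSocket-only transport.
  • Timeouts: Increase idle timeouts (e.g., 60s+) and keep the heartbeat interval shorter than the timeout.
  • Headers: The proxy must forward Upgrade: websocket and Connection: Upgrade to the backend.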

Red Flags → Escalate to security-engineer:

  • Accepting connections from any Origin (*) with credentials
  • No Rate Limiting on connection requests (DoS risk)
  • Sending JWTs in URL query params (they end up in proxy and access logs) – send the token in a cookie or in the first message after connecting instead
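The first red flag can be avoided with an explicit Origin allow-list checked before a connection is accepted. This is a minimal sketch: `ALLOWED_ORIGINS` and `isAllowedOrigin` are hypothetical names, and the single entry reuses the example origin from the workflow in this document.

```javascript
// Exact-match allow-list. Comparing parsed origins avoids prefix tricks
// such as "https://myapp.com.evil.example".
const ALLOWED_ORIGINS = new Set(["https://myapp.com"]);

function isAllowedOrigin(originHeader) {
  if (typeof originHeader !== "string") return false;
  let url;
  try {
    url = new URL(originHeader);
  } catch {
    // Covers malformed values and the literal string "null".
    return false;
  }
  return ALLOWED_ORIGINS.has(url.origin);
}
```

A check like this would run against the `Origin` request header during the HTTP upgrade, rejecting the request before any WebSocket state is allocated.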


3. Core Workflows

Workflow 1: Scalable Socket.IO Server (Node.js)

Goal: Chat server capable of scaling across multiple cores/instances.

Steps:

  1. Install Dependencies

    npm install socket.io redis @socket.io/redis-adapter
    
  2. Implementation (server.js)

    const { Server } = require("socket.io");
    const { createClient } = require("redis");
    const { createAdapter } = require("@socket.io/redis-adapter");
    
    const pubClient = createClient({ url: "redis://localhost:6379" });
    const subClient = pubClient.duplicate();
    
    Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
      const io = new Server(3000, {
        adapter: createAdapter(pubClient, subClient),
        cors: {
          origin: "https://myapp.com",
          methods: ["GET", "POST"]
        }
      });
    
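      io.on("connection", (socket) => {
        // User joins a room (e.g., "chat-123"); ignore malformed room names
        socket.on("join", (room) => {
          if (typeof room !== "string" || room.length === 0) return;
          socket.join(room);
        });
    
        // Send message to room (propagates via Redis to all nodes)
        socket.on("message", (data) => {
          // Only members of a room may post to it
          if (!data || typeof data.text !== "string" || !socket.rooms.has(data.room)) return;
          io.to(data.room).emit("chat", data.text);
        });
      });
    }).catch((err) => {
      // Without Redis the adapter cannot relay messages, so fail fast
      console.error("Redis connection failed:", err);
      process.exit(1);
    });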
    


Workflow 3: Production Tuning (Linux)

Goal: Handle 50k concurrent connections on a single server.

Steps:

  1. File Descriptors

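    • Increase limit: ulimit -n 65535 (applies to the current shell session only).
    • Persist it with nofile entries in /etc/security/limits.conf; for systemd services, set LimitNOFILE in the unit file.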
  2. Ephemeral Ports

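    • Increase range: sysctl -w net.ipv4.ip_local_port_range="1024 65535". This matters for outbound connections (a reverse proxy to its backends, or a load-test client), not for inbound connections to a listening port.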
  3. Memory Optimization

    • Use ws (lighter) instead of Socket.IO if its features are not needed.
    • Disable “Per-Message Deflate” (Compression) if CPU or memory usage is high.


5. Anti-Patterns & Gotchas

❌ Anti-Pattern 1: Stateful Monolith

What it looks like:

  • Storing users = [] array in Node.js memory.

Why it fails:

  • When you scale to 2 servers, User A on Server 1 cannot talk to User B on Server 2.
  • Memory leaks crash the process.

Correct approach:

  • Use the Redis Adapter to relay events between nodes, and keep shared state (presence, sessions) in Redis rather than in process memory.
  • Stateless servers, stateful backend (Redis).
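Moving the in-memory `users` array into Redis might look like the sketch below. `PresenceStore` and the `presence:online` key are hypothetical; the client is assumed to expose `sAdd`, `sRem`, and `sIsMember`, which is how node-redis v4 names the Redis set commands.

```javascript
// Presence kept in a Redis set so every node sees the same answer.
// `client` is an already-connected Redis client (assumed node-redis v4 shape).
class PresenceStore {
  constructor(client, key = "presence:online") {
    this.client = client;
    this.key = key;
  }

  async markOnline(userId) {
    await this.client.sAdd(this.key, userId);
  }

  async markOffline(userId) {
    await this.client.sRem(this.key, userId);
  }

  async isOnline(userId) {
    return Boolean(await this.client.sIsMember(this.key, userId));
  }
}
```

This simple version marks a user offline as soon as any one of their connections closes; tracking a per-user connection count would be needed to support several devices at once.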

❌ Anti-Pattern 2: The “Thundering Herd”

What it looks like:

  • Server restarts. 100,000 clients reconnect instantly.
  • Server crashes again due to CPU spike.

Why it fails:

  • Connection handshakes are expensive (TLS + Auth).

Correct approach:

  • Randomized Jitter: Clients wait random(0, 10s) before reconnecting.
  • Exponential Backoff: Wait 1s, then 2s, then 4s…
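Backoff and jitter are often combined into a single delay function. The sketch below uses the "full jitter" variant, where the delay is drawn uniformly between zero and an exponentially growing ceiling; the function name, the 1s base, and the 30s cap are illustrative assumptions.

```javascript
// attempt is 0 for the first retry. The ceiling doubles each attempt
// (1s, 2s, 4s, ...) up to maxMs, and the actual delay is a random value
// below that ceiling so clients do not reconnect in lockstep.
function reconnectDelayMs(
  attempt,
  { baseMs = 1000, maxMs = 30000, random = Math.random } = {}
) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

A client would call `setTimeout(connect, reconnectDelayMs(attempt))` after each failed attempt and reset `attempt` to zero once a connection succeeds.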

❌ Anti-Pattern 3: Blocking the Event Loop

What it looks like:

  • socket.on('message', () => { heavyCalculation(); })

Why it fails:

  • Node.js runs JavaScript on a single thread. One heavy task blocks the event loop for all 10,000 connections.

Correct approach:

  • Offload work to a Worker Thread or Message Queue (RabbitMQ/Bull).
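Offloading to a worker thread can be sketched with Node's built-in `worker_threads` module. The summation loop below is only a stand-in for `heavyCalculation`, and `heavyCalculationInWorker` is a hypothetical wrapper name; a production setup would more likely reuse a pool of workers than spawn one per message.

```javascript
const { Worker } = require("node:worker_threads");

// Source for the worker, passed inline via `eval: true` to keep the sketch
// in one file. It sums 1..n as a placeholder for CPU-heavy work.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Runs the calculation off the main thread and resolves with its result,
// so the event loop stays free to serve other sockets in the meantime.
function heavyCalculationInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

Inside a handler this becomes `socket.on("message", async () => { const result = await heavyCalculationInWorker(n); ... })`, which yields to other connections while the worker runs.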


7. Quality Checklist

Scalability:

  • Adapter: Redis/NATS adapter configured for multi-node.
  • Load Balancer: Sticky sessions enabled (if using polling fallback).
  • OS Limits: File descriptors limit increased.

Resilience:

  • Reconnection: Exponential backoff + Jitter implemented.
  • Heartbeat: Ping/Pong interval configured (< LB timeout).
  • Fallback: Socket.IO fallbacks (HTTP Long Polling) enabled/tested.

Security:

  • WSS: TLS enabled (Secure WebSockets).
  • Auth: Handshake validates credentials properly.
  • Rate Limit: Connection rate limiting active.

Anti-Patterns

Connection Management Anti-Patterns

  • No Heartbeats: Not detecting dead connections – implement ping/pong
  • Memory Leaks: Not cleaning up closed connections – implement proper cleanup
  • Infinite Reconnects: Retrying in a tight loop without backoff – implement exponential backoff
  • Relying on Sticky Sessions for State: Not designing servers to be stateless – keep shared state in Redis

Scaling Anti-Patterns

  • Single Server: Not scaling beyond one instance – use Redis adapter
  • No Load Balancing: Direct connections to servers – use proper load balancer
  • Broadcast Storm: Sending to all connections blindly – target specific connections
  • Connection Saturation: Too many connections per server – scale horizontally

Performance Anti-Patterns

  • Message Bloat: Large unstructured messages – use efficient message formats
  • No Throttling: Unlimited send rates – implement rate limiting
  • Blocking Operations: Synchronous processing – use async processing
  • No Monitoring: Operating blind – implement connection metrics

Security Anti-Patterns

  • No TLS: Using unencrypted connections – always use WSS
  • Weak Auth: Accepting tokens without checking signature and expiry – verify both on every handshake
  • No Rate Limits: Vulnerable to abuse – implement connection/message limits
  • CORS Exposed: Open cross-origin access – configure proper CORS