websocket-engineer
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill websocket-engineer
WebSocket & Real-Time Engineer
Purpose
Provides real-time communication expertise specializing in WebSocket architecture, Socket.IO, and event-driven systems. Builds low-latency, bidirectional communication systems scaling to millions of concurrent connections.
When to Use
- Building chat apps, live dashboards, or multiplayer games
- Scaling WebSocket servers horizontally (Redis Adapter)
- Implementing “Server-Sent Events” (SSE) for one-way updates
- Troubleshooting connection drops, heartbeat failures, or CORS issues
- Designing stateful connection architectures
- Migrating from polling to push technology
Examples
Example 1: Real-Time Chat Application
Scenario: Building a scalable chat platform for enterprise use.
Implementation:
- Designed WebSocket architecture with Socket.IO
- Implemented Redis Adapter for horizontal scaling
- Created room-based message routing
- Added message persistence and history
- Implemented presence system (online/offline)
Results:
- Supports 100,000+ concurrent connections
- 50ms average message delivery
- 99.99% connection stability
- Seamless horizontal scaling
Example 2: Live Dashboard System
Scenario: Real-time analytics dashboard with sub-second updates.
Implementation:
- Implemented WebSocket server with low latency
- Created efficient message batching strategy
- Added Redis pub/sub for multi-server support
- Implemented client-side update coalescing
- Added compression for large payloads
Results:
- Dashboard updates in under 100ms
- Handles 10,000 concurrent dashboard views
- 80% reduction in server load vs polling
- Zero data loss during reconnections
Example 3: Multiplayer Game Backend
Scenario: Low-latency multiplayer game server.
Implementation:
- Implemented WebSocket server with binary protocols
- Created authoritative server architecture
- Added client-side prediction and reconciliation
- Implemented lag compensation algorithms
- Set up server-side physics and collision detection
Results:
- 30ms end-to-end latency
- Supports 1000 concurrent players per server
- Smooth gameplay despite network variations
- Cheat-resistant server authority
Best Practices
Connection Management
- Heartbeats: Implement ping/pong for connection health
- Reconnection: Automatic reconnection with backoff
- State Cleanup: Proper cleanup on disconnect
- Connection Limits: Prevent resource exhaustion
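The heartbeat and cleanup bullets above can be sketched as a small bookkeeping class. This is a minimal sketch rather than any library's API: `HeartbeatMonitor`, `touch`, `remove`, and `sweep` are hypothetical names, and the 30-second default timeout is an assumption. The clock is injectable so the logic can be checked without real timers.

```javascript
// Tracks when each connection was last heard from and reports stale ones.
class HeartbeatMonitor {
  constructor({ timeoutMs = 30_000, now = Date.now } = {}) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastSeen = new Map(); // connectionId -> timestamp of last pong
  }

  // Call on connect and whenever a pong (or any message) arrives.
  touch(connectionId) {
    this.lastSeen.set(connectionId, this.now());
  }

  // Call on disconnect so the map does not grow without bound.
  remove(connectionId) {
    this.lastSeen.delete(connectionId);
  }

  // Returns ids that have been silent longer than timeoutMs and forgets them.
  sweep() {
    const cutoff = this.now() - this.timeoutMs;
    const dead = [];
    for (const [id, seenAt] of this.lastSeen) {
      if (seenAt < cutoff) dead.push(id);
    }
    for (const id of dead) this.lastSeen.delete(id);
    return dead;
  }
}

// Example with a fake clock: "a" goes silent, "b" keeps answering.
let clock = 0;
const monitor = new HeartbeatMonitor({ timeoutMs: 1000, now: () => clock });
monitor.touch("a");
clock = 600;
monitor.touch("b");
clock = 1200;
const deadConnections = monitor.sweep();
console.log(deadConnections);
```

In a real server, `sweep()` would typically run on an interval, and each returned id would have its socket terminated and its room or presence state released.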
Scaling
- Horizontal Scaling: Use Redis Adapter for multi-server
- Sticky Sessions: Proper load balancer configuration
- Message Routing: Efficient routing for broadcast/unicast
- Rate Limiting: Prevent abuse and overload
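One common way to implement the rate-limiting bullet is a token bucket per connection. The sketch below is an illustration under assumptions: `TokenBucket` and `tryConsume` are hypothetical names, and the capacity and refill rate are placeholders to tune per application.

```javascript
// Token bucket: allows short bursts up to `capacity`, then a steady refill rate.
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = Date.now }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so a new client can burst
    this.updatedAt = now();
  }

  // Returns true if the message may proceed, false if it should be rejected.
  tryConsume(cost = 1) {
    const t = this.now();
    const elapsedSeconds = (t - this.updatedAt) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.updatedAt = t;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Example with a fake clock: burst of 2 allowed, third rejected, one more after 1s.
let fakeTime = 0;
const bucket = new TokenBucket({ capacity: 2, refillPerSecond: 1, now: () => fakeTime });
const results = [bucket.tryConsume(), bucket.tryConsume(), bucket.tryConsume()];
fakeTime = 1000;
results.push(bucket.tryConsume());
console.log(results);
```

Keeping one bucket per socket (created on connect, discarded on disconnect) limits message rates; a second bucket keyed by client IP could cover connection attempts.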
Performance
- Message Batching: Batch messages where appropriate
- Compression: Compress messages (permessage-deflate)
- Binary Protocols: Use binary for performance-critical data
- Connection Pooling: Efficient client connection reuse
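To illustrate the binary-protocol bullet, here is a minimal sketch that packs a position update into a fixed 11-byte frame using Node's `Buffer`. The frame layout, the `POSITION_UPDATE` type code, and the function names are all assumptions made for this example, not a standard wire format.

```javascript
// Hypothetical frame layout: type (uint8) | entityId (uint16) | x (float32) | y (float32)
const POSITION_UPDATE = 1;
const FRAME_BYTES = 11;

function encodePosition({ entityId, x, y }) {
  const frame = Buffer.alloc(FRAME_BYTES);
  frame.writeUInt8(POSITION_UPDATE, 0);
  frame.writeUInt16BE(entityId, 1);
  frame.writeFloatBE(x, 3);
  frame.writeFloatBE(y, 7);
  return frame;
}

function decodePosition(frame) {
  // Reject anything that is not exactly one position frame.
  if (frame.length !== FRAME_BYTES || frame.readUInt8(0) !== POSITION_UPDATE) {
    throw new Error("not a position frame");
  }
  return {
    entityId: frame.readUInt16BE(1),
    x: frame.readFloatBE(3),
    y: frame.readFloatBE(7),
  };
}

const update = { entityId: 42, x: 1.5, y: -2.25 };
const frame = encodePosition(update);
const jsonBytes = Buffer.byteLength(JSON.stringify(update));
console.log(frame.length, jsonBytes); // binary frame size vs. JSON size in bytes
```

Both `ws` and Socket.IO accept a `Buffer` as a payload, so a frame like this can be sent as-is. Note that `float32` loses precision for values that are not exactly representable, which is usually acceptable for positions but not for identifiers or money.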
Security
- Authentication: Validate on handshake
- TLS: Always use WSS
- Input Validation: Validate all incoming messages
- Rate Limiting: Limit connection/message rates
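The input-validation bullet can be made concrete with a parser that accepts only the expected shape and drops everything else. This is a sketch under assumptions: the `{ room, text }` shape mirrors the chat example in this document, while the size limits, the room-name pattern, and the name `parseChatMessage` are hypothetical.

```javascript
const MAX_PAYLOAD_CHARS = 8192; // assumed upper bound on a raw message
const MAX_TEXT_LENGTH = 2000; // assumed upper bound on chat text
const ROOM_PATTERN = /^[a-z0-9-]{1,64}$/; // assumed room naming scheme

// Returns { ok: true, value } for a valid message, { ok: false, error } otherwise.
function parseChatMessage(raw) {
  if (typeof raw !== "string" || raw.length > MAX_PAYLOAD_CHARS) {
    return { ok: false, error: "payload must be a bounded string" };
  }
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "payload is not valid JSON" };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, error: "payload must be a JSON object" };
  }
  if (typeof data.room !== "string" || !ROOM_PATTERN.test(data.room)) {
    return { ok: false, error: "invalid room" };
  }
  if (
    typeof data.text !== "string" ||
    data.text.length === 0 ||
    data.text.length > MAX_TEXT_LENGTH
  ) {
    return { ok: false, error: "invalid text" };
  }
  // Copy only the expected fields so unexpected properties are dropped.
  return { ok: true, value: { room: data.room, text: data.text } };
}

console.log(parseChatMessage('{"room":"chat-123","text":"hi","admin":true}'));
console.log(parseChatMessage("not json"));
```

Returning a result object instead of throwing keeps a malformed message from one client from turning into an uncaught exception in the event handler.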
2. Decision Framework
Protocol Selection
What is the communication pattern?
│
├─ **Bi-directional (Chat/Game)**
│  ├─ Low Latency needed? → **WebSockets (Raw)**
│  ├─ Fallbacks/Auto-reconnect needed? → **Socket.IO**
│  └─ P2P Video/Audio? → **WebRTC**
│
├─ **One-way (Server → Client)**
│  ├─ Stock Ticker / Notifications? → **Server-Sent Events (SSE)**
│  └─ Large File Download? → **HTTP Stream**
│
└─ **High Frequency (IoT)**
   └─ Constrained device? → **MQTT** (over TCP/WS)
Scaling Strategy
| Scale | Architecture | Backend |
|---|---|---|
| < 10k Users | Monolith Node.js | Single Instance |
| 10k – 100k | Clustering | Node.js Cluster + Redis Adapter |
| 100k – 1M | Microservices | Go/Elixir/Rust + NATS/Kafka |
| Global | Edge | Cloudflare Workers / PubNub / Pusher |
Load Balancer Config
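- Sticky Sessions: Required for Socket.IO whenever the HTTP long-polling transport is enabled, because every HTTP request belonging to a session (including the handshake phase) must reach the same server.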
- Timeouts: Increase idle timeouts (e.g., 60s+).
- Headers: `Upgrade: websocket`, `Connection: Upgrade`.
Red Flags → Escalate to security-engineer:
- Accepting connections from any Origin (`*`) with credentials
- No Rate Limiting on connection requests (DoS risk)
- Sending JWTs in URL query params (they get logged in proxy logs) – use a Cookie or an Initial Message instead
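The first red flag can be avoided with an explicit Origin allowlist checked during the handshake. The sketch below is illustrative: `ALLOWED_ORIGINS` and `isOriginAllowed` are hypothetical names, and the single allowed origin is borrowed from the `cors` setting used elsewhere in this document.

```javascript
// Exact-match allowlist; never fall back to "*" when credentials are involved.
const ALLOWED_ORIGINS = new Set(["https://myapp.com"]);

function isOriginAllowed(originHeader) {
  if (typeof originHeader !== "string") return false;
  let origin;
  try {
    // Normalizes scheme/host/port and rejects malformed values such as "null".
    origin = new URL(originHeader).origin;
  } catch {
    return false;
  }
  return ALLOWED_ORIGINS.has(origin);
}

console.log(isOriginAllowed("https://myapp.com")); // allowed
console.log(isOriginAllowed("https://myapp.com.evil.example")); // look-alike host, rejected

// Possible wiring with Socket.IO (not executed here):
//   new Server(3000, {
//     allowRequest: (req, callback) =>
//       callback(null, isOriginAllowed(req.headers.origin)),
//   });
```

For the JWT red flag, Socket.IO clients can pass the token through the `auth` option, which the server reads from `socket.handshake.auth` instead of the URL.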
3. Core Workflows
Workflow 1: Scalable Socket.IO Server (Node.js)
Goal: Chat server capable of scaling across multiple cores/instances.
Steps:
- Install Dependencies

      npm install socket.io redis @socket.io/redis-adapter

- Implementation (`server.js`)

      const { Server } = require("socket.io");
      const { createClient } = require("redis");
      const { createAdapter } = require("@socket.io/redis-adapter");

      // The adapter needs two Redis connections: one to publish, one to subscribe.
      const pubClient = createClient({ url: "redis://localhost:6379" });
      const subClient = pubClient.duplicate();

      Promise.all([pubClient.connect(), subClient.connect()])
        .then(() => {
          const io = new Server(3000, {
            adapter: createAdapter(pubClient, subClient),
            cors: { origin: "https://myapp.com", methods: ["GET", "POST"] },
          });

          io.on("connection", (socket) => {
            // User joins a room (e.g., "chat-123")
            socket.on("join", (room) => {
              if (typeof room === "string") socket.join(room);
            });

            // Send message to room (propagates via Redis to all nodes)
            socket.on("message", (data) => {
              // Ignore malformed payloads instead of throwing inside the handler.
              if (!data || typeof data.room !== "string") return;
              io.to(data.room).emit("chat", data.text);
            });
          });
        })
        .catch((err) => {
          // Fail fast if Redis is unreachable instead of leaving an unhandled rejection.
          console.error("Redis connection failed:", err);
          process.exit(1);
        });
Workflow 3: Production Tuning (Linux)
Goal: Handle 50k concurrent connections on a single server.
Steps:
- File Descriptors
  - Increase limit: `ulimit -n 65535` (applies to the current shell session only).
  - Edit `/etc/security/limits.conf` to persist the higher limit across logins.
- Ephemeral Ports
  - Increase range: `sysctl -w net.ipv4.ip_local_port_range="1024 65535"`.
- Memory Optimization
  - Use `ws` (lighter) instead of Socket.IO if its features are not needed.
  - Disable “Per-Message Deflate” (Compression) if CPU is high.
5. Anti-Patterns & Gotchas
Anti-Pattern 1: Stateful Monolith
What it looks like:
- Storing a `users = []` array in Node.js memory.
Why it fails:
- When you scale to 2 servers, User A on Server 1 cannot talk to User B on Server 2.
- Memory leaks crash the process.
Correct approach:
- Use Redis as the state store (Adapter).
- Stateless servers, Stateful backend (Redis).
Anti-Pattern 2: The “Thundering Herd”
What it looks like:
- Server restarts. 100,000 clients reconnect instantly.
- Server crashes again due to CPU spike.
Why it fails:
- Connection handshakes are expensive (TLS + Auth).
Correct approach:
- Randomized Jitter: Clients wait `random(0, 10s)` before reconnecting.
- Exponential Backoff: Wait 1s, then 2s, then 4s…
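Jitter and exponential backoff can be combined in one delay function. The sketch below uses the "full jitter" variant (a uniformly random delay between zero and an exponentially growing ceiling), which is one of several reasonable choices. `reconnectDelayMs` is a hypothetical name, and the 1-second base and 30-second cap are assumed defaults.

```javascript
// Full-jitter backoff: the ceiling doubles per attempt (1s, 2s, 4s, ...) up to
// maxMs, and the actual delay is drawn uniformly from [0, ceiling).
function reconnectDelayMs(
  attempt,
  { baseMs = 1000, maxMs = 30_000, random = Math.random } = {}
) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Example: print the delay a client would wait before each of its first 5 retries.
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: wait ${reconnectDelayMs(attempt)} ms`);
}
```

Because every client draws its own random delay, 100,000 reconnects spread out over the backoff window instead of arriving in the same instant. Socket.IO's client already exposes `reconnectionDelay`, `reconnectionDelayMax`, and `randomizationFactor` for this purpose; a hand-rolled function like this is mainly useful with raw WebSockets.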
Anti-Pattern 3: Blocking the Event Loop
What it looks like:
- `socket.on('message', () => { heavyCalculation(); })`
Why it fails:
- Node.js runs JavaScript on a single thread. One heavy synchronous task stalls every connection on that process (all 10,000 of them) until it returns.
Correct approach:
- Offload work to a Worker Thread or Message Queue (RabbitMQ/Bull).
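A minimal sketch of the Worker Thread option, using only Node's built-in `worker_threads` module. The summation loop stands in for the document's `heavyCalculation()`, and `runInWorker` is a hypothetical helper; the worker source is inlined as a string only to keep the example in one file.

```javascript
const { Worker } = require("node:worker_threads");

// Stand-in for a CPU-bound job; in a real project this would live in its own file.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Runs the job off the main thread and resolves with its result.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The event loop stays free to serve other sockets while the worker computes.
runInWorker(100).then((sum) => console.log("result:", sum));

// Possible use inside a handler (not executed here):
//   socket.on("message", async (data) => {
//     socket.emit("result", await runInWorker(data.n));
//   });
```

Spawning a worker per message is itself costly, so a production setup would more likely reuse a small pool of workers or push jobs onto a queue, as the bullet above suggests.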
7. Quality Checklist
Scalability:
- Adapter: Redis/NATS adapter configured for multi-node.
- Load Balancer: Sticky sessions enabled (if using polling fallback).
- OS Limits: File descriptors limit increased.
Resilience:
- Reconnection: Exponential backoff + Jitter implemented.
- Heartbeat: Ping/Pong interval configured (< LB timeout).
- Fallback: Socket.IO fallbacks (HTTP Long Polling) enabled/tested.
Security:
- WSS: TLS enabled (Secure WebSockets).
- Auth: Handshake validates credentials properly.
- Rate Limit: Connection rate limiting active.
Anti-Patterns
Connection Management Anti-Patterns
- No Heartbeats: Not detecting dead connections – implement ping/pong
- Memory Leaks: Not cleaning up closed connections – implement proper cleanup
- Infinite Reconnects: Retrying in a tight loop without backoff – implement exponential backoff
- Sticky-Session Dependence: Keeping per-user state on one server instead of designing for stateless nodes – use Redis for state
Scaling Anti-Patterns
- Single Server: Not scaling beyond one instance – use Redis adapter
- No Load Balancing: Direct connections to servers – use proper load balancer
- Broadcast Storm: Sending to all connections blindly – target specific connections
- Connection Saturation: Too many connections per server – scale horizontally
Performance Anti-Patterns
- Message Bloat: Large unstructured messages – use efficient message formats
- No Throttling: Unlimited send rates – implement rate limiting
- Blocking Operations: Synchronous processing – use async processing
- No Monitoring: Operating blind – implement connection metrics
Security Anti-Patterns
- No TLS: Using unencrypted connections – always use WSS
- Weak Auth: Accepting a token without properly validating it – validate credentials on every handshake
- No Rate Limits: Vulnerable to abuse – implement connection/message limits
- CORS Exposed: Open cross-origin access – configure proper CORS