lamport-distributed-systems

📁 copyleftdev/sk1llz 📅 12 days ago

总安装量

周安装量

#48179

全站排名

安装命令

npx skills add https://github.com/copyleftdev/sk1llz --skill lamport-distributed-systems

Agent 安装分布

openclaw 2

github-copilot 2

codex 2

kimi-cli 2

gemini-cli 2

cursor 2

Skill 文档

Leslie Lamport Style Guide

Overview

Leslie Lamport transformed distributed systems from ad-hoc engineering into a rigorous science. His work on logical clocks, consensus (Paxos), and formal specification (TLA+) provides the theoretical foundation for nearly every reliable distributed system built today. Turing Award winner (2013).

Core Philosophy

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”

“If you’re thinking without writing, you only think you’re thinking.”

“The way to be a good programmer is to write programs, not to learn languages.”

Design Principles

Formal Specification First: Write a precise specification before writing code. If you can’t specify it precisely, you don’t understand it.
Time is Relative: There is no global clock in a distributed system. Use logical time (happens-before) to reason about ordering.
State Machine Replication: Any deterministic service can be made fault-tolerant by replicating it as a state machine across multiple servers.
Safety and Liveness: Separate what must always be true (safety) from what must eventually happen (liveness). Prove both.
Simplicity Through Rigor: The clearest systems come from precise thinking. Formalism isn’t overheadâit’s the path to simplicity.

When Designing Systems

Always

Write a specification before implementation (TLA+, Alloy, or precise prose)
Define the safety properties: what bad things must never happen
Define the liveness properties: what good things must eventually happen
Reason about all possible interleavings of concurrent operations
Use logical timestamps when physical time isn’t reliable
Make system state explicit and transitions clear
Document invariants that must hold across all states

Never

Assume messages arrive in order (unless you’ve proven it)
Assume clocks are synchronized (they’re not)
Assume failures are independent (they’re often correlated)
Hand-wave about “eventually” without defining what guarantees that
Trust intuition for concurrent systemsâprove it or test it exhaustively
Confuse the specification with the implementation

Prefer

State machines over ad-hoc event handling
Logical clocks over physical timestamps for ordering
Consensus protocols over optimistic concurrency for critical state
Explicit failure handling over implicit assumptions
Proved algorithms over clever heuristics

Key Concepts

Logical Clocks (Lamport Timestamps)

Each process maintains a counter C:
1. Before any event, increment C
2. When sending a message, include C
3. When receiving a message with timestamp T, set C = max(C, T) + 1

This gives a partial ordering: if a â b, then C(a) < C(b)
(But C(a) < C(b) does NOT imply a â b)

The Happens-Before Relation (â)

a â b (a happens before b) if:
1. a and b are in the same process and a comes before b, OR
2. a is a send and b is the corresponding receive, OR
3. There exists c such that a â c and c â b (transitivity)

If neither a â b nor b â a, events are CONCURRENT.

State Machine Replication

To replicate a service:
1. Model the service as a deterministic state machine
2. Replicate the state machine across N servers
3. Use consensus (Paxos/Raft) to agree on the sequence of inputs
4. Each replica applies inputs in the same order â same state

Tolerates F failures with 2F+1 replicas.

Paxos (Simplified)

Three roles: Proposers, Acceptors, Learners

Phase 1 (Prepare):
  Proposer sends PREPARE(n) to acceptors
  Acceptor responds with highest-numbered proposal it accepted (if any)
  
Phase 2 (Accept):
  If proposer receives majority responses:
    Send ACCEPT(n, v) where v is highest-numbered value seen (or new value)
  Acceptor accepts if it hasn't promised to a higher proposal

Consensus reached when majority accept the same (n, v).

Mental Model

Lamport approaches distributed systems as a mathematician:

Define the problem precisely: What are the inputs, outputs, and allowed behaviors?
Identify the invariants: What must always be true?
Consider all interleavings: What happens if events occur in any order?
Prove correctness: Show that safety and liveness hold.
Then implement: The code should be a straightforward translation of the spec.

The TLA+ Approach

1. Define the state space (all possible states)
2. Define the initial state predicate
3. Define the next-state relation (allowed transitions)
4. Specify safety as invariants (always true)
5. Specify liveness as temporal properties (eventually true)
6. Model-check or prove that the spec satisfies properties

Code Patterns

Implementing Logical Clocks

class LamportClock:
    def __init__(self):
        self._time = 0
    
    def tick(self) -> int:
        """Increment before local event."""
        self._time += 1
        return self._time
    
    def send_timestamp(self) -> int:
        """Get timestamp for outgoing message."""
        self._time += 1
        return self._time
    
    def receive(self, msg_timestamp: int) -> int:
        """Update clock on message receipt."""
        self._time = max(self._time, msg_timestamp) + 1
        return self._time

Vector Clocks (for Causality Detection)

class VectorClock:
    def __init__(self, node_id: str, num_nodes: int):
        self._id = node_id
        self._clock = {f"node_{i}": 0 for i in range(num_nodes)}
    
    def tick(self):
        self._clock[self._id] += 1
    
    def send(self) -> dict:
        self.tick()
        return self._clock.copy()
    
    def receive(self, other: dict):
        for node, time in other.items():
            self._clock[node] = max(self._clock.get(node, 0), time)
        self.tick()
    
    def happens_before(self, other: dict) -> bool:
        """Returns True if self â other."""
        return all(self._clock[k] <= other.get(k, 0) for k in self._clock) \
           and any(self._clock[k] < other.get(k, 0) for k in self._clock)

Warning Signs

You’re violating Lamport’s principles if:

You assume “this will never happen in practice” without proof
Your distributed algorithm works “most of the time”
You can’t write down the invariants your system maintains
You’re using wall-clock time for ordering in a distributed system
You haven’t considered what happens during network partitions
Your system has no formal specification

Additional Resources

For detailed philosophy, see philosophy.md
For references (papers, talks), see references.md

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台