
Clock Synchronization: The Distributed Systems Problem Engineers Lose Sleep Over

TL;DR

A deep dive into why keeping clocks synchronized across distributed systems is fundamentally hard, covering NTP, PTP, logical clocks, and Google's TrueTime approach.

Key Points

  • Quartz crystal drift: a 10°C temperature change can push a standard computer clock off by roughly 110 seconds per year
  • NTP achieves accuracy of tens of milliseconds over the internet and sub-millisecond on LANs, which is insufficient for microsecond-precision domains such as high-frequency trading (HFT)
  • Google's TrueTime returns time intervals with bounded uncertainty rather than single timestamps, enabling strong consistency in Spanner
  • CockroachDB uses hybrid logical clocks (HLC) on commodity hardware, with a configurable maximum clock offset (500 ms by default) as the safety bound for consistency
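
To make the hybrid logical clock idea above concrete, here is a minimal Python sketch of the general HLC update rules, combined with a maximum-offset check. It is an illustration under stated assumptions, not CockroachDB's implementation: the names `HybridLogicalClock`, `Timestamp`, `ClockOffsetError`, and `max_offset_ms` are hypothetical, and rejecting a remote timestamp that is too far ahead of the local clock is a simplified stand-in for how a real system might enforce its offset bound.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True, order=True)
class Timestamp:
    wall_ms: int   # physical component, in milliseconds
    logical: int   # counter that orders events sharing the same wall_ms


class ClockOffsetError(Exception):
    """Raised when a remote timestamp is too far ahead of the local clock."""


def _system_ms() -> int:
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    # Hypothetical sketch; max_offset_ms=500 mirrors the 500 ms figure above.
    def __init__(self, physical_ms: Callable[[], int] = _system_ms,
                 max_offset_ms: int = 500) -> None:
        self._physical_ms = physical_ms
        self._max_offset_ms = max_offset_ms
        self._wall_ms = 0
        self._logical = 0

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        pt = self._physical_ms()
        if pt > self._wall_ms:
            # Physical time moved forward: adopt it and reset the counter.
            self._wall_ms, self._logical = pt, 0
        else:
            # Physical time stalled or went backwards: bump the counter.
            self._logical += 1
        return Timestamp(self._wall_ms, self._logical)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        pt = self._physical_ms()
        # Assumption: refuse timestamps beyond the configured offset bound.
        if remote.wall_ms - pt > self._max_offset_ms:
            raise ClockOffsetError(
                f"remote is {remote.wall_ms - pt} ms ahead of local clock")
        prev = self._wall_ms
        new_wall = max(prev, remote.wall_ms, pt)
        if new_wall == prev and new_wall == remote.wall_ms:
            self._logical = max(self._logical, remote.logical) + 1
        elif new_wall == prev:
            self._logical += 1
        elif new_wall == remote.wall_ms:
            self._logical = remote.logical + 1
        else:
            self._logical = 0
        self._wall_ms = new_wall
        return Timestamp(self._wall_ms, self._logical)
```

In use, a node would call `now()` when it stamps a local event or an outgoing message and `update(remote)` when it receives one. Timestamps compare by wall time first and the logical counter second, so causally related events stay ordered even when the physical clock stalls or a peer's clock runs slightly ahead.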

Why It Matters

Clock synchronization underpins critical distributed systems, from database consistency to financial transactions to debugging traces. Engineers must understand the tradeoffs between accuracy, latency, and complexity when choosing between NTP, PTP, logical clocks, or custom solutions like TrueTime. A wrong choice can silently break consistency guarantees or add unacceptable latency.

Source: arpitbhayani.me