
Git-For-Data Tools Compared: LakeFS, Dolt, Nessie Lead Pack

TL;DR

A comprehensive analysis of six Git-like data versioning tools reveals different architectural approaches to branching, merging, and versioning data without copying petabytes.

Key Points

  • Dolt achieves ~30 min median PR merge times; lakeFS leads with 178 total PR creators
  • lakeFS versions data files through a metadata layer over S3/GCS; Nessie versions catalog metadata for Iceberg/Delta tables
  • Dolt enables cell-level audit trails and MySQL-compatible Git semantics with Prolly Trees; DoltgreSQL reaches Beta in 2025
  • Neon implements copy-on-write storage-level branching with zero data copying but lacks merge support
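
To make the "zero data copying" idea in the points above concrete, here is a minimal toy sketch of metadata-level branching. It is an illustration only, not how lakeFS, Nessie, Dolt, or Neon are actually implemented; the ToyRepo class, its method names, and the file path are all hypothetical. It assumes a branch is just a pointer to a commit, and a commit is a mapping from paths to content hashes.

```python
import hashlib


class ToyRepo:
    """Toy metadata-layer versioning: objects are stored once, branches are pointers.

    Hypothetical illustration only; not the API of any real tool.
    """

    def __init__(self):
        self.objects = {}            # content hash -> bytes (each payload stored once)
        self.commits = {0: {}}       # commit id -> {path: content hash}
        self.branches = {"main": 0}  # branch name -> commit id

    def create_branch(self, name, source):
        # Branching copies a single pointer, never the underlying data.
        self.branches[name] = self.branches[source]

    def write(self, branch, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(digest, data)
        # Copy only the manifest (metadata), then point the path at the new object.
        manifest = dict(self.commits[self.branches[branch]])
        manifest[path] = digest
        commit_id = len(self.commits)
        self.commits[commit_id] = manifest
        self.branches[branch] = commit_id

    def read(self, branch, path):
        return self.objects[self.commits[self.branches[branch]][path]]


repo = ToyRepo()
repo.write("main", "events/part-0.parquet", b"v1")
repo.create_branch("experiment", "main")
stored_after_branch = len(repo.objects)  # unchanged by branching
repo.write("experiment", "events/part-0.parquet", b"v2")
```

In this sketch, creating the branch adds no stored objects, and a write on one branch leaves the other untouched. Merging, which the summary notes some tools lack, would need extra logic to reconcile two manifests and is deliberately left out.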

Why It Matters

Data engineers now have production-ready options for Git workflows on petabyte-scale datasets without duplicating data. Understanding the trade-offs among instant branching, merge capabilities, and compute costs is critical for choosing the right versioning strategy for data lakes, lakehouses, and databases in modern data stacks.

Source: motherduck.com