Feldera Open Sources Incremental SQL Query Engine

TL;DR

Feldera is an incremental computation engine that supports full SQL and processes millions of events per second by updating query results from incoming changes instead of recomputing them over the full dataset.
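To make "avoiding full recomputation" concrete, here is a minimal Python sketch of an incrementally maintained `SUM(amount) ... GROUP BY key`. It is an illustration only, not Feldera's implementation or API: the class `IncrementalGroupSum` and the `(key, amount, weight)` change format are hypothetical, with weight `1` marking an inserted row and `-1` a deleted one. Each batch touches only the groups it mentions, while the baseline rescans every row.

```python
from collections import defaultdict


class IncrementalGroupSum:
    """Hypothetical sketch: maintains SUM(amount) GROUP BY key from weighted changes."""

    def __init__(self):
        self.sums = defaultdict(int)    # running SUM per group
        self.counts = defaultdict(int)  # live row count per group

    def apply(self, changes):
        """Apply (key, amount, weight) changes; return the new value of each touched group.

        Cost is proportional to the batch size, not to the total number of rows.
        A group whose last row was deleted is dropped and reported as None.
        """
        touched = set()
        for key, amount, weight in changes:
            self.sums[key] += amount * weight
            self.counts[key] += weight
            touched.add(key)
        result = {}
        for key in touched:
            if self.counts[key] == 0:
                # No rows left in this group: remove it, as GROUP BY would.
                del self.sums[key]
                del self.counts[key]
                result[key] = None
            else:
                result[key] = self.sums[key]
        return result

    def snapshot(self):
        """Current contents of the maintained view."""
        return dict(self.sums)


def full_recompute(rows):
    """Baseline: rescan every (key, amount) row to rebuild the same view."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


view = IncrementalGroupSum()
print(view.apply([("a", 10, 1), ("b", 5, 1)]))
# Update a's row from 10 to 7 and delete b's only row.
print(view.apply([("a", 10, -1), ("a", 7, 1), ("b", 5, -1)]))
```

Both paths end in the same state; the difference is that the incremental one does work proportional to the changes rather than to the table.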

Key Points

Evaluates arbitrary SQL incrementally, including joins, aggregates, window functions, correlated subqueries, and recursive queries
Processes millions of events per second on laptop hardware without tuning; handles datasets larger than RAM by spilling to NVMe storage
Built on the formal DBSP model (VLDB 2023 paper); guarantees strong consistency, producing the same results a batch system would compute over the same input
Connects to Kafka, HTTP, CDC streams, S3, data lakes, and data warehouses; provides fault tolerance with exactly-once semantics
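The DBSP model represents tables and their changes as Z-sets: collections in which every row carries an integer weight, positive for inserts and negative for deletes. The Python sketch below is a hedged illustration of that idea, not Feldera code; the helpers `zadd`, `zjoin`, and `delta_join` are hypothetical names. It shows why a join can be updated from changes alone: join is bilinear, so the change to A ⋈ B is ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB, where A and B are the states before the update. The join is a nested loop for brevity; a real engine would index rows by key.

```python
from collections import Counter


def zadd(x, y):
    """Add two Z-sets (dicts mapping row -> integer weight); drop rows that cancel to 0."""
    out = Counter(x)
    for row, w in y.items():
        out[row] += w
    return {row: w for row, w in out.items() if w != 0}


def zjoin(x, y):
    """Equi-join two Z-sets of (key, value) rows on key; weights multiply."""
    out = {}
    for (kx, vx), wx in x.items():
        for (ky, vy), wy in y.items():
            if kx == ky:
                row = (kx, vx, vy)
                out[row] = out.get(row, 0) + wx * wy
    return {row: w for row, w in out.items() if w != 0}


def delta_join(a, b, da, db):
    """Change to join(a, b) when a receives da and b receives db.

    Uses bilinearity: d(A join B) = dA join B + A join dB + dA join dB,
    where a and b are the states *before* the changes are applied.
    """
    return zadd(zadd(zjoin(da, b), zjoin(a, db)), zjoin(da, db))


a = {(1, "x"): 1}
b = {(1, "p"): 1}
da = {(1, "x"): -1, (1, "y"): 1}  # replace row (1, "x") with (1, "y")
db = {(1, "q"): 1}                # insert row (1, "q")

old_view = zjoin(a, b)
new_view = zadd(old_view, delta_join(a, b, da, db))
print(new_view)
```

Adding the computed delta to the old join output yields the same Z-set as joining the updated tables from scratch, which is the sense in which incremental results match batch semantics.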

Why It Matters

This addresses a long-standing gap in data processing: existing systems force a tradeoff between batch processing (full SQL, but slow) and streaming (fast, but with limited SQL support). Incremental computation with full SQL expressiveness enables unified pipelines for real-time feature engineering, ETL, and analytics without recomputation overhead, which matters for cost-sensitive and latency-sensitive workloads.


Source: github.com