Alex Merced • 4/29/2026

Partitioning, Sharding, and Data Distribution Strategies

This article is Part 8 of a 10-part series on query engine design. It covers how engines divide data across files, disks, or cluster nodes to enable parallel processing and reduce the volume of data a query must scan. It compares hash partitioning (even distribution and efficient point lookups, but poor range scans), range partitioning (fast range scans, but prone to data skew and write hotspots), and list partitioning, then turns to partition pruning, bucketing, and clustering. It also addresses the data skew problem and how real-world systems such as CockroachDB, Cassandra, and Spark implement these strategies. The treatment is technical, aimed at readers interested in database internals, software engineering, and system design.
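To make the hash versus range trade-off concrete, here is a minimal Python sketch. It is not taken from the article or from any of the systems named above: the partition count, the boundary values in `RANGE_BOUNDS`, and all function names are hypothetical choices made for illustration, and CRC32 merely stands in for whatever hash function a real engine would use.

```python
import bisect
import zlib
from collections import Counter

NUM_PARTITIONS = 4
# Hypothetical split points: partition 0 holds keys < 100, partition 1 holds
# [100, 200), partition 2 holds [200, 300), partition 3 holds keys >= 300.
RANGE_BOUNDS = [100, 200, 300]


def hash_partition(key: int) -> int:
    """Assign a key to a partition by hashing it (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode()) % NUM_PARTITIONS


def range_partition(key: int) -> int:
    """Assign a key to the partition whose key range contains it."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def partitions_to_scan_hash(lo: int, hi: int) -> set:
    """Adjacent keys hash to unrelated partitions, so a range scan cannot prune."""
    return set(range(NUM_PARTITIONS))


def partitions_to_scan_range(lo: int, hi: int) -> set:
    """Partition pruning: only partitions overlapping [lo, hi] need to be read."""
    return set(range(range_partition(lo), range_partition(hi) + 1))


# A point lookup touches exactly one partition under either scheme.
print("hash lookup 42   ->", hash_partition(42))
print("range lookup 42  ->", range_partition(42))

# A range scan is pruned under range partitioning but not under hashing.
print("hash scan 120-180  ->", sorted(partitions_to_scan_hash(120, 180)))
print("range scan 120-180 ->", sorted(partitions_to_scan_range(120, 180)))

# Monotonically increasing keys (for example, sequential IDs) all land in the
# last range partition, which is the write-hotspot problem; hashing spreads them.
new_keys = range(300, 400)
print("range writes:", Counter(range_partition(k) for k in new_keys))
print("hash writes: ", Counter(hash_partition(k) for k in new_keys))
```

In this sketch the range scheme reads one partition for the 120 to 180 scan while the hash scheme must read all four, and the burst of sequential keys piles onto a single range partition. Real engines refine both schemes in ways this sketch ignores, such as splitting hot ranges or hashing into many more buckets than nodes.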


#Partition Pruning #Hash Partitioning #Range Partitioning