Alex Merced 4/29/2026

How Databases Organize Data on Disk: Pages, Blocks, and File Formats

This article is Part 3 of a 10-part series on query engine design. It explains how databases physically structure data within files, covering three main organization methods: heap files (fast writes, slow reads), sorted files (fast reads, slow writes), and LSM trees (a write-optimized compromise). It also discusses open file formats such as Apache Parquet, ORC, and Avro, along with metadata techniques (column statistics, bloom filters, and partition metadata) that enable efficient data skipping. The recurring theme is the tradeoff between work done at write time and work done at read time.
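To make the data-skipping idea concrete, here is a minimal sketch of how min/max column statistics might let a reader skip whole blocks. It is an illustration under stated assumptions, not the article's code or any real file format's API: `BlockStats`, `blocks_to_scan`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Hypothetical per-block metadata: min and max of a single column."""
    block_id: int
    min_value: int
    max_value: int


def blocks_to_scan(stats, lo, hi):
    """Return ids of blocks whose [min, max] range overlaps the query range [lo, hi]."""
    return [s.block_id for s in stats if s.max_value >= lo and s.min_value <= hi]


# Sorted layout (assumed values): each block covers a narrow, non-overlapping range.
sorted_stats = [BlockStats(0, 1, 100), BlockStats(1, 101, 200), BlockStats(2, 201, 300)]

# Heap layout (assumed values): rows land in arrival order, so each block spans a wide range.
heap_stats = [BlockStats(0, 3, 298), BlockStats(1, 1, 300), BlockStats(2, 7, 291)]

print(blocks_to_scan(sorted_stats, 120, 150))  # [1]
print(blocks_to_scan(heap_stats, 120, 150))    # [0, 1, 2]
```

With these made-up numbers, the sorted layout lets the reader skip two of three blocks, while the heap layout's statistics rule nothing out. That is one way to see the write-time versus read-time tradeoff: the work spent sorting at write time is what makes the statistics selective at read time.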
