How Databases Organize Data on Disk: Pages, Blocks, and File Formats
This article is Part 3 of a 10-part series on query engine design and examines how databases physically structure data within files. It covers three main ways of organizing data: heap files (fast writes, slow reads), sorted files (fast reads, slow writes), and LSM trees (a write-optimized compromise between the two). It also discusses open file formats such as Apache Parquet, ORC, and Avro. It then turns to metadata techniques such as column statistics, bloom filters, and partition metadata, which let a reader skip data it does not need. Throughout, the article emphasizes the tradeoff between doing work at write time and doing it at read time. The result is a technical deep dive into storage internals aimed at IT and technology professionals.
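The write-time versus read-time tradeoff, together with statistics-based data skipping, can be sketched with toy in-memory structures. This is a minimal illustration under stated assumptions, not code from the article. `HeapFile`, `SortedFile`, `Block`, `build_blocks`, and `range_scan` are hypothetical names. Records are Python values rather than on-disk pages, keys are assumed unique, and only per-block min/max statistics are modeled. Bloom filters, partition metadata, and LSM compaction are left out.

```python
import bisect
from dataclasses import dataclass

# Hypothetical toy structures for illustration only; not any real engine's API.
# Keys are assumed to be unique.

class HeapFile:
    """Unordered records: appends are cheap, lookups must scan everything."""

    def __init__(self):
        self.records = []

    def insert(self, key, value):
        # O(1) amortized: no ordering work is done at write time.
        self.records.append((key, value))

    def get(self, key):
        # O(n) full scan: all the work is pushed to read time.
        for k, v in self.records:
            if k == key:
                return v
        return None


class SortedFile:
    """Records kept ordered by key: inserts shift data, lookups binary-search."""

    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        # O(log n) to find the slot, then O(n) to shift later entries (write-time work).
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def get(self, key):
        # O(log n) binary search thanks to the ordering maintained on write.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


@dataclass
class Block:
    """A fixed-size group of keys plus min/max statistics (a simple zone map)."""
    rows: list
    min_key: int
    max_key: int


def build_blocks(keys, block_size):
    """Split keys into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(keys), block_size):
        chunk = keys[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_scan(blocks, lo, hi):
    """Return keys in [lo, hi] and how many blocks had to be read."""
    result, blocks_read = [], 0
    for b in blocks:
        if b.max_key < lo or b.min_key > hi:
            continue  # Statistics prove no row can match: skip without reading.
        blocks_read += 1
        result.extend(k for k in b.rows if lo <= k <= hi)
    return result, blocks_read
```

For example, `range_scan(build_blocks(list(range(100)), 10), 25, 34)` reads only the two blocks covering 20 to 39 and skips the other eight. Sorting at write time is what keeps each block's min/max range narrow. On unsorted data, most blocks' ranges would tend to span much of the key space, so far fewer blocks could be skipped.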