Understanding Apache Iceberg Delete Files
Explains how Apache Iceberg uses delete files to perform row-level deletes efficiently, without rewriting entire data files.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
418 articles from this blog
Explains the role and structure of Apache Iceberg manifest files, key metadata components for tracking data files and optimizing queries in data lakehouses.
Explains the role and structure of the Apache Iceberg Manifest List file in managing table snapshots and optimizing data lakehouse queries.
Explains the critical role and structure of the metadata.json file in Apache Iceberg, the open-source table format for data lakehouses.
Explains the purpose and limitations of the Apache Iceberg REST Catalog specification for standardizing table operations.
Explains how Apache Iceberg brings ACID transaction guarantees to data lakes, enabling reliable data operations on open storage.
An introduction to data lakehouses, explaining what they are, why they're used, and how to migrate to this modern data architecture.
Explores Polaris, an open-source catalog service for managing Apache Iceberg tables in data lakehouses, covering its architecture, entities, and security.
Explains how Apache Iceberg's design ensures data reliability, atomic operations, and serializable isolation for large-scale data lakehouses.
A list of upcoming tech talks and events by Alex Merced, focusing on Apache Iceberg, data lakehouses, and data engineering topics.
Explains the data lakehouse architecture, its layers (storage, table format, catalog, processing), and its advantages over traditional data warehouses.
A video course covering the fundamentals of lakehouse engineering using Apache Iceberg, Nessie, and Dremio for data management.
An introduction to common sorting algorithms like Bubble Sort, Merge Sort, and Quick Sort, implemented and explained in JavaScript.
Explores Apache Iceberg's advanced partitioning features, including hidden partitioning and transformations, for optimizing query performance in data lakes.
Explains three key Apache Iceberg features for data engineers: hidden partitioning, partition evolution, and tool compatibility.
A tutorial on using Dremio and Docker to run SQL queries directly on Excel files from your local machine.
A comprehensive guide to functional programming concepts in JavaScript, including pure functions, immutability, currying, memoization, and monads.
An introduction to Apache Iceberg, a table format for data lakehouses, explaining its architecture and providing learning resources.
Explores the evolution of Apache Iceberg catalogs, focusing on the current REST Catalog and future proposals for server-side optimizations.
A hands-on tutorial on building a data lakehouse pipeline using Spark, Dremio, and Superset to move and analyze data.