Alex Merced

Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.

https://tuts.alexmercedcoder.dev

12/31/2025

data engineering apache iceberg data lakehouse python analytics

Articles from this Blog

501 articles from this blog

9/23/2025 • EN

The 2025 & 2026 Ultimate Guide to the Data Lakehouse and the Data Lakehouse Ecosystem

A comprehensive guide to the data lakehouse architecture, its core components (Iceberg, Delta, Hudi, Paimon), and the surrounding ecosystem for modern data platforms.

Data Architecture Apache Iceberg Data Lakehouse

9/16/2025 • EN

The Endgame — Building an Autonomous Optimization Pipeline for Apache Iceberg

A guide to building an autonomous, self-healing optimization pipeline for Apache Iceberg tables to maintain performance and cost efficiency.

Metadata Management Apache Iceberg Data Lakehouse

9/9/2025 • EN

Managing Large-Scale Optimizations — Parallelism, Checkpointing, and Fail Recovery

Strategies for scaling and optimizing Apache Iceberg data compaction jobs, including parallelism, checkpointing, and failure recovery.

parallelism Checkpointing Apache Iceberg

9/2/2025 • EN

Hidden Pitfalls — Compaction and Partition Evolution in Apache Iceberg

Explores challenges and best practices for managing partition evolution and compaction in Apache Iceberg to maintain query performance.

Metadata Management Apache Iceberg Data Lakehouse

8/26/2025 • EN

Using Iceberg Metadata Tables to Determine When Compaction Is Needed

Explains how to use Apache Iceberg's metadata tables to dynamically trigger data compaction based on file size, manifest health, and snapshot patterns.

Apache Iceberg Data Lakehouse Table Optimization

8/19/2025 • EN

Designing the Ideal Cadence for Compaction and Snapshot Expiration

A guide to scheduling compaction and snapshot expiration in Apache Iceberg tables based on workload patterns and infrastructure constraints.

Data Engineering Apache Iceberg Data Lakehouse

8/12/2025 • EN

Avoiding Metadata Bloat with Snapshot Expiration and Rewriting Manifests

Explains how to manage Apache Iceberg table metadata by expiring old snapshots and rewriting manifests to prevent performance and cost issues.

Metadata Management Apache Iceberg Data Lakehouse

8/5/2025 • EN

Smarter Data Layout — Sorting and Clustering Iceberg Tables

Explains how to use sorting and Z-order clustering in Apache Iceberg tables to optimize query performance and data layout.

Sorting Clustering Apache Iceberg

7/29/2025 • EN

Optimizing Compaction for Streaming Workloads in Apache Iceberg

Explains techniques for incremental, non-disruptive compaction in Apache Iceberg tables under continuous streaming data ingestion.

Apache Iceberg Data Lakehouse Data Compaction

7/22/2025 • EN

The Basics of Compaction — Bin Packing Your Data for Efficiency

Explains data compaction using bin packing in Apache Iceberg to merge small files, improve query performance, and reduce metadata overhead.

Spark Apache Iceberg Data Compaction

7/15/2025 • EN

The Cost of Neglect — How Apache Iceberg Tables Degrade Without Optimization

Explains how Apache Iceberg tables degrade without optimization, covering small files, fragmented manifests, and performance impacts.

Metadata Management Data Engineering Apache Iceberg

7/3/2025 • EN

How to Discover or Organize Lakehouse & Apache Iceberg Meetups

A guide on how to find, join, and organize community meetups focused on Apache Iceberg and modern data lakehouse architectures.

Slack Meetup Organization Apache Iceberg

5/2/2025 • EN

Introduction to Data Engineering Concepts | ETL vs ELT – Understanding Data Pipelines

Explains core data engineering concepts, comparing ETL and ELT data pipeline strategies and their use cases.

data transformation Data Pipelines ETL

5/2/2025 • EN

Introduction to Data Engineering Concepts | DevOps for Data Engineering

Explores how DevOps principles like CI/CD, infrastructure as code, and monitoring are applied to data engineering for reliable, scalable data pipelines.

DevOps version control Data Pipelines

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Warehousing Fundamentals

An introduction to data warehousing concepts, covering architecture, components, and performance optimization for analytical workloads.

performance optimization Data Engineering Data Architecture

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Quality and Validation

Explores the importance of data quality and validation in data engineering, covering key dimensions and tools for reliable pipelines.

Data Quality Data Pipelines Data Validation

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Modeling Basics

An introduction to data modeling concepts, covering OLTP vs OLAP systems, normalization, and common schema designs for data engineering.

Data Modeling Database Design Data Engineering

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Lakes Explained

Explains data lakes, their key characteristics, and how they differ from data warehouses in modern data architecture.

cloud storage Data Engineering Data Architecture

5/2/2025 • EN

Introduction to Data Engineering Concepts | Building Scalable Pipelines

Explores core principles of scalable data engineering, including parallelism, minimizing data movement, and designing adaptable pipelines for growing data volumes.

parallelism Data Engineering Data Architecture