Apache Iceberg articles

7/6/2026 • EN

Lakehouse Table Formats in 2026: Iceberg, Delta Lake, Hudi, Paimon, and DuckLake, How They Work, Where They Stand, and Where They're Going

Analysis of five lakehouse table formats (Iceberg, Delta Lake, Hudi, Paimon, DuckLake) in 2026: how they work, current status, and future directions.

Apache Hudi Apache Iceberg Apache Paimon Delta Lake Table Formats

Alex Merced

7/6/2026 • EN

File Encryption for the Lakehouse: The Terminology, the Machinery, and the Hard Problem of Interoperable Encrypted Tables

Explains file encryption for lakehouses, covering Parquet Modular Encryption, Iceberg table encryption, and interoperability challenges.

Apache Iceberg Iceberg Encryption Interoperable Encryption Lakehouse Security Parquet Encryption

Alex Merced

7/6/2026 • EN

Multi-Engine Catalog Federation with Apache Polaris: Syncing Google Cloud, AWS, and Azure Metadata

Explores multi-engine catalog federation using Apache Polaris to sync metadata across Google Cloud, AWS, and Azure for open lakehouse governance.

Apache Iceberg Apache Polaris Catalog Federation Data Lakehouse Multicloud

Alex Merced

7/6/2026 • EN

Migrating Proprietary Warehouses to Open Lakehouses: The 2026 Playbook for Zero-Copy Metadata Translation

A 2026 playbook for migrating proprietary data warehouses to open lakehouses using zero-copy metadata translation and staged modernization.

Apache Iceberg Data Migration Lakehouse SQL Zero Copy

Alex Merced

7/6/2026 • EN

Mapping the Variant Type in Iceberg v3: Standardizing Semi-Structured AI JSON Payloads

Explores Iceberg v3's variant type for standardizing semi-structured AI JSON payloads in lakehouse architectures.

AI JSON Payloads Apache Iceberg Lakehouse Semi Structured Data Variant Type

Alex Merced

7/6/2026 • EN

Implementing Positional Deletes in Iceberg v3: Streamlining Merge-on-Read for Fast-Inbound Event Lakes

Explains how Iceberg v3's positional deletes and merge-on-read improve event lake performance for fast-inbound data corrections.

Apache Iceberg Event Lakes Lakehouse Merge On Read Positional Deletes

Alex Merced

7/6/2026 • EN

Designing Private, Air-Gapped Data Lakehouses: Scaling Iceberg in Highly Secure, On-Premises Clouds

Designing secure, air-gapped data lakehouses using Apache Iceberg for defense, healthcare, finance, and other high-security sectors.

Air Gapped Lakehouse Apache Iceberg Data Lakehouse Private Cloud Secure Analytics

Alex Merced

7/6/2026 • EN

Designing Idempotent Pipelines in the Agentic Lakehouse: Eliminating Double-Write Anomalies

Explains designing idempotent pipelines in agentic lakehouses to prevent double-write anomalies using Iceberg and workflow safeguards.

Agentic Lakehouse Apache Iceberg Double Write Anomalies Idempotent Pipelines Workflow Idempotency

Alex Merced

7/6/2026 • EN

Decoupled Catalogs vs. Managed Tables: Architectural Freedom in the Age of Table Format Convergence

Explores the trade-offs between decoupled catalogs and managed tables in open table formats like Apache Iceberg, focusing on architectural freedom and operational simplicity.

Apache Iceberg Data Architecture Decoupled Catalogs Managed Tables Open Table Formats

Alex Merced

7/6/2026 • EN

The State of Streaming to Apache Iceberg in July 2026: Every Path, Its Latency, and What to Do When Seconds Are Not Fast Enough

A comprehensive guide to streaming data into Apache Iceberg tables in 2026, covering latency, tools, and architectures for sub-second freshness.

Apache Flink Apache Iceberg Data Latency Spark Structured Streaming Streaming

Alex Merced

7/6/2026 • EN

The State of Apache Iceberg v4 in July 2026: What the Dev List Tells Us About the Format's Next Chapter

Analysis of Apache Iceberg v4's development status in July 2026 based on the dev mailing list, covering ratified specs, debates, and practical advice.

Apache Iceberg Column Updates Dev Mailing List Iceberg V4 Metadata

Alex Merced

6/22/2026 • EN

The Real-Time Lakehouse with Streaming and Iceberg

Explores the real-time lakehouse architecture, focusing on streaming, Iceberg snapshots, and the dual clocks of event arrival and query visibility.

Apache Flink Apache Iceberg Data Engineering Real-Time Streaming

Alex Merced

6/22/2026 • EN

Event-Driven Table Compaction with Agents

Explores event-driven table compaction using agents in lakehouse architectures, focusing on small file problems and production patterns.

Agents Apache Iceberg Event Driven Architecture Lakehouse Table Compaction

Alex Merced

6/22/2026 • EN

What Is LTAP in the Lakehouse?

Analysis of LTAP (Lakehouse Transactional Analytical Processing) focusing on freshness, isolation, and workload boundaries for data architects.

Apache Flink Apache Iceberg Data Architecture Lakehouse LTAP

Alex Merced

6/22/2026 • EN

PyIceberg at Scale Without Apache Spark

Explores using PyIceberg without Apache Spark for Python-based Iceberg table operations, focusing on architecture, boundaries, and production patterns.

Apache Iceberg Data Engineering Lakehouse PyIceberg Serverless

Alex Merced

6/22/2026 • EN

Server-Side Commit Deconflicting in REST Catalogs

Explores server-side commit deconflicting in REST catalogs for high-concurrency lakehouse writes, focusing on architecture, specs, and operational patterns.

Apache Iceberg Concurrency Control Lakehouse Writes REST Catalogs Server Side Commit Deconflicting

Alex Merced

6/22/2026 • EN

Snowflake Interoperable Lakehouse Lessons

Analysis of Snowflake interoperable lakehouse lessons focusing on production contracts, multi-engine access, and agentic analytics challenges.

Apache Iceberg Data Contracts Interoperability Lakehouse Snowflake

Alex Merced

6/8/2026 • EN

Goal-Directed Analytics Agents on Apache Iceberg

Architecture of goal-directed analytics agents using Apache Iceberg for durable state and dynamic task decomposition.

Action Loop Analytics Agents Apache Iceberg Goal Directed Agents SQL Queries

Alex Merced

6/8/2026 • EN

Modern Python Tooling for Apache Iceberg

Overview of modern Python tools for Apache Iceberg, including PyIceberg, IceFrame, and CLI for metadata management.

Apache Iceberg Data Engineering Data Lakehouse PyIceberg Python

Alex Merced

5/28/2026 • EN

Decoupling Storage and Compute in Apache Iceberg: A Deep Dive into Cost Optimization

Explores how Apache Iceberg decouples storage and compute for cost optimization, including multi-engine routing and TCO analysis.

Apache Iceberg Cost Optimization Data Lakehouse Multi Engine Query Storage Compute Decoupling

Alex Merced

