The State of Apache Iceberg v4 - October 2025 Edition
Overview of key proposals in Apache Iceberg v4, focusing on performance, metadata efficiency, and portability for modern data workloads.
A monthly roundup of 78 curated links on data engineering, architecture, AI, and tech trends, with top picks highlighted.
A monthly roundup of curated links and articles focused on data engineering, Apache Kafka, and data platform technologies.
A guide to scheduling compaction and snapshot expiration in Apache Iceberg tables based on workload patterns and infrastructure constraints.
A monthly roundup of data engineering links covering Apache Iceberg, Kafka, Debezium, Spark, and lakehouse architecture.
Explains how Apache Iceberg tables degrade without optimization, covering small files, fragmented manifests, and performance impacts.
Explains the importance of table maintenance in Apache Iceberg for data lakehouses, covering metadata and file management.
An analysis of DuckLake, a new open table format and catalog specification for data engineering, comparing it to existing solutions like Iceberg and Delta Lake.
A monthly roundup of curated links and articles covering data engineering, Kafka, stream processing, and AI, with top picks highlighted.
Explains the importance of data storage formats and compression for performance and cost in large-scale data engineering systems.
Explains core data engineering concepts, comparing ETL and ELT data pipeline strategies and their use cases.
Explains streaming data fundamentals, how streaming systems work, their use cases, and challenges compared to batch processing.
Explains batch processing fundamentals for data engineering, covering concepts, tools, and its ongoing relevance in data workflows.
An introduction to data modeling concepts, covering OLTP vs OLAP systems, normalization, and common schema designs for data engineering.
An introduction to data engineering concepts, focusing on data sources and ingestion strategies like batch vs. streaming.
An introductory guide to data engineering, explaining its role, key concepts, and how it differs from data science in the modern data ecosystem.
Explains data lakes, their key characteristics, and how they differ from data warehouses in modern data architecture.
Explains core data engineering concepts: metadata, data lineage, and governance, and their importance for scalable, compliant data systems.
Explores the importance of data quality and validation in data engineering, covering key dimensions and tools for reliable pipelines.
An introduction to data warehousing concepts, covering architecture, components, and performance optimization for analytical workloads.