Data Lakehouse Roundup 1 - News and Insights on the Lakehouse
Quarterly roundup of data lakehouse trends, table formats, and major industry news from Apache Iceberg to Delta Lake.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
333 articles from this blog
A tutorial on using PyArrow for data analytics in Python, covering core concepts, file I/O, and analytical operations.
A comprehensive guide to using Rust's built-in collection types, including vectors, arrays, hashmaps, and sets, with performance tips and examples.
A guide to performing data operations using PySpark, pandas, DuckDB, Polars, and DataFusion within a pre-configured Docker environment.
Explains how to implement access control and security for Apache Iceberg tables at the file, engine, and catalog levels.
A comprehensive directory of Apache Iceberg resources, including tutorials, guides, and educational materials for data engineers and developers.
Explores how combining data lakehouse, virtualization, and mesh architectures with Dremio solves modern data scaling and silo challenges.
A comprehensive guide to building interactive data applications using the Streamlit framework, covering setup, visualization, ML integration, and deployment.
A comprehensive guide to Docker Compose, covering file structure, service configuration, networking, volumes, and best practices for multi-container applications.
A comprehensive guide to string handling in Rust, covering types, conversions, operations, and performance best practices.
An introductory guide to Rust, covering its key features like memory safety, ownership, and setup for developers new to the language.
A hands-on tutorial for building a Data Lakehouse on your laptop using Apache Iceberg, Spark, Nessie, MinIO, and Dremio.
Explains why data professionals should adopt Dremio and Apache Iceberg for flexible, high-performance data lakehouse architecture.
Explores five key trends shaping the data lakehouse architecture, including storage evolution, table formats, and catalog competition.
A guide on using the alexmerced/datanotebook Docker image for a quick data notebook environment with pre-installed libraries like pandas, Polars, and PySpark.
Explains how Apache Iceberg uses delete files for efficient row-level data deletions without rewriting entire datasets.
Explains the role and structure of Apache Iceberg manifest files, key metadata components for tracking data files and optimizing queries in data lakehouses.
Explains the role and structure of the Apache Iceberg Manifest List file in managing table snapshots and optimizing data lakehouse queries.
Explains the critical role and structure of the metadata.json file in Apache Iceberg, the open-source table format for data lakehouses.
Explains the purpose and limitations of the Apache Iceberg REST Catalog specification for standardizing table operations.