Introduction to Data Engineering Concepts | Cloud Data Platforms and the Modern Stack
Explores the modern data stack, cloud platforms, and principles for building flexible, cloud-native data engineering architectures.
Explains the data lakehouse architecture, a unified approach combining data lake scalability with warehouse management features like ACID transactions.
A comprehensive 2025 guide to Apache Iceberg, covering its architecture, ecosystem, and practical use for data lakehouse management.
Argues that RAG system failures stem from data engineering issues like fragmented data and governance, not from model or vector database choices.
Overview of Overture Maps Foundation's updated global, open geospatial datasets, their partners, and data refresh strategy.
A profile of a Senior Analytics Engineer specializing in dbt, data mesh architecture, and applying library science principles to modern data teams.
An introduction to Apache Parquet, a columnar storage file format for efficient data processing and analytics.
Explains the hierarchical structure of Parquet files, detailing how pages, row groups, and columns optimize storage and query performance.
Explains for data engineers how Parquet handles schema evolution, including adding or removing columns and changing data types.
A practical guide to reading and writing Parquet files in Python using PyArrow and FastParquet libraries.
Explores using GitHub Actions for software development CI/CD and advanced data engineering tasks like ETL pipelines and data orchestration.
A former Debezium lead argues that Change Data Capture (CDC) is a feature within larger data platforms, not a standalone product.
A comprehensive directory of Apache Iceberg resources, including tutorials, guides, and educational materials for data engineers and developers.
A list of upcoming tech talks and events by Alex Merced, focusing on Apache Iceberg, data lakehouses, and data engineering topics.
A video course covering the fundamentals of lakehouse engineering using Apache Iceberg, Nessie, and Dremio for data management.
Explains three key Apache Iceberg features for data engineers: hidden partitioning, partition evolution, and tool compatibility.
A data engineer reflects on two years at the City of Boston, sharing lessons learned in data warehousing, ETL, and civic tech.
Explores the evolution of Apache Iceberg catalogs, focusing on the current REST Catalog and future proposals for server-side optimizations.
An introduction to Apache Iceberg, a table format for data lakehouses, explaining its architecture and providing learning resources.
A hands-on tutorial on building a data lakehouse pipeline using Spark, Dremio, and Superset to move and analyze data.