Data Modeling for the Lakehouse: What Changes
Explores how data modeling principles adapt for modern lakehouse architectures using open formats like Apache Iceberg and the Medallion pattern.
Explains Slowly Changing Dimensions (SCD) types 1-3 for managing data history in data warehouses, with practical examples.
Explains the importance of pipeline observability for data health, covering metrics, logs, and lineage to detect issues beyond simple execution monitoring.
Explains data partitioning and organization strategies that improve query performance in analytical databases.
A practical, tool-agnostic checklist of essential best practices for designing, building, and maintaining reliable data engineering pipelines.
A step-by-step guide to building a robust semantic layer for consistent data metrics, covering architecture, stakeholder alignment, and implementation.
Explains idempotent data pipelines, patterns like partition overwrite and MERGE, and how to prevent duplicate data during retries.
Compares the star schema and snowflake schema data models, explaining their structures, trade-offs, and when to use each in a data warehouse.
Explains why transactional data models are inefficient for analytics and how to design denormalized, query-optimized models for better performance.
Seven critical mistakes that can derail semantic layer projects in data engineering, with practical advice on how to avoid them.
Explains the importance of automated testing for data pipelines, covering schema validation, data quality checks, and regression testing.
A comprehensive guide to data modeling, explaining its meaning, three abstraction levels, techniques, and importance for modern data systems.
Explains the distinct roles of data catalogs and semantic layers in data architecture, arguing they are complementary tools.
Explains how data virtualization and a semantic layer enable querying distributed data without copying it, reducing costs and improving freshness.
Explains the three levels of data modeling (conceptual, logical, physical) and their importance in database design.
A guide to designing reliable, fault-tolerant data pipelines with architectural principles like idempotency, observability, and DAG-based workflows.
Explains how a semantic layer enforces data governance by embedding policies directly into the query path, ensuring consistent metrics and access control.
Explains what a semantic layer is, its components, and how it provides consistent business definitions for data queries and AI agents.
Seven common data modeling mistakes that cause reporting errors and slow analytics, with practical solutions to avoid them.
A guide to the core principles and systems thinking required for data engineering, beyond just learning specific tools.