What Is Data Modeling? A Complete Guide
A comprehensive guide to data modeling, explaining its meaning, three abstraction levels, techniques, and importance for modern data systems.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
Explains the three levels of data modeling (conceptual, logical, physical) and their importance in database design.
Explains the importance of automated testing for data pipelines, covering schema validation, data quality checks, and regression testing.
A guide to the core principles and systems thinking required for data engineering, beyond just learning specific tools.
Explains the distinct roles of data catalogs and semantic layers in data architecture, arguing they are complementary tools.
Explains idempotent data pipelines, patterns like partition overwrite and MERGE, and how to prevent duplicate data during retries.
A guide to designing reliable, fault-tolerant data pipelines with architectural principles like idempotency, observability, and DAG-based workflows.
A practical, tool-agnostic checklist of essential best practices for designing, building, and maintaining reliable data engineering pipelines.
Seven critical mistakes that can derail semantic layer projects in data engineering, with practical advice on how to avoid them.
A guide to choosing between batch and streaming data processing models based on actual freshness requirements and cost.
Explains data partitioning and organization strategies to drastically improve query performance in analytical databases.
Explains Headless BI and how a universal semantic layer centralizes metric definitions to replace tool-specific models, enabling consistent analytics.
Explains how a self-documenting semantic layer uses AI to automate data documentation, reducing manual work and governance risks for data teams.
Compares Star Schema and Snowflake Schema data models, explaining their structures, trade-offs, and when to use each for optimal data warehousing.
Explains the importance of pipeline observability for data health, covering metrics, logs, and lineage to detect issues beyond simple execution monitoring.
Explains how a semantic layer enforces data governance by embedding policies directly into the query path, ensuring consistent metrics and access control.
Explains dimensional modeling for analytics, covering facts, dimensions, grains, and table design for query performance.
Explains Slowly Changing Dimensions (SCD) types 1-3 for managing data history in data warehouses, with practical examples.
Explains why AI-driven data analytics fails without a semantic layer to define business metrics and ensure accurate, secure queries.
Explains the difference between a metrics layer and a semantic layer in data architecture, clarifying their distinct roles and relationship.