Data Modeling for Analytics: Optimize for Queries, Not Transactions
Explains why transactional data models are inefficient for analytics and how to design denormalized, query-optimized models for better performance.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
501 articles from this blog
Explains why transactional data models are inefficient for analytics and how to design denormalized, query-optimized models for better performance.
Seven common data modeling mistakes that cause reporting errors and slow analytics, with practical solutions to avoid them.
Explains what a semantic layer is, its components, and how it provides consistent business definitions for data queries and AI agents.
Explains how a semantic layer enforces data governance by embedding policies directly into the query path, ensuring consistent metrics and access control.
A comprehensive guide to data modeling, explaining its meaning, three abstraction levels, techniques, and importance for modern data systems.
A practical, tool-agnostic checklist of essential best practices for designing, building, and maintaining reliable data engineering pipelines.
Explains the three levels of data modeling (conceptual, logical, physical) and their importance in database design.
Explains why AI data analytics fail without a semantic layer to define business metrics and ensure accurate, secure queries.
A comprehensive guide exploring the taxonomy, tools, and best practices for using AI-assisted coding tools in modern software development.
Explains Recursive Language Models (RLMs), which are LLMs that call themselves to break complex tasks into structured, reusable steps.
A 2025 year-in-review of key Apache data projects: Iceberg, Polaris, Parquet, and Arrow, detailing their major updates and future roadmap.
Introduces DremioFrame and IceFrame, two new Python libraries for simplifying work with Dremio and Apache Iceberg tables.
Introduces dremioframe, a Python DataFrame library for querying Dremio with a pandas-like API, generating SQL under the hood.
A hands-on tutorial exploring Dremio Cloud Next Gen's new free trial, covering its lakehouse platform, AI features, and SQL capabilities.
A comprehensive guide to learning Apache Iceberg, data lakehouse architecture, and Agentic AI with curated tutorials, tools, and resources.
Explores the commercial Apache Iceberg catalog ecosystem, focusing on REST Catalog standards, optimization strategies, and architectural trade-offs.
Explores two paths for building a universal lakehouse catalog that extends beyond Apache Iceberg tables to manage diverse data formats and sources.
A technical guide on using Apache Iceberg with Apache Spark and Polaris for building and managing a data lakehouse, covering setup, operations, and optimization.
Overview of key proposals in Apache Iceberg v4, focusing on performance, metadata efficiency, and portability for modern data workloads.
A comprehensive guide comparing five major open table formats (Iceberg, Delta Lake, Hudi, Paimon, DuckLake) for modern data lakehouses, covering their internals and use cases.