Dremio's Built-in Open Catalog: Your Zero-Configuration Apache Iceberg Lakehouse
Introduces Dremio's built-in Open Catalog for Apache Iceberg, offering a zero-configuration, production-ready lakehouse solution with automated management.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
425 articles from this blog
Tutorial on using Dremio's AI_GENERATE SQL function to extract structured data from unstructured text like emails and contracts.
Explains the difference between a metrics layer and a semantic layer in data architecture, clarifying their distinct roles and relationship.
Explains why AI data analytics fails without a semantic layer to define business metrics and ensure accurate, secure queries.
Argues that data quality must be enforced at the pipeline's ingestion point, not patched in dashboards, to ensure consistent, reliable data.
Seven critical mistakes that can derail semantic layer projects in data engineering, with practical advice on how to avoid them.
Explains the importance of pipeline observability for data health, covering metrics, logs, and lineage to detect issues beyond simple execution monitoring.
A guide to designing reliable, fault-tolerant data pipelines with architectural principles like idempotency, observability, and DAG-based workflows.
Explains how a self-documenting semantic layer uses AI to automate data documentation, reducing manual work and governance risks for data teams.
Explains how to safely evolve data schemas using API-like discipline to prevent breaking downstream systems like dashboards and ML pipelines.
Explains the importance of automated testing for data pipelines, covering schema validation, data quality checks, and regression testing.
A guide to choosing between batch and streaming data processing models based on actual freshness requirements and cost.
A practical, tool-agnostic checklist of essential best practices for designing, building, and maintaining reliable data engineering pipelines.
Explains the distinct roles of data catalogs and semantic layers in data architecture, arguing they are complementary tools.
Explains Headless BI and how a universal semantic layer centralizes metric definitions to replace tool-specific models, enabling consistent analytics.
Explains how a semantic layer enforces data governance by embedding policies directly into the query path, ensuring consistent metrics and access control.
Explains idempotent data pipelines, patterns like partition overwrite and MERGE, and how to prevent duplicate data during retries.
Explains how data virtualization and a semantic layer enable querying distributed data without copying, reducing costs and improving freshness.
Explains data partitioning and organization strategies to drastically improve query performance in analytical databases.
A guide to the core principles and systems thinking required for data engineering, beyond just learning specific tools.