ETL articles

3/3/2026 • EN

In defence of correctness

An article arguing for the importance of correctness in software, especially in critical systems like reporting and ETL, and discussing when it is essential.

Business Impact Correctness data integrity Etl software development

Mark Seemann

3/1/2026 • EN

Connect Microsoft SQL Server to Dremio Cloud: Federate Enterprise Data Without ETL

Guide on connecting Microsoft SQL Server to Dremio Cloud for federated analytics, avoiding ETL and reducing license costs.

analytics Data Federation Dremio Etl SQL Server

Alex Merced

3/1/2026 • EN

Connect MySQL to Dremio Cloud: Federated Analytics Without ETL

Guide on connecting MySQL to Dremio Cloud for federated analytics, eliminating ETL pipelines and improving query performance.

analytics Data Federation Dremio Etl mysql

Alex Merced

2/19/2026 • EN

Idempotent Pipelines: Build Once, Run Safely Forever

Explains idempotent data pipelines, patterns like partition overwrite and MERGE, and how to prevent duplicate data during retries.

Data Engineering Data Pipelines Data Quality Etl Idempotency

Alex Merced

2/19/2026 • EN

Data Virtualization and the Semantic Layer: Query Without Copying

Explains how data virtualization and a semantic layer enable querying distributed data without copying, reducing costs and improving freshness.

Analytics Architecture Data Pipeline Data Virtualization Etl Semantic Layer

Alex Merced

2/19/2026 • EN

Data Vault Modeling: Hubs, Links, and Satellites

Explains Data Vault data modeling, its core components (Hubs, Links, Satellites), and the problems it solves for complex, evolving data sources.

Data Vault Modeling Data Warehousing Dimensional Modeling Etl Hubs Links Satellites

Alex Merced

2/19/2026 • EN

Slowly Changing Dimensions: Types 1-3 with Examples

Explains Slowly Changing Dimensions (SCD) types 1-3 for managing data history in data warehouses, with practical examples.

Data Warehousing Database Design Dimensional Modeling Etl Slowly Changing Dimensions

Alex Merced

2/19/2026 • EN

Data Engineering Best Practices: The Complete Checklist

A practical, tool-agnostic checklist of essential best practices for designing, building, and maintaining reliable data engineering pipelines.

best practices Data Engineering Data Quality Etl Pipeline Design

Alex Merced

9/26/2025 • EN

Analysis-Ready OpenStreetMap

Exploring the Layercake project's analysis-ready OpenStreetMap data in Parquet format, including setup and performance on a high-end workstation.

Etl Geospatial Openstreetmap Parquet Python

Mark Litwintschik

9/6/2025 • EN

The World's 2.75B Buildings

Analysis of a new global building dataset (2.75B structures), detailing the data processing, technical setup, and tools used for exploration.

aws s3 Etl Geospatial Data Parquet QGIS

Mark Litwintschik

5/2/2025 • EN

Introduction to Data Engineering Concepts | ETL vs ELT – Understanding Data Pipelines

Explains core data engineering concepts, comparing ETL and ELT data pipeline strategies and their use cases.

Data Engineering Data Pipelines data transformation Elt Etl

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Scheduling and Workflow Orchestration

Explores workflow orchestration in data engineering, covering DAGs, tools, and best practices for managing complex data pipelines.

Data Engineering Directed Acyclic Graphs Etl Scheduling Workflow Orchestration

Alex Merced

3/20/2025 • EN

Building a data pipeline with DuckDB

A guide to building a data pipeline using DuckDB, covering data ingestion, transformation, and analytics with real-world environmental data.

Data Engineering Data Pipeline Duckdb Etl Slowly Changing Dimensions

Robin Moffatt

4/8/2024 • EN

Reflecting on my tenure at the City of Boston

A data engineer reflects on their 2-year career journey at the City of Boston, sharing lessons learned in data warehousing, ETL, and civic tech.

analytics Civic Tech Data Engineering Etl Pipelines

Jenna Jordan

3/13/2024 • EN

A Taxonomy Of Data Change Events

Explores a taxonomy of data change events in CDC, detailing Full, Delta, and Id-only events and their use cases.

change data capture Data Events Debezium Etl Real Time Processing

Gunnar Morling

3/13/2024 • EN

A Taxonomy Of Data Change Events

Explores three types of data change events in Change Data Capture (CDC): Full, Delta, and Id-only events, detailing their structure and use cases.

change data capture Data Events database Etl real-time

Gunnar Morling

2/2/2024 • EN

Introduction to Data Vault Modeling

An introduction to Data Vault modeling, a flexible data warehouse design method using Hubs, Links, and Satellites for scalable data integration.

Data Governance Data Integration Data Vault Modeling Data Warehousing Etl

Alex Merced

9/10/2023 • EN

TWIL: September 10, 2023

A weekly tech learning digest covering Microsoft Fabric, AI topics, computer vision, Azure AI Document Intelligence, embeddings, and vector search.

Azure AI computer vision Data Engineering Etl Microsoft Fabric

André Vala

1/10/2023 • EN

Faster PostgreSQL To BigQuery Transfers

A technical guide on using ClickHouse to export PostgreSQL data to Parquet format for faster loading into Google BigQuery.

Bigquery Data Migration Etl Geospatial postgresql

Mark Litwintschik

10/2/2022 • EN

Data Engineering in 2022: Architectures & Terminology

Explains the evolution from ETL to ELT in data engineering, clarifying the role of modern tools like dbt in the transformation process.

Data Engineering Data Warehouse Dbt Elt Etl

Robin Moffatt
