Data Engineering articles

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Modeling Basics

An introduction to data modeling concepts, covering OLTP vs OLAP systems, normalization, and common schema designs for data engineering.

Data Engineering Data Modeling Database Design OLAP OLTP

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Streaming Data Fundamentals

Explains streaming data fundamentals, how streaming systems work, their use cases, and challenges compared to batch processing.

Batch Processing Data Engineering Data Pipelines Real Time Processing Streaming Data

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Batch Processing Fundamentals

Explains batch processing fundamentals for data engineering, covering concepts, tools, and its ongoing relevance in data workflows.

Apache Iceberg Batch Processing Data Engineering Data Pipelines Data Workflows

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | ETL vs ELT – Understanding Data Pipelines

Explains core data engineering concepts, comparing ETL and ELT data pipeline strategies and their use cases.

Data Engineering Data Pipelines Data Transformation ELT ETL

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Understanding Data Sources and Ingestion

An introduction to data engineering concepts, focusing on data sources and ingestion strategies like batch vs. streaming.

Batch Processing Data Engineering Data Ingestion Data Sources Streaming

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | What is Data Engineering?

An introductory guide to data engineering, explaining its role, key concepts, and how it differs from data science in the modern data ecosystem.

Apache Iceberg Data Engineering Data Infrastructure Data Pipelines Data Warehouse

Alex Merced

4/22/2025 • EN

Interesting links - April 2025

A monthly roundup of curated links and articles on data engineering, Kafka, CDC, stream processing, and AI/ML topics.

Apache Flink Change Data Capture Data Engineering Kafka Stream Processing

Robin Moffatt

3/20/2025 • EN

Building a data pipeline with DuckDB

A guide to building a data pipeline using DuckDB, covering data ingestion, transformation, and analytics with real-world environmental data.

Data Engineering Data Pipeline DuckDB ETL Slowly Changing Dimensions

Robin Moffatt

2/3/2025 • EN

Interesting links - February 2025

A monthly roundup of interesting links and articles about data engineering, databases, streaming tech, and data infrastructure.

Apache Kafka Data Architecture Data Engineering Databases Streaming

Robin Moffatt

1/20/2025 • EN

2025 Comprehensive Guide to Apache Iceberg

A comprehensive 2025 guide to Apache Iceberg, covering its architecture, ecosystem, and practical use for data lakehouse management.

Apache Iceberg Big Data Data Engineering Data Lakehouse Table Format

Alex Merced

1/6/2025 • EN

RAG Isn’t a Modeling Problem. It’s a Data Engineering Problem.

Argues that RAG system failures stem from data engineering issues like fragmented data and governance, not from model or vector database choices.

Data Engineering Hybrid Search Latency RAG Vector Databases

Alex Merced

12/19/2024 • EN

Overture Maps' Refreshed Global Geospatial Datasets

Overview of Overture Maps Foundation's updated global, open geospatial datasets, their partners, and data refresh strategy.

Cloud Storage Data Engineering Geospatial Data Open Data Git

Mark Litwintschik

12/19/2024 • EN

Checkpoint Chronicle - December 2024

Monthly roundup of news and resources in data streaming, stream processing, and the Apache Kafka ecosystem, curated by industry experts.

Apache Flink Apache Kafka Data Engineering Event Streaming Stream Processing

Robin Moffatt

12/11/2024 • EN

Exploring Flink CDC

An overview of Apache Flink CDC, its declarative pipeline feature, and how it simplifies data integration from databases like MySQL to sinks like Elasticsearch.

Apache Flink Change Data Capture Data Engineering Flink CDC SQL

Robin Moffatt

11/4/2024 • EN

dbt Community Spotlight

A profile of a Senior Analytics Engineer specializing in dbt, data mesh architecture, and applying library science principles to modern data teams.

Analytics Engineering Data Engineering Data Governance Data Mesh dbt

Jenna Jordan

10/30/2024 • EN

Checkpoint Chronicle - October 2024

Monthly roundup of news and developments in data streaming, stream processing, and the data ecosystem, featuring Apache Flink, Kafka, and open-source tools.

Apache Flink Data Engineering Event Streaming Stream Processing Streaming SQL

Robin Moffatt