Introduction to Data Engineering Concepts | DevOps for Data Engineering
Explores how DevOps principles like CI/CD, infrastructure as code, and monitoring are applied to data engineering for reliable, scalable data pipelines.
Explores workflow orchestration in data engineering, covering DAGs, tools, and best practices for managing complex data pipelines.
Explores core principles of scalable data engineering, including parallelism, minimizing data movement, and designing adaptable pipelines for growing data volumes.
Explores the modern data stack, cloud platforms, and principles for building flexible, cloud-native data engineering architectures.
Explains the data lakehouse architecture, a unified approach combining data lake scalability with warehouse management features like ACID transactions.
A monthly roundup of curated links and articles on data engineering, Kafka, CDC, stream processing, and AI/ML topics.
A guide to building a data pipeline using DuckDB, covering data ingestion, transformation, and analytics with real-world environmental data.
A monthly roundup of interesting links and articles about data engineering, databases, streaming tech, and data infrastructure.
A comprehensive 2025 guide to Apache Iceberg, covering its architecture, ecosystem, and practical use for data lakehouse management.
Argues that RAG system failures stem from data engineering issues like fragmented data and governance, not from model or vector database choices.
Overview of Overture Maps Foundation's updated global, open geospatial datasets, their partners, and data refresh strategy.
Monthly roundup of news and resources in data streaming, stream processing, and the Apache Kafka ecosystem, curated by industry experts.
An overview of Apache Flink CDC, its declarative pipeline feature, and how it simplifies data integration from databases like MySQL to sinks like Elasticsearch.
A profile of a Senior Analytics Engineer specializing in dbt, data mesh architecture, and applying library science principles to modern data teams.
Monthly roundup of news and developments in data streaming, stream processing, and the data ecosystem, featuring Apache Flink, Kafka, and open-source tools.
A practical guide to reading and writing Parquet files in Python using PyArrow and FastParquet libraries.
Explains how Parquet handles schema evolution, including adding/removing columns and changing data types, for data engineers.
Explains the hierarchical structure of Parquet files, detailing how pages, row groups, and columns optimize storage and query performance.
An introduction to Apache Parquet, a columnar storage file format for efficient data processing and analytics.
Explores using GitHub Actions for software development CI/CD and advanced data engineering tasks like ETL pipelines and data orchestration.