Can Debezium Lose Events?
Explores whether the Debezium change data capture tool can lose database events, discussing its at-least-once semantics and operational pitfalls.
Monthly roundup of data streaming trends, featuring Apache Iceberg, Kafka Streams, Flink deployments, and streaming SQL insights.
Explores seven practical use cases for Change Data Capture (CDC) in data engineering, including analytics, caches, and microservices.
An introductory overview of Apache Flink, explaining its core concepts as a distributed stream processing framework, its history, and primary use cases.
The author explains their move to Decodable to dive deeper into stream processing and Apache Flink, and to work with experts in the field.
A weekly tech learning digest covering Microsoft Fabric, AI topics, computer vision, Azure AI Document Intelligence, embeddings, and vector search.
An introduction to analytical data warehouses, explaining their purpose, differences from transactional databases, and their role in team-based analytics.
Analysis of Hacker News job posts shows the Data Scientist role declining while ML Engineer roles rise, indicating a shift in the data job market.
Interview with Chad Sanderson on data platform leadership, experimentation culture, data quality, and the rise of data contracts.
Explains Project Nessie, an open-source data catalog for Apache Iceberg tables, and its importance for data engineers and architects building data lakehouses.
How to handle mismatched Parquet file schemas when querying multiple files in DuckDB using the UNION_BY_NAME option.
An update on how Monzo integrated machine learning across its organization in 2022, covering team structure, growth, and new initiatives.
Explores the shift to ELT in data engineering, focusing on modern tools like dbt, Fivetran, and Airbyte for loading and transforming data.
A software engineer explains their decision to join Decodable, a startup building a serverless real-time data platform, focusing on stream processing.
A technical walkthrough of using dbt and DuckDB to clean and analyze session feedback data from a tech conference.
A hands-on exploration of using dbt (data build tool) with DuckDB for local data engineering, based on a tutorial project.
Explains the evolution from ETL to ELT in data engineering, clarifying the role of modern tools like dbt in the transformation process.
A hands-on tutorial exploring LakeFS for data versioning and branching using PySpark and Jupyter notebooks in a data engineering context.
A curated list of essential resources for data engineering, including articles, newsletters, podcasts, and tools.