Open Lakehouse Engineering/Apache Iceberg Lakehouse Engineering - A Directory of Resources
A comprehensive directory of resources for learning about and building Open Lakehouses using Apache Iceberg, Nessie, and Dremio.
Introduces Nessie as a self-managed catalog for Apache Iceberg and an alternative to Hive and JDBC catalogs, covering the limitations it addresses and the new features it offers.
Explores whether Debezium can lose database change events, explaining its at-least-once semantics and operational pitfalls like log retention.
Explores whether the Debezium change data capture tool can lose database events, discussing its at-least-once semantics and operational pitfalls.
Explores seven practical use cases for Change Data Capture (CDC) in data engineering, including analytics, caches, and microservices.
A weekly tech learning digest covering Microsoft Fabric, AI topics, computer vision, Azure AI Document Intelligence, embeddings, and vector search.
An introduction to analytical data warehouses, explaining their purpose, differences from transactional databases, and their role in team-based analytics.
Explains Project Nessie, an open-source data catalog for Apache Iceberg tables, and its importance for data engineers and architects building data lakehouses.
A software engineer explains their decision to join Decodable, a startup building a serverless real-time data platform, focusing on stream processing.
An introduction to modern data systems, explaining OLTP, OLAP, data warehouses, data lakes, and the roles of data engineers, analysts, and scientists.
A guide explaining key data engineering terms like data warehouses, data lakes, data mesh, and data pipelines, with definitions and comparisons.
A guide to configuring an external Apache Hive metastore with Azure SQL for use in an Azure Synapse Analytics Spark pool, including how to troubleshoot common connection errors.
A recap of 2021 conference talks on Debezium and Change Data Capture (CDC), exploring patterns and integrations with tools like Kafka and Infinispan.
A recap of 2021 conference talks on Debezium and Change Data Capture (CDC), exploring patterns and integrations with tools like Kafka and Pinot.
Introducing Data Fluent, an open-source Python package for analyzing and understanding PostgreSQL database structure, row counts, and growth trends.
Announcing the free release of 'Practical MongoDB Aggregations', a book with tips and examples for developers and data professionals.
Explores the concept of feature stores in machine learning, presenting a hierarchy of needs from basic access to full automation.
An analysis of data discovery platforms, their key features, and available open-source solutions to improve data findability in organizations.
Argues that data scientists should own the entire process from problem identification to solution deployment for greater impact and efficiency.