What Iceberg V3 Advances Mean for CDC Pipelines
Explains how Apache Iceberg V3 improves CDC pipelines with deletion vectors and row lineage, solving delete file accumulation.
A practical walkthrough of working with Apache Iceberg on Dremio Cloud, covering table creation, data ingestion, optimization, and AI-powered analytics.
Explains the modular Apache Lakehouse architecture using open-source components like Parquet, Iceberg, Polaris, and Arrow for vendor-neutral data management.
A monthly roundup of interesting links focused on Kafka, event streaming, stream processing, and analytics in the tech industry.
Tests Claude Code's ability to build a production-ready dbt project for a data pipeline, evaluating prompts and skills.
Explores the current capabilities and limitations of using Claude Code to build a dbt project, arguing that it won't replace data engineers yet.
A technical demonstration of using Claude Code to autonomously debug and adapt dbt data models by analyzing data anomalies.
A guide to integrating Dremio's data platform with JetBrains AI Assistant for enhanced data querying, pipeline generation, and app development within JetBrains IDEs.
Monthly job board for database professionals, featuring remote and onsite data engineering, DBA, and analytics roles from March 2026.
A monthly roundup of tech links focusing on data engineering, Kafka, AI, and software development, including personal articles and industry news.
Explains idempotent data pipelines, patterns like partition overwrite and MERGE, and how to prevent duplicate data during retries.
A guide to the core principles and systems thinking required for data engineering, beyond just learning specific tools.
A guide to designing reliable, fault-tolerant data pipelines with architectural principles like idempotency, observability, and DAG-based workflows.
Argues that data quality must be enforced at the pipeline's ingestion point, not patched in dashboards, to ensure consistent, reliable data.
Explains how to safely evolve data schemas using API-like discipline to prevent breaking downstream systems like dashboards and ML pipelines.
A guide to choosing between batch and streaming data processing models based on actual freshness requirements and cost.
A practical, tool-agnostic checklist of essential best practices for designing, building, and maintaining reliable data engineering pipelines.
Seven critical mistakes that can derail semantic layer projects in data engineering, with practical advice on how to avoid them.
Seven common data modeling mistakes that cause reporting errors and slow analytics, with practical solutions to avoid them.
Explains the importance of automated testing for data pipelines, covering schema validation, data quality checks, and regression testing.