Visibility is Velocity
A technical article on how visibility and communication, not just speed, are critical for engineering team success and stakeholder trust.
A developer's journey to understanding AI agents and the Model Context Protocol (MCP), moving beyond traditional data pipeline thinking.
Explores how DevOps principles like CI/CD, infrastructure as code, and monitoring are applied to data engineering for reliable, scalable data pipelines.
Explores the importance of data quality and validation in data engineering, covering key dimensions and tools for reliable pipelines.
Explains streaming data fundamentals, how streaming systems work, their use cases, and challenges compared to batch processing.
Explains batch processing fundamentals for data engineering, covering concepts, tools, and its ongoing relevance in data workflows.
An introductory guide to data engineering, explaining its role, key concepts, and how it differs from data science in the modern data ecosystem.
Explains core data engineering concepts, comparing ETL and ELT data pipeline strategies and their use cases.
A tutorial on setting up and running PyFlink streaming data jobs on a Kubernetes cluster, including installation and deployment steps.
Explores how Azure services like Data Factory, Databricks, and Machine Learning enable DataOps for streamlined, automated data pipelines.
Explores essential design patterns for building efficient and maintainable machine learning systems in production, focusing on data pipelines and best practices.
An overview of Apache Kafka, explaining its core concepts as a distributed event streaming platform for real-time data pipelines.
A software engineer explains their decision to join Decodable, a startup building a serverless real-time data platform, focusing on stream processing.
Explores why data and ML pipeline tests fail for the wrong reasons and offers strategies for writing more robust unit, schema, and integration tests.
An analysis of key trends in the Apache Kafka ecosystem, including connector growth, self-service data pipelines, and stream processing adoption.
Explains why Apache Airflow jobs appear to run a day late due to its scheduling logic, contrasting it with cron jobs.
A tutorial on building data pipelines using Microsoft Azure Data Factory, covering ingestion, transformation, and orchestration.