Bringing MLflow and Data Pipelines Closer Together
Explores integrating MLflow 3 with data pipelines for unified observability, covering data lineage, drift detection, and CI/CD for ML.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
501 articles from this blog
Explores data clean rooms for privacy-preserving analytics, covering core guarantees, platforms like Databricks and AWS, and real-world use cases.
Explores using Lance and Iceberg formats for multimodal AI data, addressing scan-heavy analytics vs. random-access retrieval for ML training.
A guide on automating Iceberg table maintenance to prevent small file accumulation, covering compaction, vacuuming, and modern tools.
Explores building modular query engines using Rust runtimes like Apache DataFusion, focusing on composability over monolithic designs.
Comparison of Iceberg catalog control planes: Polaris, Unity Catalog, and Cloud REST for lakehouse architecture.
Explains how OpenLineage provides a standardized API for data lineage, enabling faster incident investigation and data observability across the stack.
A technical guide on building real-time lakehouse architectures using Apache Flink 2.1 and the Dynamic Iceberg Sink, addressing schema drift, file proliferation, and operational rigidity.
Explores modern single-node data engineering tools like DuckDB, DataFusion, Polars, and LakeSail built on Apache Arrow for high-performance analytics.
Explores policy-as-code for lakehouse governance using ABAC, OPA, and cloud-native tools to replace RBAC with scalable, query-time data access controls.
Overview of the Apache Iceberg 1.11.0 release, covering new features like metadata encryption, pluggable file formats, and query optimizations.
Explains how Apache Iceberg table writes work, including commit steps and ACID guarantees on object storage.
Explains why table formats like Apache Iceberg and Delta Lake are essential for reliable data lakes, providing atomic commits, schema evolution, and time travel.
A technical deep dive comparing the metadata structures of modern data lake table formats: Apache Iceberg, Delta Lake, and Apache Hudi.
Explains how Apache Iceberg's hidden partitioning prevents accidental full table scans by automatically mapping source column filters to partition values.
Explains lakehouse catalogs in Apache Iceberg, their role in metadata management, and how to choose between open source and managed options.
Explores embedding Iceberg catalogs directly into storage, covering AWS S3 Tables and MinIO AIStor for simplified metadata management.
Explains how Apache Iceberg uses metadata for data skipping, enabling fast query performance by eliminating 90-99% of files before scanning.
Explains how Apache Iceberg enables partition evolution without rewriting data, solving a major data lake challenge.
Explains five ways Apache Iceberg table storage degrades over time, including small files, orphan files, and metadata bloat, with detection methods.