Alex Merced

Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.

https://tuts.alexmercedcoder.dev

12/31/2025

data engineering apache iceberg data lakehouse python analytics

Articles from this Blog

501 articles from this blog

10/21/2024 • EN

All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

Explains encoding techniques in Parquet files, including dictionary, RLE, bit-packing, and delta encoding, to optimize storage and performance.

data encoding Parquet Dictionary Encoding

10/21/2024 • EN

All About Parquet Part 07 - Metadata in Parquet | Improving Data Efficiency

Explores how metadata in Parquet files improves data efficiency and query performance, covering file, row group, and column-level metadata.

metadata Query Performance Parquet

10/21/2024 • EN

All About Parquet Part 08 - Reading and Writing Parquet Files in Python

A practical guide to reading and writing Parquet files in Python using the PyArrow and fastparquet libraries.

Python Data Engineering Parquet

10/21/2024 • EN

All About Parquet Part 09 - Parquet in Data Lake Architectures

Explores why Parquet is the ideal columnar file format for optimizing storage and query performance in modern data lake and lakehouse architectures.

Big Data Parquet Apache Iceberg

10/21/2024 • EN

All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

Final guide in a series covering performance tuning and best practices for optimizing Apache Parquet files in big data workflows.

performance tuning Big Data Data Compression

10/19/2024 • EN

A Deep Dive Into GitHub Actions From Software Development to Data Engineering

Explores using GitHub Actions for software development CI/CD and advanced data engineering tasks like ETL pipelines and data orchestration.

DevOps CI/CD automation

10/19/2024 • EN

Orchestrating Airflow DAGs with GitHub Actions - A Lightweight Approach to Data Curation Across Spark, Dremio, and Snowflake

Using GitHub Actions to trigger Airflow DAGs for orchestrating data pipelines across Spark, Dremio, and Snowflake.

GitHub Actions Apache Spark Dremio

10/18/2024 • EN

A Guide to dbt Macros - Purpose, Benefits, and Usage

A guide explaining dbt macros, their purpose, benefits, and how to use them to write reusable, standardized SQL code in data transformation projects.

SQL macros data transformation

10/16/2024 • EN

Data Lakehouse Roundup 1 - News and Insights on the Lakehouse

Quarterly roundup of data lakehouse trends, table formats, and major industry news from Apache Iceberg to Delta Lake.

Apache Iceberg Data Lakehouse Table Formats

10/15/2024 • EN

Getting Started with Data Analytics Using PyArrow in Python

A tutorial on using PyArrow for data analytics in Python, covering core concepts, file I/O, and analytical operations.

Python Data Analytics Apache Arrow

10/14/2024 • EN

Working with Collections in Rust | A Comprehensive Guide

A comprehensive guide to using Rust's built-in collection types, including vectors, arrays, hashmaps, and sets, with performance tips and examples.

Rust Collections Iterators

10/7/2024 • EN

A Brief Guide to the Governance of Apache Iceberg Tables

Explains how to implement access control and security for Apache Iceberg tables at the file, engine, and catalog levels.

Access Control Governance Apache Iceberg

10/7/2024 • EN

Exploring Data Operations with PySpark, Pandas, DuckDB, Polars, and DataFusion in a Python Notebook

A guide to performing data operations using PySpark, Pandas, DuckDB, Polars, and DataFusion within a pre-configured Docker environment.

PySpark Pandas DuckDB

10/5/2024 • EN

Ultimate Directory of Apache Iceberg Resources

A comprehensive directory of Apache Iceberg resources, including tutorials, guides, and educational materials for data engineers and developers.

metadata Data Engineering Apache Iceberg

9/25/2024 • EN

Virtualization + Lakehouse + Mesh = Data At Scale

Explores how combining data lakehouse, virtualization, and mesh architectures with Dremio solves modern data scaling and silo challenges.

Data Architecture Apache Iceberg Data Lakehouse

9/22/2024 • EN

Deep Dive into Data Apps with Streamlit

A comprehensive guide to building interactive data applications using the Streamlit framework, covering setup, visualization, ML integration, and deployment.

Python Machine Learning data visualization

9/21/2024 • EN

A Deep Dive into Docker Compose

A comprehensive guide to Docker Compose, covering file structure, service configuration, networking, volumes, and best practices for multi-container applications.

DevOps configuration Docker