All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet
Final guide in a series covering performance tuning and best practices for optimizing Apache Parquet files in big data workflows.
Explores why Parquet is the ideal columnar file format for optimizing storage and query performance in modern data lake and lakehouse architectures.
Explains how Apache Iceberg brings ACID transaction guarantees to data lakes, enabling reliable data operations on open storage.
Explains the data lakehouse architecture, its layers (storage, table format, catalog, processing), and its advantages over traditional data warehouses.
Explores Apache Iceberg's advanced partitioning features, including hidden partitioning and transformations, for optimizing query performance in data lakes.
Explains three key Apache Iceberg features for data engineers: hidden partitioning, partition evolution, and tool compatibility.
Introduces Project Nessie, a version control system for data lakes that brings Git-like operations to managing and tracking changes in data assets.
Explains the data lakehouse concept, Dremio's role as a platform, and Apache Iceberg's function as a table format for modern data architectures.
A guide to configuring Apache Spark for use with the Apache Iceberg table format, covering packages, flags, and programmatic setup.
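The Spark-with-Iceberg setup that guide covers can be sketched roughly as follows. This is a minimal illustration, not the article's exact configuration: the runtime package version, catalog name (`my_catalog`), and warehouse path are all assumptions chosen for the example.

```python
# Sketch: configuring a SparkSession for the Apache Iceberg table format.
# Package version, catalog name, and warehouse path below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-example")
    # Pull in the Iceberg Spark runtime jar (match the version to your Spark/Scala build).
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    # Enable Iceberg's SQL extensions (e.g. MERGE INTO and CALL procedures).
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register an Iceberg catalog named "my_catalog" backed by a Hadoop warehouse path.
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "hadoop")
    .config("spark.sql.catalog.my_catalog.warehouse", "file:///tmp/iceberg-warehouse")
    .getOrCreate()
)
```

The same settings can instead be passed as `--packages` and `--conf` flags to `spark-submit` or `spark-sql`; the programmatic form shown here keeps the configuration alongside the application code.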