Data Lakehouse articles

7/6/2026 • EN

Preparing Your Data Lakehouse for the EU AI Act: Auditable Lineage and Data Provenance

Technical guide on preparing data lakehouses for EU AI Act compliance, focusing on auditable lineage and data provenance.

Auditability Data Lakehouse Data Lineage Data Provenance EU AI Act

Alex Merced

7/6/2026 • EN

Preparing Your Data Lakehouse for the EU AI Act: Auditable Lineage and Data Provenance Requirements

Explains how to prepare a data lakehouse for EU AI Act compliance, focusing on data lineage and provenance.

Auditability Data Lakehouse Data Lineage Data Provenance EU AI Act

Alex Merced

7/6/2026 • EN

Multi-Engine Catalog Federation with Apache Polaris: Syncing Google Cloud, AWS, and Azure Metadata

Explores multi-engine catalog federation using Apache Polaris to sync metadata across Google Cloud, AWS, and Azure for open lakehouse governance.

Apache Iceberg Apache Polaris Catalog Federation Data Lakehouse Multicloud

Alex Merced

7/6/2026 • EN

Designing Private, Air-Gapped Data Lakehouses: Scaling Iceberg in Highly Secure, On-Premises Clouds

Designing secure, air-gapped data lakehouses using Apache Iceberg for defense, healthcare, finance, and other high-security sectors.

Air Gapped Lakehouse Apache Iceberg Data Lakehouse Private Cloud Secure Analytics

Alex Merced

6/8/2026 • EN

Apache Iceberg v4 Roadmap: Adaptive Metadata Trees, Single-File Commits, and the Delta Convergence

Overview of Apache Iceberg v4 roadmap proposals including adaptive metadata trees, single-file commits, and convergence with Delta Lake.

Apache Iceberg V4 Column Families Data Lakehouse Delta Lake Convergence Metadata Architecture

Alex Merced

6/8/2026 • EN

Modern Python Tooling for Apache Iceberg

Overview of modern Python tools for Apache Iceberg, including PyIceberg, IceFrame, and CLI for metadata management.

Apache Iceberg Data Engineering Data Lakehouse PyIceberg Python

Alex Merced

5/28/2026 • EN

Designing an Immutable Data Lakehouse: Best Practices for Iceberg Snapshot Expiration

Best practices for managing Apache Iceberg snapshot expiration in data lakehouses to optimize query performance and metadata size.

Data Lakehouse Iceberg Metadata Management Query Performance Snapshot Expiration

Alex Merced

5/28/2026 • EN

Decoupling Storage and Compute in Apache Iceberg: A Deep Dive into Cost Optimization

Explores how Apache Iceberg decouples storage and compute for cost optimization, including multi-engine routing and TCO analysis.

Apache Iceberg Cost Optimization Data Lakehouse Multi Engine Query Storage Compute Decoupling

Alex Merced

5/28/2026 • EN

The 2026 Unified Data Architecture: Reconciling Multi-Cloud Data Lakehouses

Explains the 2026 unified data architecture for multi-cloud data lakehouses using open standards like Apache Iceberg.

Apache Iceberg Data Lakehouse Multi Cloud Query Federation Zero ETL

Alex Merced

5/28/2026 • EN

The Death of the Data Swamp: Establishing Governance in Your 2026 Data Lakehouse

A guide to preventing data swamps in lakehouses through active governance, metadata stewardship, schema evolution safety, and drift detection.

Data Drift Detection Data Governance Data Lakehouse Metadata Stewardship Schema Evolution

Alex Merced

5/24/2026 • EN

Choosing the Right Iceberg Control Plane: Polaris vs. Unity Catalog vs. Cloud REST

Comparison of Iceberg catalog control planes: Polaris, Unity Catalog, and Cloud REST for lakehouse architecture.

Apache Iceberg Catalog Control Plane Data Lakehouse Metadata Management Multi Engine Interoperability

Alex Merced

5/23/2026 • EN

An In-Depth Overview of the Apache Iceberg 1.11.0 Release

Overview of Apache Iceberg 1.11.0 release, covering new features like metadata encryption, pluggable file formats, and query optimizations.

Apache Iceberg Data Lakehouse Encryption File Format API Metadata

Alex Merced

4/29/2026 • EN

Apache Iceberg Metadata Tables: Querying the Internals

Explains Apache Iceberg metadata tables for querying table internals using SQL, covering snapshots, files, manifests, partitions, and practical use cases.

Apache Iceberg Data Lakehouse Metadata Tables SQL Table Formats

Alex Merced

4/13/2026 • EN

What is Apache Parquet? Columns, Encoding, and Performance

Explains Apache Parquet's columnar architecture, dictionary encoding, and performance benefits for data analytics.

Apache Parquet Columnar Storage Compression Data Lakehouse Predicate Pushdown

Alex Merced

4/13/2026 • EN

What is Apache Iceberg? The Table Format Revolution

Explains Apache Iceberg, a table format that replaces directory-based metadata with file-level tracking for scalable analytics on cloud storage.

Apache Iceberg Cloud Storage Data Lakehouse Metadata Tree Table Format

Alex Merced

3/5/2026 • EN

How to Use Dremio with Claude CoWork: Connect, Query, and Build Data Apps

A guide to integrating Dremio's data lakehouse platform with Claude CoWork, enabling natural language queries, automated reporting, and data app development.

Claude CoWork Data Lakehouse Dremio Query Federation Semantic Layer

Alex Merced

3/5/2026 • EN

How to Use Dremio with GitHub Copilot: Connect, Query, and Build Data Apps

A guide to integrating GitHub Copilot with Dremio's data platform to enable AI-assisted SQL generation, data pipeline creation, and application development.

Data Lakehouse Dremio GitHub Copilot MCP SQL

Alex Merced

3/5/2026 • EN

How to Use Dremio with Gemini CLI: Connect, Query, and Build Data Apps

A guide to integrating Google's Gemini CLI with Dremio's data platform for querying, building data apps, and generating SQL using AI.

Data Applications Data Lakehouse Dremio Gemini CLI MCP Server

Alex Merced

3/5/2026 • EN

How to Use Dremio with Cursor: Connect, Query, and Build Data Apps

A guide on integrating Dremio's data platform with the Cursor AI code editor to enable accurate SQL generation and data app development.

AI Code Editor Cursor Data Lakehouse Dremio SQL Integration

Alex Merced

3/5/2026 • EN

How to Use Dremio with Claude Code: Connect, Query, and Build Data Apps

A guide to connecting Dremio's data lakehouse platform with Claude Code, enabling the AI coding agent to query live data and build data applications.

Claude Code Data Lakehouse Dremio MCP Server SQL

Alex Merced
