Designing an Immutable Data Lakehouse: Best Practices for Iceberg Snapshot Expiration
Best practices for managing Apache Iceberg snapshot expiration in data lakehouses to optimize query performance and metadata size.
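Snapshot expiration policies usually combine two rules: an age cutoff and a minimum number of recent snapshots to keep. Iceberg exposes these as `expireOlderThan` and `retainLast` on the Java `ExpireSnapshots` API, and as the `older_than` and `retain_last` arguments of the Spark `expire_snapshots` procedure. The Python below is a minimal, hypothetical sketch of how the two rules interact. It is not Iceberg's implementation, and it assumes a linear snapshot history (no branches or tags) in which the newest snapshot is the current one. The `Snapshot` class and `snapshots_to_expire` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for an Iceberg snapshot entry.
    snapshot_id: int
    timestamp_ms: int


def snapshots_to_expire(
    snapshots: Iterable[Snapshot], older_than_ms: int, retain_last: int
) -> List[int]:
    """Return the IDs a retention policy would expire, oldest ID first.

    Assumes a linear history. A snapshot is expired only if it is strictly
    older than the cutoff AND not among the `retain_last` newest snapshots.
    """
    if retain_last < 1:
        # Keeping at least one snapshot guarantees the current one survives.
        raise ValueError("retain_last must be at least 1")
    snaps = list(snapshots)
    newest_first = sorted(snaps, key=lambda s: s.timestamp_ms, reverse=True)
    protected = {s.snapshot_id for s in newest_first[:retain_last]}
    return sorted(
        s.snapshot_id
        for s in snaps
        if s.timestamp_ms < older_than_ms and s.snapshot_id not in protected
    )


history = [Snapshot(1, 1000), Snapshot(2, 2000), Snapshot(3, 3000), Snapshot(4, 4000)]
print(snapshots_to_expire(history, older_than_ms=3500, retain_last=2))  # → [1, 2]
```

Snapshot 3 is older than the cutoff but survives because `retain_last=2` protects the two newest snapshots. Because `retain_last` must be at least 1 (Iceberg enforces the same floor), the current snapshot is never expired in this model, even when the cutoff is later than every snapshot's timestamp.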
Explains how to design an open catalog architecture for AI agents in an agentic lakehouse, covering Apache Polaris and Dremio's Open Catalog.
Explores using DuckDB and Polars to query and write to Iceberg tables, covering new features, workflows, and practical patterns.
Explores using the Lance and Iceberg formats for multimodal AI data, weighing scan-heavy analytics against random-access retrieval for ML training.
Compares Apache Paimon and Iceberg for handling mutable streams, focusing on Paimon's LSM-tree architecture for high-frequency updates.
Explains how Apache Iceberg uses metadata for data skipping, enabling fast query performance by eliminating 90-99% of files before scanning.
Apache Polaris is an open-source catalog service that unifies the Iceberg ecosystem by implementing the Iceberg REST API for vendor-neutral lakehouse metadata management.
Explores two paths for building a universal lakehouse catalog that extends beyond Apache Iceberg tables to manage diverse data formats and sources.
A hands-on guide to using different catalogs, including Apache Hive, with Flink SQL, covering installation, configuration, and practical insights.