Partitioning Practices in Apache Hive and Apache Iceberg
Compares partitioning techniques in Apache Hive and Apache Iceberg, highlighting Iceberg's advantages for query performance and data management.
Explores the Data Lakehouse architecture and the roles of Apache Iceberg and Dremio in modern, integrated data management.
A comprehensive directory of resources for learning about and building Open Lakehouses using Apache Iceberg, Nessie, and Dremio.
Introduces Nessie as a self-managed catalog alternative to Hive & JDBC for Apache Iceberg, addressing limitations and new features.
Explores how Dremio's platform simplifies building and managing Apache Iceberg-based data lakehouses with governance, performance, and self-service.
Monthly roundup of data streaming trends, featuring Apache Iceberg, Kafka Streams, Flink deployments, and streaming SQL insights.
Explores Apache Iceberg and Project Nessie, key open-source technologies powering the flexible and vendor-neutral Open Lakehouse data architecture.
Explains Project Nessie, an open-source data catalog for Apache Iceberg tables, and its importance for data engineers and architects building data lakehouses.
Explains the data lakehouse concept, Dremio's role as a platform, and Apache Iceberg's function as a table format for modern data architectures.
A guide to configuring Apache Spark for use with the Apache Iceberg table format, covering packages, flags, and programmatic setup.
Explores modern data engineering trends in 2022, focusing on analytical data storage formats, organization, and access patterns.
A hands-on tutorial for setting up a Docker environment to experiment with the Apache Iceberg table format using Spark SQL.