5 Open Source Data Projects You Should Be Following
An overview of five impactful open-source data projects, including Apache Iceberg and Arrow, that are revolutionizing data management and analytics.
Explains why Dremio is a top platform for Apache Iceberg lakehouses, highlighting features like dataset promotion and data reflections.
Explores Apache Iceberg's catalog system, its role in data lakehouse architecture, and key considerations for choosing the right catalog.
Explores 10 reasons to adopt Apache Iceberg and Dremio for building a modern, flexible, and cost-effective data lakehouse architecture.
Explains the role, types, and selection criteria for catalogs in Apache Iceberg, a key component for managing data lakehouse tables.
Explains the data lakehouse architecture and the roles of Apache Iceberg, Nessie, and Dremio in modern data management.
Compares partitioning techniques in Apache Hive and Apache Iceberg, highlighting Iceberg's advantages for query performance and data management.
Table of Contents Context Introduction Short Version for Quick Readers My Journey with Table Formats and Lakehouses Ecosystem Over Features Key Takeaw
Explores the Data Lakehouse architecture and the roles of Apache Iceberg and Dremio in modern, integrated data management.
A comprehensive directory of resources for learning about and building Open Lakehouses using Apache Iceberg, Nessie, and Dremio.
Introduces Nessie as a self-managed catalog alternative to Hive and JDBC for Apache Iceberg, covering the limitations it addresses and the new features it offers.
Explores how Dremio's platform simplifies building and managing Apache Iceberg-based data lakehouses with governance, performance, and self-service.
Explores Apache Iceberg and Project Nessie, key open-source technologies powering the flexible and vendor-neutral Open Lakehouse data architecture.
Explains Project Nessie, an open-source data catalog for Apache Iceberg tables, and its importance for data engineers and architects building data lakehouses.
Explains the data lakehouse concept, Dremio's role as a platform, and Apache Iceberg's function as a table format for modern data architectures.
A guide to configuring Apache Spark for use with the Apache Iceberg table format, covering packages, flags, and programmatic setup.
A hands-on tutorial for setting up a Docker environment to experiment with the Apache Iceberg table format using Spark SQL.