5 Open Source Data Projects You Should Be Following
An overview of five impactful open-source data projects, including Apache Iceberg and Arrow, that are revolutionizing data management and analytics.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
418 articles from this blog
Explains why Dremio is a top platform for Apache Iceberg lakehouses, highlighting features like dataset promotion and data reflections.
Explores Apache Iceberg's catalog system, its role in data lakehouse architecture, and key considerations for choosing the right catalog.
Explores 10 reasons to adopt Apache Iceberg and Dremio for building a modern, flexible, and cost-effective data lakehouse architecture.
Explains the role, types, and selection criteria for catalogs in Apache Iceberg, a key component for managing data lakehouse tables.
An introduction to ANSI SQL, covering its standardized syntax, key concepts like DDL, DML, joins, CTEs, and its importance for database interoperability.
Explains how ontologies structure data for better interoperability, integration, and analysis across domains like healthcare and finance.
An introductory guide to Python programming covering installation, syntax, data structures, and best practices for beginners.
A comprehensive guide to mastering the essential Git commands 'git pull' and 'git push', covering their anatomy, options, and best practices.
Explains the data lakehouse architecture and the roles of Apache Iceberg, Nessie, and Dremio in modern data management.
Compares partitioning techniques in Apache Hive and Apache Iceberg, highlighting Iceberg's advantages for query performance and data management.
Compares columnar vs. row-based data structures, explaining their optimal use in OLAP and OLTP systems for performance and scalability.
A comprehensive guide to JavaScript Promises, covering basics, error handling, advanced methods like Promise.all(), and real-world use cases.
An introduction to Data Vault modeling, a flexible data warehouse design method using Hubs, Links, and Satellites for scalable data integration.
Explores the Data Lakehouse architecture and the roles of Apache Iceberg and Dremio in modern, integrated data management.
A comprehensive directory of resources for learning about and building Open Lakehouses using Apache Iceberg, Nessie, and Dremio.
Introduces Nessie as a self-managed catalog alternative to Hive & JDBC for Apache Iceberg, addressing limitations and new features.
A no-code tutorial on converting XLS/CSV files to Parquet format using Dremio, including setup via Docker.
An introduction to HTMX, a modern library for building dynamic web interfaces using HTML with minimal JavaScript, and how to use it.