The Basics of Compaction — Bin Packing Your Data for Efficiency
Explains data compaction using bin packing in Apache Iceberg to merge small files, improve query performance, and reduce metadata overhead.
A hands-on tutorial on building a data lakehouse pipeline using Spark, Dremio, and Superset to move and analyze data.
A hands-on tutorial for setting up a Docker environment to experiment with the Apache Iceberg table format using Spark SQL.
A guide to configuring an external Apache Hive metastore with Azure SQL for use in an Azure Synapse Analytics Spark pool, including how to troubleshoot common connection errors.
Practical strategies for staying current in the fast-moving field of machine learning, including project experimentation and community engagement.
Notes from Spark+AI Summit 2020 covering application-specific talks on ML frameworks, data engineering, feature stores, and data quality from companies like Airbnb and Netflix.
A data scientist reviews Martin Odersky's Functional Programming in Scala Coursera course, covering key learnings and its practical application.
A former PhD scientist shares his positive transition to data science freelancing, detailing the freedom and variety of his new career.
A deep-dive technical guide into Laravel Spark, an alpha-release tool for quickly building SaaS applications with Laravel.