Spark Articles

Page 1 of 1 (14 articles)

Migrating to Apache Iceberg: Strategies for Every Source System

4/29/2026 • EN

Strategies for migrating data to Apache Iceberg, including in-place, full rewrite, and shadow migration with zero downtime.

Apache Iceberg, Data Migration, Hive Tables, Parquet, Spark

The Basics of Compaction — Bin Packing Your Data for Efficiency

7/22/2025 • EN

Explains data compaction using bin packing in Apache Iceberg to merge small files, improve query performance, and reduce metadata overhead.

Apache Iceberg, Bin Packing, Data Compaction, Data Optimization, Spark

End-to-End Basic Data Engineering Tutorial (Spark, Dremio, Superset)

4/1/2024 • EN

A hands-on tutorial on building a data lakehouse pipeline using Spark, Dremio, and Superset to move and analyze data.

Apache Superset, Data Engineering, Data Lakehouse, Dremio, Spark

Brief Hands on Intro to Apache Iceberg

7/18/2022 • EN

A hands-on tutorial for setting up a Docker environment to experiment with the Apache Iceberg table format using Spark SQL.

Apache Iceberg, Data Lakehouse, docker, Spark, Table Formats

Use external Hive Metastore for Synapse Spark Pool

1/27/2022 • EN

A guide to configuring an external Apache Hive metastore backed by Azure SQL for use in an Azure Synapse Analytics Spark Pool, and to troubleshooting common connection errors.

Azure SQL, Azure Synapse, Data Engineering, Hive Metastore, Spark

Benjamin Perkins

How to Keep Learning about Machine Learning

1/19/2022 • EN

Practical strategies for staying current in the fast-moving field of machine learning, including project experimentation and community engagement.

data processing, Deep Learning, Machine Learning, Pytorch, Spark

My Notes From Spark+AI Summit 2020 (Application-Specific Talks)

7/5/2020 • EN

Notes from Spark+AI Summit 2020 covering application-specific talks on ML frameworks, data engineering, feature stores, and data quality from companies like Airbnb and Netflix.

Data Engineering, Feature Engineering, Machine Learning, production, Spark

ETL Offload with Spark and Amazon EMR - Part 4 - Analysing the Data

12/20/2016 • EN

Explores SQL-on-Hadoop engines like Apache Drill for analyzing ETL data processed with Spark on Amazon EMR, focusing on performance and flexibility.

Amazon Emr, data analysis, Etl, Spark, SQL On Hadoop

ETL Offload with Spark and Amazon EMR - Part 5 - Summary

12/20/2016 • EN

Final summary of a project exploring ETL offload to Apache Spark on AWS EMR, evaluating the cost and technical benefits for a cloud-based data platform.

Amazon Emr, aws, Big Data, Etl, Spark

ETL Offload with Spark and Amazon EMR - Part 2 - Code development with Notebooks and Docker

12/16/2016 • EN

Part 2 of a guide on developing ETL processes using Apache Spark, Jupyter Notebooks, and Docker on Amazon EMR.

Amazon Emr, docker, Etl, Jupyter, Spark

ETL Offload with Spark and Amazon EMR - Part 1 - Introduction

12/15/2016 • EN

Explores using Apache Spark on Amazon EMR to offload and improve ETL processes, comparing it to traditional Oracle-based solutions.

Amazon Emr, aws, data processing, Etl, Spark

Thoughts on Functional Programming in Scala Course (Coursera)

7/31/2016 • EN

A data scientist reviews Martin Odersky's Functional Programming in Scala Coursera course, covering key learnings and its practical application.

Coursera, functional programming, recursion, Scala, Spark

What I do or: science to data science

12/14/2015 • EN

A former PhD scientist shares his positive transition to data science freelancing, detailing the freedom and variety of his new career.

D3j, data visualization, Machine Learning, Scikit Learn, Spark

Introducing Laravel Spark: A Deep Dive

9/17/2015 • EN

A deep-dive technical guide into Laravel Spark, an alpha-release tool for quickly building SaaS applications with Laravel.

authentication, Laravel, saas, Spark, Stripe