ETL Offload with Spark and Amazon EMR - Part 3 - Running PySpark on EMR
This technical article details the process of moving locally developed PySpark ETL code to run on Amazon Elastic MapReduce (EMR). It covers provisioning an EMR cluster, the challenges of software version mismatches and JAR conflicts, and the benefits of on-demand, scalable Hadoop processing in the cloud for data engineering workflows.