Robin Moffatt 12/19/2016

ETL Offload with Spark and Amazon EMR - Part 3 - Running pySpark on EMR

This article walks through moving locally developed PySpark ETL code onto Amazon Elastic MapReduce (EMR). It covers provisioning an EMR cluster, working around software version mismatches and JAR conflicts, and the benefits of on-demand, scalable Hadoop processing in the cloud for data engineering workloads.
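As a rough sketch of the provisioning step described above, an EMR cluster with Spark can be launched from the AWS CLI and a PySpark job submitted as a step. The cluster name, key pair, S3 path, and script name below are hypothetical placeholders, and the release label reflects the EMR versions current at the time of the article; check the EMR documentation for up-to-date values.

```shell
# Sketch: launch a small EMR cluster with Spark installed.
# KeyName, instance sizes, and the release label are assumptions.
aws emr create-cluster \
  --name "etl-offload-demo" \
  --release-label emr-5.2.0 \
  --applications Name=Spark \
  --ec2-attributes KeyName=my-keypair \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles

# Sketch: submit a PySpark script stored in S3 as a Spark step.
# Replace the cluster id and S3 path with real values.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=Spark,Name="pySpark ETL",ActionOnFailure=CONTINUE,Args=[s3://my-bucket/etl_job.py]
```

Running the job as an EMR step (rather than SSHing in and calling spark-submit by hand) lets the cluster be created, run the ETL, and terminate on demand, which is the on-demand, pay-per-use model the article highlights.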
