Robin Moffatt 12/20/2016

ETL Offload with Spark and Amazon EMR - Part 4 - Analysing the Data

This article, part 4 of a series, covers the analysis phase of a Spark/EMR ETL offload project. It evaluates 'SQL-on-Hadoop' engines (e.g., Apache Drill, Hive, Presto) for querying data stored in open formats such as Parquet on S3/HDFS. The analysis compares their performance, ANSI SQL support, and operational complexity with those of a traditional RDBMS, and highlights the flexibility of decoupling storage from compute.
