Randy Zwitch 7/31/2013

Hadoop Streaming with Amazon Elastic MapReduce, Python and mrjob

This article details a solution for classifying hundreds of millions of URLs by content without visiting them. It explains the limitations of local processing, introduces the concept of embarrassingly parallel problems, and provides a technical walkthrough using Python, the mrjob library, and Amazon Elastic MapReduce (EMR) to run the task on a Hadoop cluster for massive scalability.
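To make the map/reduce idea in the summary concrete, here is a minimal sketch in plain Python. It is not the article's code: the keyword table, the function names, and the sample URLs are all hypothetical, and the classification rule (substring matching on the URL text) is only an assumed stand-in for whatever the original uses. It shows why the problem is embarrassingly parallel: each URL is classified independently of every other, so the map step can be split across any number of machines and only the per-category counts need to be combined.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical keyword-to-category table; not taken from the original article.
CATEGORY_KEYWORDS = {
    "sports": ("sports", "football", "nba"),
    "news": ("news", "politics"),
    "shopping": ("shop", "cart", "product"),
}


def classify_url(url):
    """Guess a category from the URL text alone, without fetching the page."""
    parsed = urlparse(url.lower())
    text = parsed.netloc + parsed.path
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "unknown"


def mapper(line):
    """Map step: emit (category, 1) for one input line holding one URL."""
    url = line.strip()
    if url:
        yield classify_url(url), 1


def reducer(category, counts):
    """Reduce step: sum the counts emitted for a single category."""
    yield category, sum(counts)


def run_local(lines):
    """Simulate the shuffle phase in memory so the sketch runs without Hadoop."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        for out_key, total in reducer(key, values):
            results[out_key] = total
    return results


if __name__ == "__main__":
    sample = [
        "http://example.com/sports/scores",
        "http://news.example.org/politics",
        "http://example.com/about",
    ]
    print(run_local(sample))
```

With mrjob, the same two steps would typically become the `mapper` and `reducer` methods of an `MRJob` subclass, and the library would take care of submitting the job to a local runner or to Elastic MapReduce in place of the in-memory `run_local` helper used here.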
