Hadoop Streaming with Amazon Elastic MapReduce, Python and mrjob
This article details a solution for classifying hundreds of millions of URLs by content without visiting them. It explains the limitations of local processing, introduces the concept of embarrassingly parallel problems, and provides a technical walkthrough using Python, the mrjob library, and Amazon Elastic MapReduce (EMR) to run the task on a Hadoop cluster for massive scalability.
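To make the "embarrassingly parallel" idea concrete, here is a minimal standard-library sketch of the map/reduce shape such a job might take. It is not the article's code: the keyword table, `classify_url`, and `run_local` are hypothetical names invented for illustration, and the classification rule (matching tokens in the URL text) is only an assumption about how URLs could be labelled without fetching them. The `mapper` and `reducer` generators mirror the key/value style that mrjob jobs use, but the sketch runs locally with no Hadoop or mrjob dependency.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical keyword-to-category table; the article's real rules are not shown here.
CATEGORY_KEYWORDS = {
    "sports": {"football", "soccer", "nba"},
    "news": {"news", "politics"},
}


def classify_url(url):
    """Guess a category from tokens in the URL text itself, without fetching it."""
    parsed = urlparse(url.strip())
    text = (parsed.netloc + parsed.path).lower()
    tokens = set(re.split(r"[^a-z0-9]+", text))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if tokens & keywords:
            return category
    return "unknown"


def mapper(_, line):
    # Each input line is handled on its own, with no shared state,
    # which is what lets the work be split across many machines.
    yield classify_url(line), 1


def reducer(category, counts):
    # All values emitted for one key arrive together and are summed.
    yield category, sum(counts)


def run_local(lines):
    """Simulate the shuffle step in memory: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)
    result = {}
    for key in sorted(grouped):
        for out_key, total in reducer(key, grouped[key]):
            result[out_key] = total
    return result


if __name__ == "__main__":
    sample = [
        "http://example.com/football/scores",
        "http://news.example.org/politics/today",
        "http://example.net/about",
    ]
    print(run_local(sample))
```

Because the mapper never looks at more than one line, the same two functions could in principle be moved into an mrjob job class and run on a cluster; only the grouping step in `run_local` would be replaced by Hadoop's shuffle.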