Getting Started Using Hadoop, Part 1: Intro
A practical guide introducing Hadoop's ecosystem and setting up a proof-of-concept cluster on Amazon EC2 using Cloudera for big data processing.
Randy Zwitch is a software engineer specializing in Python and data engineering. His blog features detailed tutorials on building and optimizing Python tools like PyArrow with GPU/CUDA support, Docker workflows, and high-performance data processing.
96 articles from this blog
A guide to installing and using R on Amazon EC2 instances to overcome in-memory limitations for big data analysis.
A guide to automatically reinstalling R packages after upgrading to R 3.0, with OS X and Windows instructions.
A developer shares a humorous and insightful experience debugging an R package for the Adobe API, focusing on error handling and JSON parsing.
A tutorial on using R and the Google Analytics API to analyze and visualize '(not provided)' organic search data.
A tutorial video demonstrating how to execute SQL queries within the R programming language using the 'sqldf' package for data analysis.
A tutorial video demonstrating how to overlay histograms with normal curves, density curves, and secondary data series in R.
A screencast tutorial demonstrating how to get started with the R programming language, including installations of R, RStudio, Rcmdr, and rattle.
Introduces Rcmdr, a GUI for performing basic business statistics in R without coding, and explains its installation and usage.
An introduction to RStudio, an open-source IDE that enhances the R programming experience with a more user-friendly interface and features.
A retrospective on learning R vs. SAS, highlighting initial frustrations with R's complexity and package management compared to SAS's structured approach.
A critique of 20 non-actionable reports in Adobe Omniture/SiteCatalyst, focusing on mobile, technology, and visitor profile metrics.
Explains how customizing the default Omniture SiteCatalyst menu by business function improves user understanding and decision-making with analytics data.
An analysis of how custom jQuery code affects Google Analytics metrics such as bounce rate and user engagement.
A first look at Adobe Discover 3, analyzing its new dark interface, improved calendar, heterogeneous pathing, and table builder features for web analytics.
A guide on using Adobe SiteCatalyst's Target report to calculate and visualize year-over-year growth metrics like page views.