Randy Zwitch

Randy Zwitch is a software engineer specializing in Python and data engineering. His blog features detailed tutorials on building and optimizing Python tools like PyArrow with GPU/CUDA support, Docker workflows, and high-performance data processing.

https://randyzwitch.com

2/1/2026

python pyarrow cuda docker data engineering

Articles from this Blog

96 articles from this blog

4/25/2014 • EN

Using SQL Workbench with Apache Hive

A tutorial on connecting to Apache Hive using the open-source SQL Workbench tool via JDBC, covering driver setup and connection configuration.

jdbc Hadoop Database Administration

3/10/2014 • EN

Real-time Reporting with the Adobe Analytics API

A guide to using the RSiteCatalyst R package to access Adobe Analytics real-time reporting API for monitoring metrics like orders and revenue.

API Integration Data Analytics R

2/4/2014 • EN

RSiteCatalyst Version 1.3 Release Notes

RSiteCatalyst v1.3 adds regex search, Realtime API support, and configurable request timing for the Adobe Analytics R package.

api data analysis R

1/12/2014 • EN

Getting Started With Hadoop, Final: Analysis Using Hive & Pig

The final tutorial in the series: analyzing airline data with Hadoop, using Hive for SQL queries and Pig for scripting, and covering setup and basic analytics.

data analysis Big Data Hadoop

1/2/2014 • EN

Quickly Create Dummy Variables in a Data Frame

A tutorial on manually creating dummy variables in R to handle categorical data with many levels, addressing a common randomForest package error.

Dataframe R Categorical Data

12/9/2013 • EN

Adobe Analytics Implementation Documentation in 60 Seconds

A guide to automatically generating Adobe Analytics implementation documentation using the RSiteCatalyst R package and the Adobe API.

api Documentation Adobe Analytics

11/21/2013 • EN

Using Amazon EC2 with IPython Notebook

A guide to setting up a remote IPython Notebook server on Amazon EC2 for data science and analytics.

Python cloud computing Data Science

11/19/2013 • EN

Adding Line Numbers in IPython/Jupyter Notebooks

Two methods to add line numbers to cells in IPython/Jupyter Notebooks: a keyboard shortcut toggle and a permanent startup configuration.

JavaScript IPython Jupyter

11/5/2013 • EN

RSiteCatalyst Version 1.2 Release Notes

RSiteCatalyst v1.2 is released on CRAN with bug fixes, dependency removal, and improved numeric type handling for the Adobe Analytics API.

api R Package

9/17/2013 • EN

Clustering Search Keywords Using K-Means Clustering

A technical guide on using K-Means clustering in R to analyze and segment search keywords for understanding user intent in digital analytics.

R Unsupervised Learning Text Analysis

9/2/2013 • EN

Fun With Just-In-Time Compiling: Julia, Python, R and pqR

A benchmark comparison of Julia, Python, R, and pqR on a Project Euler problem, exploring performance gains from JIT compilation.

Python julia Just In Time Compilation

8/25/2013 • EN

RSiteCatalyst Version 1.1 Release Notes

RSiteCatalyst v1.1 adds new API features, faster calls, and an extended timeout for retrieving Adobe Analytics data in R.

api data analysis R

8/22/2013 • EN

Getting Started Using Hadoop, Part 4: Creating Tables With Hive

A tutorial on using Apache Hive to create tables and views from data loaded into a Hadoop cluster, continuing a multi-part series.

sql data processing Big Data

8/15/2013 • EN

Anomaly Detection Using The Adobe Analytics API

Explains how to use the Adobe Analytics API and R for statistical anomaly detection in time-series marketing data.

Time Series Anomaly Detection Exponential Smoothing

8/6/2013 • EN

Tabular Data I/O in Julia

A guide to reading and writing tabular data in Julia using arrays, DataFrames, and ODBC database connections.

julia tabular data Data Import

7/31/2013 • EN

Hadoop Streaming with Amazon Elastic MapReduce, Python and mrjob

A technical guide on using Python, mrjob, and Amazon EMR for Hadoop Streaming to perform large-scale, parallel URL classification.

Python Hadoop Streaming Amazon Elastic Mapreduce

7/23/2013 • EN

A Beginner's Look at Julia

An introduction to the Julia programming language for scientific computing, covering installation, package management, and basic syntax comparisons.

programming language git julia

5/22/2013 • EN

Getting Started Using Hadoop, Part 3: Loading Data

Tutorial on loading data into Hadoop's HDFS using the Hue File Browser interface and the Airline Dataset.

Hadoop Data Loading Cloudera

5/17/2013 • EN

Innovation Will Never Be At The Push Of A Button

Argues that true data science and innovation require deep mathematical understanding, not just push-button tools, and defends the value of skilled data scientists.

Machine Learning mathematics algorithms