Sessionizing Log Data Using dplyr [Follow-up]
A technical tutorial on sessionizing log data using the dplyr package in R, comparing it to a previous SQL-based approach.
Randy Zwitch is a software engineer specializing in Python and data engineering. His blog features detailed tutorials on building and optimizing Python tools like PyArrow with GPU/CUDA support, Docker workflows, and high-performance data processing.
96 articles from this blog
A technical guide on using SQL window functions to group discrete time-series events into user sessions for data analysis.
RSiteCatalyst v1.4.3 release notes: bug fixes, new data feed monitoring functions, and internal prep for a new AdobeDW package.
A review of the book 'Data Science at the Command Line', highlighting its approach to data manipulation and analysis using command-line tools.
A guide to using Twitter.jl, a Julia package for interacting with the Twitter API, covering authentication and basic functions.
Release notes for RSiteCatalyst 1.4.2, an R package for Adobe Analytics, detailing bug fixes and a new API feature.
A critique of Excel's data handling flaws, showing how it silently corrupts timestamp data in CSV files upon saving.
A developer explains how to refactor repetitive Julia code for a Twitter API package using metaprogramming techniques to reduce lines and improve maintainability.
Release notes for RSiteCatalyst v1.4.1, detailing bug fixes and new API functions for Adobe Analytics reporting in R.
Evaluating Twitter's BreakoutDetection R package for time-series anomaly detection using real blog traffic data.
A technical guide on using R, RSiteCatalyst, and d3Network to create Sankey charts for visualizing website visitor pathing data from Adobe Analytics.
A tutorial on creating a stacked bar chart using Seaborn and Matplotlib by overlaying data series.
A tutorial on creating network graphs to visualize website page relationships using R, RSiteCatalyst, and d3Network packages.
RSiteCatalyst v1.4 is released with breaking changes, new Pathing/Fallout reports, OAuth support, and a cleaner codebase.
A tutorial on using the VennEuler.jl Julia package to recreate a data analytics language popularity diagram from survey data.
A tutorial on using Julia's string interpolation for automating repetitive data engineering tasks like querying multiple database tables.
A developer reflects on mastering multiple languages after encountering a tricky JSON parsing issue in R while maintaining a CRAN package.
A technical guide on using Julia to integrate data from Hadoop and Teradata Aster for visualization, demonstrating its role as a 'glue' language.
A data engineer shares five practical lessons and performance tips for working with Apache Hive, focusing on common pitfalls and optimizations.
Compares three methods for building JSON strings in R, discussing the pros and cons of each approach for developers.