Prioritizing the Long-Tail of Performance
Explains why focusing on median or average performance metrics is misleading and advocates for analyzing the long tail of the data to improve user experience.
Presentation slides for a Power BI tips and tricks talk at DataBISummit, available for download.
Discusses the proposal to lower p-value thresholds in statistical analysis, arguing that it addresses the symptoms, not the root causes, of unreliable research.
Explains Chebyshev's inequality, a probability bound, and its application to calculating Upper Confidence Limits (UCL) in environmental monitoring.
Analyzes decision-making quality in sports and board games, where clear data reveals the high cost of poor choices.
A technical guide on using R's rvest package to scrape book descriptions and genres from Goodreads, adapting code from an existing project.
Part 2 of a series analyzing gender differences in dating dynamics, focusing on challenges and perspectives for nerds.
An analysis of Hacker News moderation tools and practices, based on data scraped from the site's API.
A roundup of blog posts and resources discussing various data analysis workflows and tools in the R programming language.
A technical guide on using the rgoodreads R package to analyze personal Goodreads reading data and critique the 5-star rating system.
A critique of data visualization choices in a KCSE exam analysis, comparing heat maps to line graphs for clarity.
A quick PowerShell script to count the frequency of first letters in a list of surnames from a text file.
A technical guide on analyzing personal Google Location History data using Python, Pandas, and visualization libraries to map and gain insights from location data.
Analyzing the relationship between age and desired job roles among new coders using the 2016 Kaggle survey data.
A technical guide on using Google BigQuery to analyze GitHub pull request data, including SQL queries for repository statistics.
The author reflects on R's rise in programming language rankings and its unexpected adoption across diverse fields over 20 years.
A curated list of insightful programming blogs covering topics like JVM internals, performance, ML, engineering culture, and computer architecture.
A data analysis of a radio station's song rotation patterns using vector math and statistical methods to test anecdotal claims about repetitive playtimes.
Analyzing a classic probability problem involving dice rolls, its historical context with Newton and Pepys, and the mathematical intuition behind it.
A data-driven critique of a popular Kenyan tech blog, analyzing its content focus using R programming and text mining techniques.