What are packages for?
Explores the diverse reasons developers create R packages, from practical tools to experimental research, and discusses their varying lifespans.
Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.
215 articles from this blog
A technical explanation of the svycontrast() function in R's survey package, covering linear and non-linear contrasts for statistical estimation.
Explores a fast algorithm for estimating principal components via subsampling, analyzing its application to genetics and statistical tests.
An update on the svy2lme R package for fitting linear mixed models with complex survey data, including a comparison with Stata.
Analysis of a bug in New Zealand's official pseudo-random number generator used for electoral vote counting, based on the Wichmann-Hill algorithm.
A tutorial replicating a Python experiment on creating a biased AI sentiment classifier, but using R, GloVe embeddings, and glmnet for logistic regression.
A critique of traditional statistics education, arguing for a more data-driven, question-focused approach using modern tools.
A data science tutorial using Leaflet to map Wellington bus locations and lateness, analyzing real-time transit data with R.
Explains statistical methods for testing random number generators in R, focusing on hypothesis testing and probability bounds.
Explores quoting, quasiquotation, and macros in R, comparing base-R and tidyverse approaches to metaprogramming.
Explains the naming and purpose of the R package 'reticulate', which provides a Python interface for R.
Introduces an R package for complex survey analysis using SQL databases via dplyr/dbplyr, with a focus on hexagonal binning algorithms.
Explores how software limitations in genetic analysis tools, like PLINK, hindered X-chromosome research in genome-wide association studies (GWAS).
A developer details migrating their blog from Tumblr to GitHub Pages using blogdown, including challenges with Python setup and MathJax.
A developer introduces an experimental R package for fitting linear mixed models to complex survey data, detailing its current capabilities and limitations.
Discusses the proposal to lower p-value thresholds in statistical analysis, arguing that it addresses the symptoms rather than the root causes of unreliable research.
Explains Chebyshev's inequality, a probability bound, and its application to calculating Upper Confidence Limits (UCL) in environmental monitoring.
Explores using pairwise composite likelihood to fit mixed models when survey sampling and model random-effect structures differ, using genetic analysis as an example.
A method for faster generalized linear models on large datasets using a single database query and one Newton-Raphson iteration.
A technical article discussing debugging tricks for complex statistical models with symmetries, focusing on verification and small-sample testing.