Thomas Lumley

Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.

https://notstatschat.rbind.io

1/25/2026

statistics data analysis statistical modeling applied mathematics research methods

215 articles from this blog

12/17/2018 • EN

What are packages for?

Explores the diverse reasons developers create R packages, from practical tools to experimental research, and discusses their varying lifespans.

software development open source Statistical Computing

12/10/2018 • EN

svycontrast

A technical explanation of the svycontrast() function in R's survey package, covering linear and non-linear contrasts for statistical estimation.

R Statistical Computing Survey Analysis

11/26/2018 • EN

Finding principal components without even looking?

Explores a fast algorithm for estimating principal components via subsampling, analyzing its application to genetics and statistical tests.

Algorithm Complexity Singular Value Decomposition Principal Components

10/19/2018 • EN

Progress on svy2lme

An update on the svy2lme R package for fitting linear mixed models with complex survey data, including a comparison with Stata.

R Statistical Computing Linear Mixed Models

10/4/2018 • EN

The Kiwi PRNG

Analysis of a bug in New Zealand's official pseudo-random number generator used for electoral vote counting, based on the Wichmann-Hill algorithm.

statistics bug algorithm

9/27/2018 • EN

How to write a racist AI in R without really trying

A tutorial replicating a Python experiment on creating a biased AI sentiment classifier, but using R, GloVe embeddings, and glmnet for logistic regression.

NLP GloVe Word Embeddings

8/28/2018 • EN

What can data science add to statistics education?

A critique of traditional statistics education, arguing for a more data-driven, question-focused approach using modern tools.

teaching methods pedagogy Data Science

8/14/2018 • EN

Leaflet and buses

A data science tutorial using Leaflet to map Wellington bus locations and lateness, analyzing real-time transit data with R.

api data visualization Rate Limiting

8/1/2018 • EN

Testing probability distribution generators

Explains statistical methods for testing random number generators in R, focusing on hypothesis testing and probability bounds.

Statistical Testing Probability Distributions Random Number Generation

7/30/2018 • EN

Quoting and macros in R

Explores quoting, quasiquotation, and macros in R, comparing base-R and tidyverse approaches to metaprogramming.

macros R Programming Quasiquotation

7/11/2018 • EN

Interlingual

Explains the naming and purpose of the R package 'reticulate', which provides a Python interface for R.

R Package Reticulate

6/9/2018 • EN

Statistical software matters

Explores how software limitations in genetic analysis tools, like PLINK, hindered X-chromosome research in genome-wide association studies (GWAS).

Genome Wide Association Studies Genetic Imputation X Chromosome Analysis

6/9/2018 • EN

Survey analysis in SQL

Introduces an R package for complex survey analysis using SQL databases via dplyr/dbplyr, with a focus on hexagonal binning algorithms.

SQL dplyr Statistical Analysis

6/5/2018 • EN

New blog home

A developer details migrating their blog from Tumblr to GitHub Pages using blogdown, including challenges with Python setup and MathJax.

Python github git

4/1/2018 • EN

svylme

A developer introduces an experimental R package for fitting linear mixed models to complex survey data, detailing its current capabilities and limitations.

R Statistical Modeling Mixed Models

3/23/2018 • EN

Small p hacking

Discusses the proposal to lower p-value thresholds in statistical analysis, arguing it addresses symptoms not root causes of unreliable research.

data analysis research methodology statistics

3/15/2018 • EN

Chebyshev’s inequality and ‘UCL’

Explains Chebyshev's inequality, a probability bound, and its application to calculating Upper Confidence Limits (UCL) in environmental monitoring.

data analysis statistics Confidence Intervals

3/13/2018 • EN

Why pairwise likelihood?

Explores using pairwise composite likelihood to fit mixed models when survey sampling and model random-effect structures differ, using genetic analysis as an example.

Statistical Modeling Survey Sampling Mixed Models