Thomas Lumley

Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.

https://notstatschat.rbind.io

1/25/2026

statistics data analysis statistical modeling applied mathematics research methods

215 articles from this blog

8/24/2023 • EN

Benchmark Archaeology

Analyzing a 1990s benchmark comparing R, S, and C performance, and revisiting it on modern hardware to discuss speed improvements and limitations.

performance benchmarking compilation

8/4/2023 • EN

Quoting and requoting

A technical guide on computing kurtosis and its standard error using R's survey package, including function creation and delta method application.

R Statistical Computing Kurtosis

6/7/2023 • EN

Blank-cheque inheritance and statistical methods objects

Discusses the conceptual problem of inheritance in object-oriented programming for statistical methods, using R's lm and glm classes as examples.

Inheritance R Object Oriented Design

5/5/2023 • EN

Pairwise likelihood and cluster sizes

A technical exploration of using pairwise likelihood in linear mixed models with complex sampling, comparing results from svylme and lme4 packages.

statistics R Linear Mixed Models

4/18/2023 • EN

Ranks in survey data

Explores the challenges of applying signed rank tests to complex survey data and proposes a design-independent rank transformation method.

statistics Survey Data Complex Sampling

3/27/2023 • EN

Class imbalance: bug or feature?

Explores class imbalance in machine learning, drawing parallels to medical training and asking whether it is a problem or an inherent feature.

Machine Learning Training Data Class Imbalance

3/7/2023 • EN

The fourth-root thing

A technical discussion on the 'fourth-root' condition for estimator consistency in statistical models like GEE, exploring asymptotic theory and nuisance parameters.

statistics Asymptotic Theory Estimation Theory

3/6/2023 • EN

Determinant of correlation matrix

A mathematical proof showing the determinant of a correlation matrix is at most 1, using eigenvalues and the AM-GM inequality.

linear algebra matrices determinant

2/21/2023 • EN

Sandwiches and aggregation

Analyzing the 'sandwich' package's behavior with aggregated count data in Poisson regression, comparing standard errors between individual and aggregate models.

Sandwich Estimator Robust Standard Errors Poisson Regression

2/10/2023 • EN

When is population mean rank a thing?

A statistical analysis article examining the Wilcoxon and Kruskal-Wallis rank tests, clarifying they compare population mean ranks, not medians.

Statistical Tests Nonparametric Statistics Rank Tests

2/9/2023 • EN

Checking proportionality of odds

Explains the proportional odds model for ordinal data, its assumptions, and discusses methods for testing the proportionality of odds.

Logistic Regression Statistical Modeling Ordinal Data

1/6/2023 • EN

Linkage and multiple imputation

Explores the intersection of multiple imputation and probabilistic record linkage, proposing a method to sample link sets for robust statistical analysis.

Data Science Statistical Modeling Bayesian Inference

12/9/2022 • EN

Pairwise and joint independence

Explains why pairwise independence of variables does not imply joint independence, using a chessboard as an intuitive counterexample.

mathematics statistics Independence

12/1/2022 • EN

The sandwich and the t-test

Explores the connection between the Welch-Satterthwaite t-test and linear regression using the sandwich variance estimator.

statistics Linear Regression Sandwich Estimator

11/5/2022 • EN

Bus pruning

Analysis of Auckland bus cancellations using R and GTFS data to visualize which trips are being removed from the timetable.

data analysis R GTFS

11/3/2022 • EN

Improving a graph

A technical walkthrough of visualizing and improving a graph of Auckland bus cancellation data using R, focusing on data representation and coding techniques.

github data visualization R

10/14/2022 • EN

Code archaeology: polynomial distributed lags

A technical article explaining polynomial distributed lag models for regularization in time-series analysis, including code archaeology and R implementation.

R Statistical Modeling Time Series