Thomas Lumley

Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.

https://notstatschat.rbind.io

1/25/2026

statistics data analysis statistical modeling applied mathematics research methods

Articles from this Blog

218 articles from this blog

6/16/2026 • EN

The simple case of almost sure representations

Explains a simple version of the almost-sure representation theorem in probability, using quantile functions and coupling.

Probability Coupling Random Variables

4/17/2026 • EN

Stage vs phase, again

Analysis of multistage vs multiphase surveys, comparing two-phase and single-phase weighting in nested case-control sampling.

simulation Survey Statistics Sampling Weights

4/13/2026 • EN

Are predictive models enough?

Explores whether predictive models alone suffice for causal inference, highlighting limitations and the need for causal theory.

Predictive Modeling Causal Inference Observational Data

1/26/2026 • EN

Do predictive models need to be causal?

Explores whether predictive statistical models require causal relationships to be useful, using examples from data sampling and real-world scenarios.

Machine Learning statistics Data Science

1/22/2026 • EN

Gauss is Not Mocked

A technical discussion comparing two classes of multiparameter tests in survey statistics, focusing on the Rao-Scott tests and intrinsically-weighted tests for regression models.

Statistical Inference Regression Analysis Survey Statistics

1/5/2026 • EN

Simulation and CLT

Explores a robust location estimator (Tukey's shorth) through simulation, examining its asymptotic normality and efficiency compared to the mean and median.

R Programming Central Limit Theorem Statistical Simulation

12/1/2025 • EN

Does svyglm use robust standard errors?

Explains that svyglm uses robust standard errors, detailing the statistical theory and variance estimation for survey data.

R Programming Statistical Computing Survey Data

12/1/2025 • EN

Horses or Zebras?

Discusses handling class imbalance in predictive modeling, using medical and zebra analogies to explain adjusting for prior probabilities and error costs.

Machine Learning statistics Data Science

10/30/2025 • EN

Laws and Orders

A lecture on the foundational statistical concept of orderings and ordinal data, exploring their analysis and complications in fields like health research.

data analysis statistics Statistical Methods

8/15/2025 • EN

Interviewing your laptop

Explores the limitations of using large language models as substitutes for human opinion polling, highlighting issues of representation and demographic weighting.

llm large language models ai ethics

7/22/2025 • EN

AIC and combined discrete/continuous models

Explains why AIC comparisons between discrete and continuous statistical models are invalid, using examples with binomial and Normal distributions.

Model Comparison R Statistical Modeling

6/19/2025 • EN

Included-variable bias

Explains the statistical concept of included-variable bias in regression models, challenging the traditional 'omitted-variable bias' framing.

data analysis statistics Bias

4/30/2025 • EN

Two-stage least squares

A technical explanation of the Two-Stage Least Squares (2SLS) method for causal inference in regression, covering its derivation and variance estimation.

Causal Inference Econometrics Regression Analysis

4/1/2025 • EN

Iris classification: the next generation

A technical analysis using R to classify iris images from a dataset, applying PCA and LDA for machine learning classification.

Machine Learning classification data analysis

3/4/2025 • EN

Coupling simulations and the "reparametrisation trick"

Explores techniques for generating identical random number streams across different statistical models, focusing on coupling simulations for Bayesian adaptive trials.

Statistical Computing Random Number Generation Monte Carlo Simulation

2/26/2025 • EN

Ordinal data: taking transformation invariance seriously

Explores the challenges of analyzing ordinal data, focusing on transformation invariance and the limitations of statistical comparisons.

statistics data transformation Ordinal Data

11/21/2024 • EN

Collinearity four more times

Analyzes four datasets with high collinearity between predictors, demonstrating statistical diagnostics and modeling approaches using R.

statistics R Regression

11/14/2024 • EN

Brute force and ignorance

A statistical analysis of a classic 1986 dataset, demonstrating how modern displays make hidden structures visible without complex methods.

data visualization R Statistical Graphics

9/13/2024 • EN

Two approaches to approximating sums of chisquareds

Compares Satterthwaite, Liu, and leading-term approximations for tail probabilities of weighted sums of chi-squared variables in high-dimensional genomic data.

statistics Computational Methods Quadratic Forms