Thomas Lumley

Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.

https://notstatschat.rbind.io


1/25/2026

statistics data analysis statistical modeling applied mathematics research methods

Articles from this Blog

215 articles from this blog

1/17/2017 • EN

A bus-watching bot

A developer explains their Twitter bot that monitors and visualizes Auckland bus delays using GTFS data and R packages.

data visualization R Programming GTFS

1/12/2017 • EN

Mature and premature optimisation

A developer discusses the trade-offs between writing simple, clear code and optimizing for performance, using a real-world example of inefficient vector growth in R.

profiling performance optimization R Programming

1/9/2017 • EN

Fixing an infelicity in ‘leaps’

The author discusses a potential bug or design quirk in the 'leaps' R package related to how forward/backward selection interacts with its exhaustive search preprocessing.

Package Development Fortran R

1/3/2017 • EN

Learning the Monty Hall problem

Analyzes the Monty Hall problem, exploring learning strategies and optimal decisions based on observed game history and host behavior.

statistics Probability Bayesian Inference

12/30/2016 • EN

The ‘iris’ data

Critique of the classic iris dataset as a misleading example in modern machine learning education, exploring its original scientific purpose.

Machine Learning data visualization classification

9/27/2016 • EN

Large quadratic forms

Explores computational challenges of large quadratic forms in genomics, focusing on eigenvalue approximations for high-dimensional statistical tests like SKAT.

linear algebra statistics Genomics

9/6/2016 • EN

On permuting all the things

Uses R code to generate the permutations of the digits 2, 2, 5, 5, 9, 9 and analyzes them for divisibility by 11 and primality.

statistics combinatorics prime numbers

8/27/2016 • EN

“The” multiple comparisons problem

Explores Bayesian vs. frequentist approaches to the multiple comparisons problem in statistical inference and data analysis.

Data Science Multiple Comparisons Statistical Analysis

8/14/2016 • EN

Simulations and modes of convergence

Discusses why simulation summaries should focus on quantiles and robust statistics rather than moments when evaluating asymptotic approximations.

simulation statistics Asymptotics

7/28/2016 • EN

One scoRe years

The author reflects on R's rise in programming language rankings and its unexpected adoption across diverse fields over 20 years.

programming languages data analysis statistics

7/4/2016 • EN

How do we prove the Central Limit Theorem?

Explores various mathematical proofs for the Central Limit Theorem, comparing approaches like characteristic functions, the Lindeberg trick, entropy, and moments.

Probability Theory Central Limit Theorem Mathematical Statistics

6/4/2016 • EN

Computing the (simplest) sandwich estimator incrementally

Explains how to compute the Huber/White sandwich estimator incrementally in R's biglm package for large-scale linear regression.

statistics Linear Regression R

4/14/2016 • EN

Size matters

Explores why modern neural networks succeed where older ones failed, emphasizing the critical role of massive computational power and data size.

Machine Learning Neural Networks Deep Learning

4/10/2016 • EN

Sufficiently advanced technology

Explores the surprising science behind cheap gas-sensitive resistors and their ability to detect molecules like acetone, bridging chemistry and electronics.

DIY Electronics Gas Sensors Semiconductors

3/20/2016 • EN

The conservative Bonferroni correction

Explores the surprising effectiveness and conservative nature of the Bonferroni correction for multiple hypothesis testing, even with many tests.

statistics Confidence Intervals Bonferroni Correction

3/15/2016 • EN

Trace estimators and impact factors

Explores Hutchinson's randomized trace estimator for efficiently approximating the trace of large matrices, with practical improvements.

Statistical Computing Randomized Algorithms Trace Estimation

2/29/2016 • EN

Coding linear splines

Explains linear splines, their mathematical basis, and two practical parametrizations for regression, comparing them to higher-degree splines.

Regression Splines Linear Splines

2/5/2016 • EN

Stochastic SVD

Explains the Stochastic SVD algorithm, a probabilistic method for fast, approximate matrix decomposition using random projections.

linear algebra Numerical Methods SVD

1/20/2016 • EN

Is it that time of day?

A data analysis of a radio station's song rotation patterns using vector math and statistical methods to test anecdotal claims about repetitive playtimes.

data visualization data analysis statistics

1/13/2016 • EN

Another view of the ‘nearly true’ model

A statistical analysis comparing large and small model estimators, focusing on efficiency and misspecification testing in regression contexts.

Statistical Inference Asymptotic Theory Regression Analysis
