What does ‘design-consistent’ even mean?
Explores the statistical concept of 'design consistency' in survey sampling, comparing it to model consistency and discussing asymptotic theory.
Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.
215 articles from this blog
Explores the complexities and efficiency trade-offs between weighted and unweighted logistic regression in case-control study designs.
Analyzes a classic probability problem involving dice rolls, its historical context with Newton and Pepys, and the mathematical intuition behind it.
Explains the statistical concept of 'double robust' estimation, which combines an outcome model and an exposure model so that the estimator stays consistent if either one is correctly specified.
Analyzes the pseudorandom number generator defined in NZ Flag Referendum law, comparing it to the Wichmann-Hill algorithm and noting a potential flaw.
Explores valid reasons for using simplified assumptions like 'spherical cows' in statistical modeling and theoretical work.
Explores the 'curse of dimensionality' through simulations, showing how nearest-neighbor distances converge in high-dimensional spaces.
A technical critique of the Net Reclassification Index (NRI), a statistical measure for evaluating prediction model improvements, highlighting its surprising biases.
Analyzes the historical and technical reasons behind R's controversial 'stringsAsFactors' default, explaining its origins and the problems it causes.
Critiques the use of Shapiro-Wilk normality tests on large, complex survey data like NHANES, explaining why it's statistically inappropriate.
A tutorial on implementing Zero-Inflated Poisson models for complex survey data in R using the survey package.
Explores Hodges' estimator and extensions for asymptotic efficiency in statistical estimation, comparing them to the sample mean.
Explores different proofs of the Continuous Mapping Theorem in probability theory, discussing their merits and pedagogical value.
A philosophical and technical exploration of the practical meaning of measurability in mathematical statistics, questioning its necessity for real-world data analysis.
Explores the mathematical concept of transitive statistical tests and the conditions under which they can be represented by a single real-valued statistic.
Analyzes semiparametric efficiency in two-phase sampling designs, comparing estimators under correctly specified and 'nearly true' models.
Analyzes publication bias in scientific reporting using a humorous example of socks and Bayesian statistics.
Explains why specialized sinpi() functions exist: they compute trigonometric functions accurately at half-integer multiples of π, avoiding floating-point errors.
Explores the statistical power of monotonicity vs. smoothness assumptions in modeling, analyzing their asymptotic and finite-sample impacts.
Explores how a researcher's publication behavior influences the likelihood principle and statistical inference for other scientists.