Two-phase sampling notation
Proposes a standardized notation system for designing and analyzing two-phase sampling studies in statistical research.
Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.
215 articles from this blog
A statistical analysis of multicollinearity in regression models, discussing its impact on coefficient interpretation and prediction.
Announcing the 2021 Ihaka Lectures featuring local experts on distributed computing, machine learning for child welfare, and applied math for COVID-19 response.
A professor outlines plans for a new undergraduate data management course covering data models, reproducible workflows, and tools like R, SAS, and Python.
Explains Neyman allocation for optimal stratified sampling and its exact integer solution, linking it to US Electoral College apportionment.
A programmer writes an interpreter for a subset of BASIC to run the original 1978 Oregon Trail game within R, discussing code translation challenges.
Introducing svyVGAM, a new R package for fitting complex survey regression models using the VGAM framework with design-based inference.
Explores the Bayesian equivalent of a two-sample t-test, questioning traditional assumptions and proposing a model using discrete distributions.
Explains the three main types of statistical weights (precision, frequency, sampling), their uses, and the software documentation challenges they create.
Overview of new features in version 4.0 of the R survey package, focusing on improved contrast estimation and replicate handling.
Explores the statistical challenges and potential bias when adjusting stratification variables during multi-wave sampling for population estimation.
A technical tutorial on mapping COVID-19 cases in New Zealand by District Health Board using R and the DHBins package.
A technical tutorial on implementing quadratic trend tests using the R survey package, including code examples and statistical analysis.
Announces version 3.37 of the R 'survey' package, detailing new features for statistical analysis with complex survey data.
A statistician's response to New Zealand's proposed Algorithms Charter, analyzing its principles for ethical and transparent government algorithm use.
A professor details the curriculum and practical challenges of teaching an undergraduate 'Data Science Practice' course, covering data prep, predictive models, and tools like R and keras.
A review of Janelle Shane's AI humor book, discussing neural network limitations and the real-world impact of class imbalance in machine learning.
A technical tutorial on creating hexagon maps (hexmaps) for visualizing New Zealand District Health Board data using the R programming language and the DHBins package.
A critique of the Oxford-Munich Code of Conduct for Data Scientists, focusing on its technical recommendations on sampling and data retention.
Explains why parentheses cause R code assignments to print their values, covering invisibility flags and the behavior of the `(` function.