Combining a survey and other data
Explores methods for statistical inference by combining survey data with other datasets, using examples from public health and rank tests.
Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.
215 articles from this blog
Explains how to use quasiquotation in base R to dynamically insert string values into code, specifically for model formulas.
Announcing the 2022 Ihaka Lectures, featuring online talks by Emi Tanaka, Luke Tierney, and Wes McKinney on R, data science tools, and experimental design.
Analyzing statistical tests for independence in survey contingency tables with zero cells, comparing methods like Rao-Scott and Wald tests in R.
Explores a bug in R's survey package when using strings vs. factors in grouped analysis, highlighting data type pitfalls.
Discusses the practical choices in setting up asymptotic models for statistics, using examples from clinical trials and big data.
Comparison of statistical tests (Wald, score, likelihood ratio, Rao-Scott) for generalized linear models in survey data, analyzing Type I error and power.
Explores optimal study design for raking/AIPW estimation, comparing it to IPW methods and analyzing efficiency trade-offs.
Explores efficient methods in R to test if matrices contain only binary (0/1) values, focusing on performance with large sparse matrices.
A statistical analysis of variance estimation for generalized linear models with crossed clustering, using old R code and sandwich estimators.
The author discusses the unexpected computational challenges of implementing score tests for generalized linear models in survey statistics.
Explores the mathematical and data science challenges of analyzing ordinal data, including tradeoffs in interpreting ordered scales and model limitations.
A critique of publishing code as images in academic papers, highlighting errors and reproducibility issues in statistical computing examples.
A developer analyzes Wellington's public transport data using GTFS feeds, comparing it to Auckland and building tools to track bus delays and cancellations.
Explains the correct and incorrect methods for analyzing subsets in survey data, focusing on statistical inference and standard error calculations.
Announces updates to the R survey package, including a major rewrite of svyquantile, performance improvements, and new features.
A mathematical critique of additive scoring in grading and grant reviews, arguing for non-additive monotone functions.
A technical analysis comparing traditional maps to hexmap visualizations for New Zealand housing affordability data using R and ggplot2.
Explores the distinction between using regression models for causal inference versus predictive inference, and the role of generalizability in prediction.
A technical article proposing a tidy data approach to matrix multiplication using R, comparing it to tensor operations and highlighting efficiency gains.