More tests for survey data
Explores Rao-Scott tests for survey data analysis, comparing them to Wald tests and discussing new Satterthwaite-adjusted Wald tests in the survey package.
Thomas Lumley writes thoughtful, in-depth articles on statistics, data analysis, and statistical modeling. His blog explores topics like survey methods, regression, simulations, and inference with a rigorous yet reflective approach.
215 articles from this blog
Critiques a statistics position paper for ignoring computing, software, and reproducibility in modern statistical science and faculty evaluation.
Explains a bug in R's tidyverse where !! and !!! quasiquotation breaks when parsed/deparsed, affecting debugging and function editing.
Explores statistical estimation for complex samples, focusing on design-weighted U-statistics and their Hoeffding projections for pair-based analyses.
Explores methods for computing tail probabilities of linear combinations of chi-squared variables, focusing on applications in genetics with large datasets.
Analyzing the probability of self-assignment in a Secret Santa gift exchange using probability bounds and simulations.
A mathematical exploration of bounds for the expected maximum of random variables, covering inequalities, norms, and chaining techniques for stochastic processes.
Explores Bayesian inference when data strongly contradicts prior expectations, analyzing how heavy-tailed priors and likelihoods affect posterior beliefs.
A developer shares the code and lessons learned from creating a Twitter bot that tweets summaries of Auckland bus system data.
A technical article exploring tail probability bounds for sums of random variables under 'sparse correlation' conditions, extending concepts like Bernstein's Inequality.
A technical discussion on asymptotic approximations in stratified sampling when sampling probabilities approach zero, relevant for rare disease studies.
A two-day workshop on survival analysis, covering data exploration, regression modeling, and practical sessions for time-to-event data.
Testing the limits of an R language detection package by finding English sentences it misclassifies and exploring algorithmic decision-making.
Exploring the 'srvyr' package for pipeable survey analysis in R and its integration with tidyverse conventions.
Explores a potential 'Polymath' project on the Wilcoxon test's non-transitive behavior with dice, connecting math and statistics.
A statistician argues that advanced math like calculus isn't a strict prerequisite for learning statistics, using personal experience and examples.
Analyzes efficiency differences between weighted and unweighted logistic regression in case-control studies, showing when ignoring weights is beneficial.
Announcing a public lecture series honoring statistician Ross Ihaka, featuring talks on statistical computing, data visualization, and data journalism.
Explores statistical scenarios where the bootstrap resampling method fails to provide accurate variance estimates or confidence intervals.
Explores defining and computing design-based pseudo-R-squared statistics for logistic regression models under complex survey sampling, like case-control designs.