Statistics articles

9/21/2024 • EN

5 Books added to Big Book of R

The Big Book of R adds five new free, open-source books covering R programming for production, survey analysis, causal inference, biodiversity data, and natural resources.

book data analysis open source R Programming statistics

Oscar Baruffa

9/13/2024 • EN

Two approaches to approximating sums of chisquareds

Compares Satterthwaite, Liu, and leading-term approximations for tail probabilities of weighted sums of chi-squared variables in high-dimensional genomic data.

Approximation Methods Chi Squared Distribution Computational Methods Quadratic Forms statistics

Thomas Lumley

8/27/2024 • EN

The missing test in survey regression models

Explores missing likelihood-ratio tests in survey regression models, comparing Wald, score, and Rao-Scott tests with sample vs. population scaling.

Model Validation Regression statistics Survey Methods testing

Thomas Lumley

8/26/2024 • EN

Another way to not sample without replacement

Explores challenges and algorithms for weighted sampling without replacement in R, focusing on achieving specified marginal probabilities.

R Sampling Algorithms statistics Weighted Sampling Without Replacement

Thomas Lumley

7/9/2024 • EN

Covering all birthdays

Analyzes the probability of covering all birthdays in a group and the expected number of people needed, framed as the Coupon Collector's Problem.

algorithm Birthday Problem combinatorics Probability statistics

Lior Sinai
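
As a rough illustration of the coupon-collector framing mentioned in the summary above, the sketch below computes the expected number of people needed to cover every birthday. It is not taken from the linked article: it assumes 365 equally likely birthdays (no leap day, no seasonality), and the function name `expected_people_to_cover` is hypothetical.

```python
def expected_people_to_cover(days: int = 365) -> float:
    """Expected number of people until every one of `days` birthdays appears.

    Assumes birthdays are independent and uniformly distributed (an
    assumption of this sketch, not a claim about the linked article).
    """
    # Coupon collector result: E[T] = n * H_n, where H_n = 1 + 1/2 + ... + 1/n.
    harmonic_number = sum(1.0 / k for k in range(1, days + 1))
    return days * harmonic_number


if __name__ == "__main__":
    # Roughly 2365 people are needed on average under the uniform assumption.
    print(round(expected_people_to_cover(365), 1))
```

The expectation grows like n log n, which is why covering all 365 days takes thousands of people even though a shared birthday appears in much smaller groups.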

6/15/2024 • EN

Automatic transformation of standard errors?

Explores automatic delta-method transformations for variance estimates in R's survey package, enabling correct standard errors after mathematical operations.

Delta Method R statistics Survey Package Svystat

Thomas Lumley

6/9/2024 • EN

Why you shouldn’t use boxplots

Explains a crucial flaw in using boxplots for data visualization and suggests better alternatives.

Boxplots data visualization Ggplot2 R Programming statistics

Albert Rapp

4/29/2024 • EN

Another update on non-transitive dice

An update on the polymath research project about non-transitive dice and its statistical implications for the Wilcoxon/Mann-Whitney test.

mathematics Non Transitive Dice Probability statistics Wilcoxon Test

Thomas Lumley

4/14/2024 • EN

Assumptions

Discusses the nuanced role of assumptions in statistics, distinguishing between necessary and sufficient conditions, and their impact on interpreting models like linear regression.

Assumptions Linear Regression Mathematical Communication Ordinary Least Squares statistics

Thomas Lumley

3/28/2024 • EN

6 New books added to Big Book of R

Announces the addition of 6 new R programming books to the Big Book of R collection, covering statistics, machine learning, and data science.

Data Science Feature Engineering Machine Learning R Programming statistics

Oscar Baruffa

3/21/2024 • EN

Demystifying causal inference estimands: ATE, ATT, and ATU

Explains key causal inference estimands (ATE, ATT, ATU) and how to calculate them using observational data, with a focus on R and the potential outcomes framework.

Causal Inference Dag R statistics Tidyverse

Andrew Heiss

2/2/2024 • EN

Big Book of R at 400 [New milestone!]

The Big Book of R, a curated collection of free R programming books, celebrates a milestone of over 400 entries and requests community support for hosting costs.

book Data Science open source R Programming statistics

Oscar Baruffa

1/9/2024 • EN

Asymptotics for linear mixed models

Explores the asymptotic behavior of parameter estimates in linear mixed models, focusing on the loglikelihood as a quadratic form in Gaussian variables.

Asymptotic Theory Gaussian Processes Linear Mixed Models Maximum Likelihood statistics

Thomas Lumley

12/14/2023 • EN

How good is the leading eigenvalue approximation to quadratic forms?

Analyzes the accuracy of a leading eigenvalue approximation for quadratic forms in Gaussian variables, comparing it to traditional methods.

Eigenvalue Approximation Gaussian Variables Numerical Methods Quadratic Forms statistics

Thomas Lumley

12/12/2023 • EN

Why not REML?

Explains why the svylme package uses maximum likelihood instead of REML for survey-weighted linear mixed models, focusing on design and sampling constraints.

Mixed Models Reml statistics Svylme Variance Components

Thomas Lumley

12/4/2023 • EN

Sparse correlation and the Central Limit Theorem

Explores sparse correlation structures in statistical models and the conditions under which the Central Limit Theorem holds for dependent data.

Central Limit Theorem Gee Random Effects Sparse Correlation statistics

Thomas Lumley

11/24/2023 • EN

svy2lme: the preprint

Announces a preprint for the svylme package, introducing the svy2lme function for fitting linear mixed models to complex survey data.

Complex Survey Data Linear Mixed Models R statistics Svylme

Thomas Lumley

8/15/2023 • EN

Manually generate predicted values for logistic regression with matrix multiplication in R

A guide to manually generating predicted values for logistic regression using matrix multiplication in R, as an alternative to the predict() function.

Logistic Regression Matrix Multiplication Predict R statistics

Andrew Heiss
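
The idea in the summary above, computing logistic-regression predictions as a matrix product followed by an inverse logit, can be sketched as follows. The linked article works in R; this is a plain-Python sketch of the same arithmetic, and the function name, coefficients, and design matrix are all hypothetical values chosen for illustration.

```python
import math


def predict_logistic(design_rows, coefficients):
    """Return predicted probabilities as inverse-logit of (X @ beta).

    `design_rows` is a list of rows of the design matrix X (with a leading
    1.0 for the intercept); `coefficients` is the vector beta.
    """
    probabilities = []
    for row in design_rows:
        # One row of the matrix product X @ beta: the linear predictor.
        linear_predictor = sum(x * b for x, b in zip(row, coefficients))
        # Inverse logit maps the linear predictor to a probability.
        probabilities.append(1.0 / (1.0 + math.exp(-linear_predictor)))
    return probabilities


# Hypothetical coefficients (intercept, slope) and a three-row design matrix.
beta = [-1.0, 0.5]
X = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]
predicted = predict_logistic(X, beta)
print(predicted)
```

The middle row has a linear predictor of zero, so its predicted probability is exactly 0.5, which is a quick sanity check on the inverse-logit step.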

8/12/2023 • EN

The ultimate practical guide to multilevel multinomial conjoint analysis with R

A technical guide to performing multilevel multinomial conjoint analysis using R, Bayesian modeling, and statistical packages.

Bayesian Conjoint Analysis Hierarchical Models R statistics

Andrew Heiss

7/30/2023 • EN

An optimal-stopping quant riddle

A detailed analysis of an optimal stopping problem involving drawing cards for reward, exploring mathematical strategies and first-principles reasoning.

algorithm Optimal Stopping Probability Quantitative Finance statistics

Emir U
