Linear regression in the wild
A data scientist shares a technical interview task on linear regression, covering data cleaning, model fitting, and assumption validation.
Explores computational challenges of large quadratic forms in genomics, focusing on eigenvalue approximations for high-dimensional statistical tests like SKAT.
Analyzes the relationship between age and desired job roles among new coders, using the 2016 Kaggle survey data.
Uses R to generate the permutations of the digits 2, 2, 5, 5, 9, 9 and checks them for divisibility by 11 and for primality.
Discusses why simulation summaries should focus on quantiles and robust statistics rather than moments when evaluating asymptotic approximations.
The author reflects on R's rise in programming language rankings and its unexpected adoption across diverse fields over 20 years.
Explains how to compute the Huber/White sandwich estimator incrementally in R's biglm package for large-scale linear regression.
Explores the surprising effectiveness and conservative nature of the Bonferroni correction for multiple hypothesis testing, even with many tests.
A guide for academics with math/physics backgrounds transitioning into data science, covering skills, learning paths, and practical advice.
A data analysis of a radio station's song rotation patterns using vector math and statistical methods to test anecdotal claims about repetitive playtimes.
Explores the statistical concept of 'design consistency' in survey sampling, comparing it to model consistency and discussing asymptotic theory.
Analyzes a classic probability problem involving dice rolls, its historical context with Newton and Pepys, and the mathematical intuition behind it.
Analyzes the pseudorandom number generator defined in NZ Flag Referendum law, comparing it to the Wichmann-Hill algorithm and noting a potential flaw.
Explores valid reasons for using simplified assumptions like 'spherical cows' in statistical modeling and theoretical work.
A technical critique of the Net Reclassification Index (NRI), a statistical measure for evaluating prediction model improvements, highlighting its surprising biases.
Critique of using Shapiro-Wilk normality tests on large, complex survey data like NHANES, explaining why it's statistically inappropriate.
Explores different proofs of the Continuous Mapping Theorem in probability theory, discussing their merits and pedagogical value.
The article debunks common misinterpretations of the Dunning-Kruger effect by analyzing the original study's data and findings.
A philosophical and technical exploration of the practical meaning of measurability in mathematical statistics, questioning its necessity for real-world data analysis.
A technical guide to Dixon's Q test for identifying outliers in small datasets, including its method, application, and criticisms.