Thomas Lumley • 12/1/2025

Horses or Zebras?

The article explores class imbalance in machine learning, where one outcome (e.g., Y=0) dominates the data. It argues that when one class is much more common, predicting that class is usually the right call, analogous to 'expecting horses, not zebras.' It then gives two reasons to adjust a model's predictions anyway: the prior probability of the outcome in the real world differs from its frequency in the training data, or a false negative costs more than a false positive. The solutions discussed are Bayesian adjustment of the prior and moving the decision threshold of a logistic regression away from 0.5.
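Both adjustments can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's own code: the function names are hypothetical, the prior adjustment is the standard rescaling of the odds by the ratio of target to training prior odds, and the threshold is the textbook expected-cost rule for a calibrated probability.

```python
def adjust_for_prior(p, train_rate, target_rate):
    """Rescale a predicted probability p of Y=1 from the training
    base rate to a different real-world base rate.

    Assumes p is strictly between 0 and 1 and that only the class
    prior changed, not the distribution of features within each class.
    """
    odds = p / (1.0 - p)
    # Ratio of prior odds: target population versus training data.
    prior_ratio = (target_rate / (1.0 - target_rate)) / (
        train_rate / (1.0 - train_rate)
    )
    new_odds = odds * prior_ratio
    return new_odds / (1.0 + new_odds)


def cost_threshold(cost_fp, cost_fn):
    """Probability cutoff that minimizes expected cost when a false
    positive costs cost_fp and a false negative costs cost_fn."""
    return cost_fp / (cost_fp + cost_fn)


def classify(p, threshold=0.5):
    """Predict Y=1 when the probability reaches the threshold."""
    return 1 if p >= threshold else 0


# Hypothetical numbers: a model trained on balanced data (50% positives),
# deployed where only 10% of cases are positive.
p_deployed = adjust_for_prior(0.5, train_rate=0.5, target_rate=0.1)

# Hypothetical costs: a missed positive is nine times as costly
# as a false alarm, so the cutoff drops well below 0.5.
cutoff = cost_threshold(cost_fp=1.0, cost_fn=9.0)
```

With equal costs the cutoff is 0.5 and the rule reduces to predicting the more probable class, the 'horses' default. Raising the cost of a false negative lowers the cutoff, so the model calls 'zebra' at probabilities well under one half.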

#Machine Learning #Statistics #Data Science