Overengineering in ML - business life is not a Kaggle competition
Discusses the pitfalls of overengineering machine learning models for business, contrasting Kaggle's optimization goals with real-world value creation.
Edwin Thoen’s blog offers practical insights into data science, machine learning, and R programming. He writes about reproducible workflows, model management, data analysis, and strategies for avoiding overengineering and improving project outcomes.
9 articles from this blog
Discusses the pitfalls of overengineering machine learning models for business, contrasting Kaggle's optimization goals with real-world value creation.
A data scientist shares their workflow using the {drake} R package to manage dependencies and ensure reproducibility in long-term machine learning projects.
A data scientist explores intellectual humility and reframing imposter syndrome as a learning alarm to improve professional well-being.
Explores the psychological reasons behind heated debates in data science, like R vs. Python, and why they are often unproductive.
Announcing padr v0.5.0, an R package update with new arguments for the `thicken` function to drop original datetime columns and handle tied observations.
A team shares lessons from a large ML project on organizing code, data, and collaboration using R packages and multi-user server setups.
A guide to using RStudio's Jobs feature to train multiple Bayesian models in parallel, improving efficiency on multi-core systems.
A data scientist shares strategies for managing and mitigating failure in data science projects, emphasizing risk analysis and realistic planning.
Explains why custom S3 methods in R fail and how to fix them by properly defining generic functions.