Edwin Thoen

Edwin Thoen’s blog offers practical insights into data science, machine learning, and R programming. He writes about reproducible workflows, model management, data analysis, and ways to avoid overengineering and improve project outcomes.

https://edwinth.github.io

2/1/2026

data science, machine learning, r programming, reproducible workflows, data analysis

Articles from this Blog

9 articles from this blog

10/14/2020 • EN

Overengineering in ML - business life is not a Kaggle competition

Discusses the pitfalls of overengineering machine learning models for business, contrasting Kaggle's optimization goals with real-world value creation.

Machine Learning, Data Science, Overengineering

5/14/2020 • EN

Using {drake} for Machine Learning

A data scientist shares their workflow using the {drake} R package to manage dependencies and ensure reproducibility in long-term machine learning projects.

Machine Learning, workflow automation, Reproducibility

1/7/2020 • EN

Some More Thoughts on Impostering

A data scientist explores intellectual humility and reframing imposter syndrome as a learning alarm to improve professional well-being.

Machine Learning, mental health, Imposter Syndrome

6/26/2019 • EN

The Psychology of Flame Wars

Explores the psychological reasons behind heated debates in data science, like R vs. Python, and why they are often unproductive.

Python, programming languages, Data Science

6/12/2019 • EN

padr is updated

Announcing padr v0.5.0, an R package update with new arguments for the `thicken` function to drop original datetime columns and handle tied observations.

datetime, Data Manipulation, R

3/18/2019 • EN

Code and Data in a large Machine Learning project

A team shares lessons from a large ML project on organizing code, data, and collaboration using R packages and multi-user server setups.

DevOps, Machine Learning, git

2/26/2019 • EN

Using Rstudio Jobs for training many models in parallel

A guide to using RStudio's Jobs feature to train multiple Bayesian models in parallel, improving efficiency on multi-core systems.

Job, Parallel Computing, Rstudio

11/22/2018 • EN

Dealing with failed projects

A data scientist shares strategies for managing and mitigating failure in data science projects, emphasizing risk analysis and realistic planning.

software development, project management, Data Science

6/15/2018 • EN

Why your S3 method isn’t working

Explains why custom S3 methods in R fail and how to fix them by properly defining generic functions.

debugging, object oriented programming, R
