What I Do Before a Data Science Project to Ensure Success
A data scientist shares three essential pre-project tasks (the one-pager, the time-box, and the breakdown) that help avoid common pitfalls and set a project up for success.
An overview of new features in version 4.0 of the R survey package, focusing on improved contrast estimation and replicate handling.
An explanation of how to use Monte Carlo analysis in product development, using TweetDeck screen capacity as a practical example.
A review of the best #TidyTuesday data visualization submissions from 2019, highlighting creative and insightful uses of R and ggplot2.
A guide on using PowerShell and a matrix/spreadsheet approach to visualize and audit Active Directory group memberships for IT administration.
Tips for using Google BigQuery's public datasets while managing and minimizing query costs, including using the free tier and setting budgets.
A guide to common SQL mistakes and optimization opportunities for developers and data professionals, covering integer division, UNION vs UNION ALL, and query performance.
A statistical re-analysis of a published study on the mouse microbiome and autism, examining data and p-values from behavioral experiments.
A statistical analysis discussing the limitations of confidence intervals, using examples from small-area sampling to illustrate their weak properties.
A data scientist clarifies common misconceptions about the field, explaining that machine learning is only a small part of the job and advanced degrees aren't always required.
An analysis of Sankey diagrams that Reddit users created to visualize their personal Tinder match data and dating outcomes.
A developer explores investigative journalism, drawing parallels between source control diffs and uncovering truth in legal documents and online comments.
A technical analysis of bus punctuality using Auckland Transport API data, with R code for data processing and visualization.
An experiment testing whether players with feminine usernames receive different in-game chat comments than players with masculine usernames in Overwatch.
An article arguing that SQL is one of the most valuable and enduring technical skills across various roles like engineering and product management.
An analysis of call-for-papers (CFP) data from the JSHeroes 2019 JavaScript conference, revealing submission patterns and workshop details.
A guide to six statistical methods (frequentist and Bayesian) for comparing group means, with R and Stan code examples.
A summary of a panel discussion on various data roles (data scientist, ML engineer, etc.), including key skills and career insights.
An announcement that the open-source book 'Geocomputation with R' is complete, detailing its collaborative creation, purpose, and availability.
A guide on using the ELK Stack (Elasticsearch, Logstash, Kibana) to analyze and triage large-scale Nmap scan results for penetration testing and offensive security.