Is popularity related to quality?
Analyzes whether npm package popularity correlates with quality using data from npms.io, finding that popularity can be an indicator of quality but not a guarantee.
A developer's critique of Twitter's algorithmic Home feed, explaining why it failed to show relevant tech content and seeking alternatives.
An overview of the Pandas library for data analysis, covering data reading, filtering, merging, and visualization.
An introduction to Fisher Information, a statistical concept that quantifies how much information data samples contain about unknown distribution parameters.
A developer analyzes Wellington's public transport data using GTFS feeds, comparing it to Auckland and building tools to track bus delays and cancellations.
A tutorial on creating and customizing bar charts in ggplot2, focusing on adding percentage labels, custom colors, and improving accessibility.
A personal blog by an RStudio software engineer sharing findings, tips, and experiences with the R programming language and its ecosystem.
Explores how color choices in data visualizations evoke emotions and influence interpretation, using temperature charts as an example.
A data scientist shares her experience and contributions to the #30DayChartChallenge, a data visualization event, using various tools like ggplot2, PowerPoint, and Figma.
A product manager discusses five essential skills for product and leadership roles: SQL, Excel, clear communication, storytelling, and prioritization.
A guide to using SQL for efficient data analysis, comparing performance with pandas and demonstrating advanced SQL techniques.
A Kusto query snippet for Azure Log Analytics to filter records from the last 7 days, showing only entries between 9am and 5pm in local time.
A guide to efficiently cleaning and standardizing text data in large datasets using Python's pandas library, with a practical example.
A statistical analysis of multicollinearity in regression models, discussing its impact on coefficient interpretation and prediction.
A review of Python tools and libraries for visualizing and interactively exploring Pandas DataFrames, comparing them to Excel's graphical interface.
A guide to using pandas' groupby and aggregation functions for data analysis, covering everything from basic aggregations to complex custom operations.
A programmer writes an interpreter for a subset of BASIC to run the original 1978 Oregon Trail game within R, discussing code translation challenges.
A guide to implementing a simple anomaly detection system using only SQL and basic statistics, aimed at developers.
Explores the Bayesian equivalent of a two-sample t-test, questioning traditional assumptions and proposing a model using discrete distributions.
Explains the three main types of statistical weights (precision, frequency, sampling), their uses, and the software documentation challenges they create.