DataKind Singapore’s Latest Project Accelerator
DataKind Singapore's Project Accelerator connects volunteer data scientists with nonprofits to solve data challenges, like analyzing water consumption data.
A scientist explains why Python is their preferred language for machine learning and data analysis, arguing for productivity over language wars.
Analyzes the historical and technical reasons behind R's controversial 'stringsAsFactors' default, explaining its origins and the problems it causes.
Critique of using Shapiro-Wilk normality tests on large, complex survey data like NHANES, explaining why it's statistically inappropriate.
Explains how to use SQL window functions and percentiles in Postgres for more meaningful data analysis than simple averages.
Interview with data scientist Jeroen Janssens about his background, work on data science at the command line, and his Data Science Toolbox project.
The article debunks common misinterpretations of the Dunning-Kruger effect by analyzing the original study's data and findings.
A tutorial explaining the internals of Principal Component Analysis (PCA) for dimensionality reduction in machine learning and data analysis.
A technical guide to Dixon's Q test for identifying outliers in small datasets, including its method, application, and criticisms.
A Python tutorial covering essential tools and techniques for machine learning, including data visualization, PCA, LDA, and classification.
A tutorial on using Python tools for machine learning, covering data loading, visualization, preprocessing, and classification with scikit-learn.
A critique of a misleading report claiming there is no gender pay gap in tech, using evidence from the AAUW study to refute the claim.
A technical guide on using SQL window functions, specifically LAG, to calculate month-over-month revenue growth percentages for SaaS or recurring billing analysis.
An explanation of Microsoft Azure HDInsight, a managed Apache Hadoop service for processing big data on Azure.
A developer's side project to analyze PyPI download logs, extracting insights about Python versions, installers, and operating systems used by package consumers.
A developer shares their journey learning Python, including recommended courses, books, and IDEs, and their decision to take a university course.
Exploring function pointers in IDL (Interactive Data Language) for refactoring legacy scientific code, with insights into the language's syntax and quirks.
A comprehensive, curated list of Python programming resources for all skill levels, covering tutorials, libraries, frameworks, and best practices.
Explains how Bayesian A/B testing improves online headline optimization, overcoming challenges of traditional frequentist methods for faster, more accurate results.
A critique of common pitfalls and unproductive patterns in statistics research presentations, aimed at improving academic discourse.