72M Points of Interest
A technical analysis of Overture Maps' latest Places dataset, which covers more than 72 million points of interest worldwide, along with the setup and tools used.
A blog post arguing that statistical inference is often used as a tool of rhetoric and persuasion rather than as purely objective science.
A data scientist shares workflow automation tools and custom settings for Positron, Raycast, and Espanso to streamline data analysis tasks.
Analysis of the most popular personal blogs on Hacker News in 2025, based on a tracking project that ranks domains by their performance on the platform.
Analyzing All The Places' open-source location data project, detailing the technical setup and process for downloading and examining millions of brand locations.
A technical analysis and comparison of various administrative boundary datasets, including OpenStreetMap, using Python, DuckDB, and QGIS.
Introduces dremioframe, a Python DataFrame library for querying Dremio with a pandas-like API, generating SQL under the hood.
Analyzing Business Insider's dataset on US data center locations, ownership, and resource consumption using Python, DuckDB, and QGIS.
A blog archive listing posts about data visualization, statistical analysis, and data science using the R programming language.
A lecture on the foundational statistical concept of orderings and ordinal data, exploring their analysis and complications in fields like health research.
The author discusses updates to gssrdoc, an R package that provides integrated help documentation for the General Social Survey (GSS) dataset.
Analyzing pedestrian fatality data using polar coordinate visualizations to reveal cyclical patterns in daily accident counts.
A technical exploration of the ICMM's global mining dataset, detailing the setup, tools, and process for data analysis using Python, DuckDB, and QGIS.
An analysis of Statistics Canada's Open Database of Buildings (ODB) dataset, covering data processing, tools used, and technical setup.
An analysis of Canada's new national building footprint dataset, exploring its sources, technical setup, and initial processing steps.
A statistical reasoning test with three practical problems on sorting uncertain fractions, highlighting anomalies, and estimating population sizes.
Argues that reading raw AI input/output data, rather than relying on metrics alone, is essential for developing real intuition about system behavior.
Explains the statistical concept of included-variable bias in regression models, challenging the traditional 'omitted-variable bias' framing.
Argues that effective AI product evaluation requires a scientific, process-driven approach, not just adding LLM-as-judge tools.
A technical analysis in R that classifies iris images from a dataset using PCA and LDA.