American Wind Farms
A technical walkthrough of converting the US Wind Turbine Database to Parquet format and analyzing it using tools like GDAL, DuckDB, and QGIS.
A technical walkthrough of converting the massive OpenBuildingMap dataset (2.7B buildings) into a columnar Parquet format for efficient cloud analysis.
Exploring the GM-SEUS dataset of US solar farms using GIS tools like QGIS and DuckDB for spatial data analysis.
Explores building AI agents as streaming SQL queries using platforms like Apache Flink for improved consistency, scalability, and developer experience.
A technical guide on downloading and analyzing Canada's National Address Register (15.8M addresses) using Python, DuckDB, and QGIS to create settlement centroids.
A tutorial on building a beginner-friendly Model Context Protocol (MCP) server in Python to connect Claude AI with local CSV and Parquet files.
Part two of building a personal recommendation system, covering data collection from Pocket and content extraction using the Jina Reader API.
A developer documents the first steps in building a personalized content recommendation system using saved articles, text embeddings, and algorithms.
Introduces the 'leopards' Python library for filtering and aggregating lists, offering a lightweight alternative to pandas for basic data operations.
A technical guide on processing Overture Maps' global land cover dataset, focusing on extracting and analyzing Australia's data using DuckDB and QGIS.
Exploring Japan's building footprint data from the Flateau project, which converts 3D CityGML data into 2D Parquet files for analysis.
Analysis of a research paper detailing an AI model that extracted 281 million building footprints from satellite imagery across East Asia.
A technical analysis of Maxar's high-resolution global satellite imagery basemap, examining 60GB of data across 11 cities using GDAL, Python, and DuckDB.
A technical guide on downloading and analyzing free Synthetic Aperture Radar (SAR) satellite imagery from Umbra's open data program.
A benchmark comparison of several Python libraries for reading Excel files, focusing on speed, type handling, and correctness.
Announcing the official print and electronic editions of the 'Practical MongoDB Aggregations' book, published by Packt with new content.
A tutorial on using pipes and the .[] filter in jq, a command-line JSON processor, for data iteration and transformation.
A cleaned-up, de-interleaved transcript of text message exhibits from the Twitter v. Elon Musk lawsuit, presented for clarity.
Explains how to integrate Dask with Kubeflow to accelerate data preparation and ETL tasks in machine learning pipelines using distributed computing.