Single-Node Data Engineering: DuckDB, DataFusion, Polars, and LakeSail
Explores modern single-node data engineering tools like DuckDB, DataFusion, Polars, and LakeSail built on Apache Arrow for high-performance analytics.
Explores using DuckDB and Polars to query and write to Iceberg tables, covering new features, workflows, and practical patterns.
A curated list of interesting tech links for April 2026, covering data engineering, analytics, and AI integration.
Guide to using Apache Iceberg with Python libraries (PyIceberg, DuckDB, Polars) and MPP query engines like Dremio, Spark, and Trino.
A technical walkthrough of converting Canada's wind turbine database to Parquet format and analyzing it using DuckDB, QGIS, and command-line tools.
Analysis of Microsoft's 2026 Global ML Building Footprints dataset, including technical setup and data exploration using DuckDB and QGIS.
Analyzing All The Places' open-source location data project, detailing the technical setup and process for downloading and examining millions of brand locations.
A technical analysis and comparison of various administrative boundary datasets, including OpenStreetMap, using Python, DuckDB, and QGIS.
Analyzing Business Insider's dataset on US data center locations, ownership, and resource consumption using Python, DuckDB, and QGIS.
A technical walkthrough of converting the US Wind Turbine Database to Parquet format and analyzing it using tools like GDAL, DuckDB, and QGIS.
A guide to importing Java Flight Recorder (JFR) profiling data into DuckDB and analyzing it with SQL queries.
Exploring the GM-SEUS dataset of US solar farms using GIS tools like QGIS and DuckDB for spatial data analysis.
A technical exploration of the ICMM's global mining dataset, detailing the setup, tools, and process for data analysis using Python, DuckDB, and QGIS.
An analysis of Statistics Canada's Open Database of Buildings (ODB) dataset, covering data processing, tools used, and technical setup.
A monthly roundup of 78 curated links on data engineering, architecture, AI, and tech trends, with top picks highlighted.
A tutorial on using Positron's Connections Pane to connect to and query DuckDB databases, with a focus on working efficiently with large datasets.
Exploration of DuckDB and DuckLake as lightweight analytics tools, comparing them to traditional data lake architectures.
A monthly roundup of tech links covering data lakehouses (DuckLake, Iceberg), Kafka, event streaming, and stream processing developments.
A guide to using the new ArcGIS Pro add-in to download and work with Overture Maps Foundation's global geospatial datasets via Parquet files and DuckDB.
An analysis of DuckLake, a new open table format and catalog specification for data engineering, comparing it to existing solutions like Iceberg and Delta Lake.