Improved Column Reader API, First Cut of Geospatial Support: Hardwood 1.0.0.CR1 Is Available
Hardwood 1.0.0.CR1 release with improved ColumnReader API, geospatial support for Parquet, and documentation overhaul.
Hardwood 1.0.0.CR1 release: improved ColumnReader API for Parquet files, initial geospatial support, and documentation overhaul.
Hardwood 1.0.0.Beta2 release adds VARIANT support, interactive Parquet TUI, and performance improvements.
Hardwood 1.0.0.Beta2 release adds VARIANT support, interactive Parquet TUI, and performance improvements for Apache Parquet parsing.
Strategies for migrating data to Apache Iceberg, including in-place, full rewrite, and shadow migration with zero downtime.
Hardwood 1.0.0.Beta1 release: new S3 backend, predicate push-down, Avro bindings, CLI for Parquet parsing.
Hardwood 1.0.0.Beta1 release: new S3 backend, predicate push-down, Avro bindings, CLI for inspecting Parquet files.
A technical walkthrough of converting Canada's wind turbine database to Parquet format and analyzing it using DuckDB, QGIS, and command-line tools.
A technical walkthrough of converting the US Wind Turbine Database to Parquet format and analyzing it using tools like GDAL, DuckDB, and QGIS.
A technical walkthrough of converting the massive OpenBuildingMap dataset (2.7B buildings) into a columnar Parquet format for efficient cloud analysis.
Exploring the Layercake project's analysis-ready OpenStreetMap data in Parquet format, including setup and performance on a high-end workstation.
Analysis of a new global building dataset (2.75B structures), detailing the data processing, technical setup, and tools used for exploration.
Exploration of DuckDB and DuckLake as lightweight analytics tools, comparing them to traditional data lake architectures.
A guide on using the new ArcGIS Pro add-in to download and work with Overture Maps Foundation's global geospatial datasets via Parquet files and DuckDB.
An analysis of DuckLake, a new open table format and catalog specification for data engineering, comparing it to existing solutions like Iceberg and Delta Lake.
A tutorial on building a beginner-friendly Model Context Protocol (MCP) server in Python to connect Claude AI with local CSV and Parquet files.
Microsoft updates SQLPackage with preview support for Parquet files in Azure Blob Storage, enhancing data management and provisioning capabilities.
Explains encoding techniques in Parquet files, including dictionary, RLE, bit-packing, and delta encoding, to optimize storage and performance.
An introduction to Apache Parquet, a columnar storage file format for efficient data processing and analytics.
Explains Parquet's columnar storage model, detailing its efficiency for big data analytics through faster queries, better compression, and optimized aggregation.