Improved Column Reader API, First Cut of Geospatial Support: Hardwood 1.0.0.CR1 Is Available
Hardwood 1.0.0.CR1 release: improved ColumnReader API for Parquet files, initial geospatial support, and documentation overhaul.
Explores modern single-node data engineering tools like DuckDB, DataFusion, Polars, and LakeSail built on Apache Arrow for high-performance analytics.
Explains how Apache Arrow eliminates the serialization tax by providing a standardized in-memory columnar format for fast data movement.
A 2025 year-in-review of key Apache data projects: Iceberg, Polaris, Parquet, and Arrow, detailing their major updates and future roadmap.
Argues that 'Stream vs. Batch' is a misleading dichotomy; the real distinction is between 'Push vs. Pull' semantics in data processing.
Explores Apache Iceberg, Arrow, and Polaris—three key technologies powering modern, high-performance data lakehouse platforms.
A tutorial on using PyArrow for data analytics in Python, covering core concepts, file I/O, and analytical operations.
An overview of five impactful open-source data projects, including Apache Iceberg and Arrow, that are revolutionizing data management and analytics.
ROAPI is an open-source API server built in Rust that automatically creates REST APIs from static data files like CSV, JSON, and Parquet.
A step-by-step guide to building the pyarrow Python library with CUDA support using Docker on Ubuntu for GPU data processing.
Explores GPU-based data science workflows using MapD (now OmniSci) for high-performance analytics and machine learning without data transfer bottlenecks.