Getting Started with Data Analytics Using PyArrow in Python
A tutorial on using PyArrow for data analytics in Python, covering core concepts, file I/O, and analytical operations.
A tutorial on using PyArrow for data analytics in Python, covering core concepts, file I/O, and analytical operations.
Exploring Japan's building footprint data from the Flateau project, which converts 3D CityGML data into 2D Parquet files for analysis.
A benchmark analysis of DuckDB's performance on a massive 1.1 billion row NYC taxi dataset, comparing it to other database technologies.
A no-code tutorial on converting XLS/CSV files to Parquet format using Dremio, including setup via Docker.
A guide to building a cost-effective, high-performance, and self-service data lakehouse architecture, addressing common pitfalls and outlining key principles.
A guide comparing popular data compression codecs (zstd, brotli, lz4, gzip, snappy) for Parquet files, explaining their trade-offs for big data.
A quick guide on using DuckDB's SQL commands to efficiently convert CSV files to the Parquet format, including schema modifications.
How to handle mismatched Parquet file schemas when querying multiple files in DuckDB using the UNION_BY_NAME option.
Introduction to Apache Drill, a SQL engine for querying diverse data sources like files (CSV, JSON) and databases.