All About Parquet Part 05 - Compression Techniques in Parquet
Explores compression algorithms in Parquet files, comparing Snappy, Gzip, Brotli, Zstandard, and LZO for storage and performance.
A practical guide to reading and writing Parquet files in Python using PyArrow and FastParquet libraries.
Explores how metadata in Parquet files improves data efficiency and query performance, covering file, row group, and column-level metadata.
Explains encoding techniques in Parquet files, including dictionary, RLE, bit-packing, and delta encoding, to optimize storage and performance.
Explains how Parquet handles schema evolution, including adding/removing columns and changing data types, for data engineers.
Final guide in a series covering performance tuning and best practices for optimizing Apache Parquet files in big data workflows.
Explores the role of IT security and other risk professionals in advising businesses, arguing for a normative approach to extreme risks.
Explores why Parquet is the ideal columnar file format for optimizing storage and query performance in modern data lake and lakehouse architectures.
An introduction to Apache Parquet, a columnar storage file format for efficient data processing and analytics.
Explains Parquet's columnar storage model, detailing its efficiency for big data analytics through faster queries, better compression, and optimized aggregation.
Explains the hierarchical structure of Parquet files, detailing how pages, row groups, and columns optimize storage and query performance.
A guide to automating Azure monitoring and alert setup using PowerShell within Infrastructure as Code (IaC) deployments.
A technical guide comparing spatial patterns in continuous raster data for overlapping regions using R, focusing on NDVI data analysis.
Introduces App Buddy, a macOS utility for managing settings, backups, and permissions for the developer's other applications.
A practical guide to structuring Go projects, advocating for simplicity over rigid conventions and explaining when to use or avoid common directory patterns.
Security audit results for vdirsyncer reveal four minor findings, including file permissions and error handling issues, with fixes implemented.
A guide to creating a custom React hook for handling various keyboard shortcuts, including single keys, combinations, and sequences.
Using GitHub Actions to trigger Airflow DAGs for orchestrating data pipelines across Spark, Dremio, and Snowflake.
Explores using GitHub Actions for software development CI/CD and advanced data engineering tasks like ETL pipelines and data orchestration.
A guide on integrating Azure Resource Locks into Terraform deployments to prevent accidental deletion of cloud resources.