All About Parquet Part 04 - Schema Evolution in Parquet
Explains how Parquet handles schema evolution, including adding/removing columns and changing data types, for data engineers.
Explains encoding techniques in Parquet files, including dictionary, RLE, bit-packing, and delta encoding, to optimize storage and performance.
Explores compression algorithms in Parquet files, comparing Snappy, Gzip, Brotli, Zstandard, and LZO for storage and performance.
Explores how metadata in Parquet files improves data efficiency and query performance, covering file, row group, and column-level metadata.
A practical guide to reading and writing Parquet files in Python using PyArrow and FastParquet libraries.
Explores why Parquet is the ideal columnar file format for optimizing storage and query performance in modern data lake and lakehouse architectures.
Final guide in a series covering performance tuning and best practices for optimizing Apache Parquet files in big data workflows.