Compressing a set of hash values
Explores compressing sets of hash values using Golomb-Rice coding, detailing the theory and implementation with examples.
Explains Parquet's columnar storage model, detailing its efficiency for big data analytics through faster queries, better compression, and optimized aggregation.
Explores compression algorithms in Parquet files, comparing Snappy, Gzip, Brotli, Zstandard, and LZO for storage and performance.
Final guide in a series covering performance tuning and best practices for optimizing Apache Parquet files in big data workflows.
A tutorial on implementing a Huffman coding data compression utility in Haskell, focusing on constant memory usage and functional programming principles.
A technical guide on setting up Azure Event Hub to ingest and route compressed data into Azure Data Explorer (ADX) for real-time analytics.
Explains techniques for compressing and analyzing CS2 game demo files using Protocol Buffers and custom data structures for performance analysis.
A guide comparing popular data compression codecs (zstd, brotli, lz4, gzip, snappy) for Parquet files, explaining their trade-offs for big data.
A technical benchmark of the Hydrolix analytics platform on AWS, testing its performance on a 1.1 billion row NYC taxi dataset.
Explores Microsoft's new Columnstore compression estimation in SQL Server 2019, comparing it to a custom system stored procedure.
Introduces a custom stored procedure for estimating compression savings for SQL Server Columnstore Indexes, filling a gap in native tooling.
Columnstore Indexes are now available on the Standard Tier of Azure SQL Database, enabling better compression and performance for data warehousing.