Interesting links - March 2025
A monthly roundup of tech links covering DuckDB, Kafka, data visualization, and stream processing, with highlights and commentary.
A guide to building a data pipeline using DuckDB, covering data ingestion, transformation, and analytics with real-world environmental data.
A guide to setting up and using Dremio's Auto-Ingest feature for automated, event-driven data loading into Apache Iceberg tables from cloud storage.
A technical guide on using Quarto and R to programmatically generate and render markdown content for election results websites.
A technical guide on using Apache Flink SQL to stream data from Apache Kafka into Apache Iceberg tables, including code examples.
A technical guide on configuring a data pipeline from Kafka to Elasticsearch using Logstash, including Docker setup and configuration examples.
An in-depth look at torchdata's internal architecture, focusing on datapipes and how they optimize data loading for PyTorch to improve GPU memory bandwidth.
Advises starting ML projects with simple heuristics and data analysis before implementing complex machine learning models, citing expert advice.
A behind-the-scenes look at designing and implementing a production machine learning system for a major hospital group, covering architecture and validation.
A guide to using AWS Step Functions for serverless data retrieval from third-party APIs, minimizing custom Lambda code.
A deep dive into Kafka Connect's Single Message Transforms (SMTs), exploring their use for data manipulation within the pipeline.
Explains how to use Kafka Connect's ValueToKey and ExtractField Single Message Transforms to set message keys from data fields.
A case study on using Python to automate the collection, cleaning, and processing of gigabytes of historical weather data for analysis.
Explores using the Kafka Connect FilePulse connector to ingest and process XML data into Apache Kafka, including configuration and troubleshooting.
Explains how to build a real-time analysis pipeline for DynamoDB data using DynamoDB streams and AWS Lambda functions.
A list of 19 Apache Kafka-related technical sessions at the Oracle OpenWorld, JavaOne, and Oak Table World 2017 conferences.
A practical example of using Apache Kafka to decouple data pipelines, enabling flexible data processing and replay capabilities.