Aligning mismatched Parquet schemas in DuckDB
How to handle mismatched Parquet file schemas when querying multiple files in DuckDB using the UNION_BY_NAME option.
Robin Moffatt is a Principal DevEx Engineer and seasoned conference speaker with 15+ years of experience presenting at top events like QCon, Devoxx, Kafka Summit, and Strata. He shares insights on developer experience, distributed systems, and cloud technologies through his blog, YouTube, and public talks.
617 articles from this blog
A developer advocate reflects on leaving Confluent, sharing lessons from six years in DevRel and announcing a move to a new role at lakeFS.
Explores the shift to ELT in data engineering, focusing on modern tools like dbt, Fivetran, and Airbyte for loading and transforming data.
A technical walkthrough of using dbt and DuckDB to clean and analyze session feedback data from a tech conference.
A hands-on exploration of using dbt (data build tool) with DuckDB for local data engineering, based on a tutorial project.
Analyzing conference session ratings using DuckDB and Jupyter Notebooks to demonstrate data wrangling and SQL on raw CSV data.
Explains the evolution from ETL to ELT in data engineering, clarifying the role of modern tools like dbt in the transformation process.
A hands-on tutorial exploring lakeFS for data versioning and branching using PySpark and Jupyter notebooks in a data engineering context.
A curated list of essential resources for data engineering, including articles, newsletters, podcasts, and tools.
Explores modern data engineering trends in 2022, focusing on analytical data storage formats, organization, and access patterns.
A data engineer explores the evolution of the data ecosystem, comparing past practices with modern tools and trends in 2022.
A workaround to customize the fields shown in Airtable's .ics calendar export, which by default only uses the primary field.
A behind-the-scenes look at how the program committee used data and tools to select talks for the Current 2022 and Kafka Summit tech conferences.
A guide on crafting effective abstracts for short, focused lightning talks at tech conferences, emphasizing clarity and a single core idea.
A program committee chair shares common mistakes in tech conference talk abstracts and provides tips for writing better submissions.
A developer advocate shares experiences and strategies for effective remote advocacy, covering virtual conferences, YouTube content, and remote engagement.
A guide to setting up automated Hugo blog draft previews using GitHub Actions and Surge.sh for collaborative review.
A developer shares essential and nice-to-have software tools for setting up a new Mac for productivity and development work.
A developer shares why Alfred App is an essential Mac productivity tool, highlighting features like clipboard history, file search, and workflows.
A guide to automating ksqlDB query deployments using bash scripts and REST endpoints, with examples for local and Confluent Cloud.