Introduction to Data Engineering Concepts | Apache Iceberg, Arrow, and Polaris
Explores Apache Iceberg, Arrow, and Polaris—three key technologies powering modern, high-performance data lakehouse platforms.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
333 articles from this blog
Explains how Sampling and Prompts in the Model Context Protocol (MCP) enable smarter, safer, and more controlled AI agent workflows.
Explains how Tools in the Model Context Protocol (MCP) enable LLMs to execute actions like running commands or calling APIs, moving beyond just reading data.
Explains how the Model Context Protocol (MCP) uses 'Resources' to securely serve structured data from systems like files and databases to LLMs.
Explains the architecture of the Model Context Protocol (MCP), detailing its client-server model, core components, and message flow for connecting AI models to tools and data.
Explains the Model Context Protocol (MCP), an open standard for connecting AI agents and LLMs to external data sources and tools, enabling interoperability.
Explores AI agent frameworks, their benefits and limitations, and introduces the Model Context Protocol (MCP) for more modular AI systems.
Explores AI agents, their core components, differences from LLMs, and real-world applications, positioning them as the future of autonomous AI systems.
Explores three key methods to enhance LLM performance: fine-tuning, prompt engineering, and RAG, detailing their use cases and trade-offs.
Explains how LLMs work by converting words to numerical embeddings, using vector spaces for semantic understanding, and managing context windows.
Explores the evolution of AI from symbolic systems to modern Large Language Models (LLMs), detailing their capabilities and limitations.
A tutorial on building a beginner-friendly Model Context Protocol (MCP) server in Python to connect Claude AI with local CSV and Parquet files.
A guide to using Helm, the package manager for Kubernetes, covering Helm charts, installation, deployment, and best practices.
A guide to building AI applications using the LangChain framework, covering core concepts, installation, and practical examples.
A comprehensive 2025 guide to Apache Iceberg, covering its architecture, ecosystem, and practical use for data lakehouse management.
Explores solutions like Apache XTable and Delta Lake UniForm for enabling interoperability between different data lakehouse table formats.
Argues that RAG system failures stem from data engineering issues like fragmented data and governance, not from model or vector database choices.
A developer shares the story of building Pangolin, an open-source lakehouse catalog, using an AI coding agent during a holiday break.
A technical guide on designing and implementing a modern data lakehouse architecture using the Apache Iceberg table format in 2025.
A look at 10 upcoming features and enhancements for the Apache Iceberg data lakehouse table format, expected in 2025.