ETL Offload with Spark and Amazon EMR - Part 2 - Code development with Notebooks and Docker
Part 2 of a guide on developing ETL processes using Apache Spark, Jupyter Notebooks, and Docker on Amazon EMR.
Robin Moffatt is a Principal DevEx Engineer and seasoned conference speaker with 15+ years of experience presenting at top events like QCon, Devoxx, Kafka Summit, and Strata. He shares insights on developer experience, distributed systems, and cloud technologies through his blog, YouTube, and public talks.
Explores using Apache Spark on Amazon EMR to offload and improve ETL processes, comparing it to traditional Oracle-based solutions.
Explains the importance of source control and automated deployment for OBIEE, detailing the 'why' and 'how' to prevent release issues.
Open-source Enhanced Usage Tracking for OBIEE now available, capturing detailed user interaction data for performance and analytics.
Rittman Mead announces open-source release of key BI/DI tools including a JavaScript API for OBIEE frontends and visual plugins.
Troubleshooting a Kafka Avro console producer error, caused by a port conflict, when registering a schema, with a solution provided.
Troubleshooting guide for resolving Avro serialization errors when integrating Oracle GoldenGate with Kafka Connect.
How to fix a Java IncompatibleClassChangeError when running the Kafka Connect HDFS connector by unsetting the CLASSPATH.
Guide on configuring Oracle Data Visualization Desktop to connect with Google Analytics and Google Drive using Google Cloud APIs.
A tutorial on using Apache Drill to query and analyze JSON files with SQL, using blog analytics as a practical example.
Oracle's October 2016 security patches for OBIEE, Big Data Discovery, and ODI, detailing vulnerabilities and required actions.
Common boto/S3 error solutions: fixing SigV4 'host' parameter and region-specific endpoint issues in the AWS Python SDK.
Guide to streaming Oracle database changes to Elasticsearch using Oracle GoldenGate and Kafka Connect for real-time data pipelines.
A technical appreciation of OBIEE's BI Server, highlighting its powerful data modeling, query handling, and federation capabilities.
A recap of the Polish Oracle User Group (POUG) conference, covering database, BI, and OBIEE performance topics.
A comprehensive guide to diagnosing and fixing OBIEE (Oracle Business Intelligence) performance issues, covering testing, caching, and optimization.
Troubleshooting Oracle GoldenGate for Big Data Kafka Handler errors using logdump and debug logs.
A technical guide to connecting Apache Drill to OBIEE 12c on Linux with native ODBC drivers, using the strace tool for troubleshooting.
Introduction to Apache Drill, a SQL engine for querying diverse data sources like files (CSV, JSON) and databases.
A technical guide on configuring Apache Drill's ODBC driver to connect with Oracle Business Intelligence Enterprise Edition (OBIEE) 12c on a Linux environment.