Faster generalised linear models in largeish data
A method for faster generalized linear models on large datasets using a single database query and one Newton-Raphson iteration.
A tutorial on installing OmniSci (formerly MapD) using Docker and loading data for GPU-accelerated SQL analytics and visualization.
A technical guide on installing and configuring Oracle GoldenGate for Big Data with Kafka Connect and Confluent Platform.
A list of 19 Apache Kafka-related technical sessions at Oracle OpenWorld, JavaOne, and Oak Table World 2017 conferences.
Final summary of a project exploring ETL offload to Apache Spark on AWS EMR, evaluating cost and tech benefits for a cloud-based data platform.
A personal reflection on the trade-offs between convenience and privacy in an era of AI, IoT, and pervasive data collection.
Troubleshooting Oracle GoldenGate for Big Data Kafka Handler errors using logdump and debug logs.
Introduction to Apache Drill, a SQL engine for querying diverse data sources like files (CSV, JSON) and databases.
Guide on using R within Jupyter Notebooks to analyze and manipulate datasets in Oracle Big Data Discovery, enabling advanced statistical workflows.
Explains improvements in joblib's compressed persistence for Python, focusing on reduced memory usage and single-file storage for large numpy arrays.
Technical guide on building a real-time Twitter sentiment analysis system using Apache Kafka and Storm.
Explains Lambda Architecture for Big Data, combining batch processing (Hadoop) and real-time stream processing (Spark, Storm) to handle large datasets.
Explores using Apache Kafka to create flexible, testable data pipelines, enabling multiple parallel consumers and safe experimentation.
A tutorial on building data pipelines using Microsoft Azure Data Factory, covering ingestion, transformation, and orchestration.
Part 3 of a series on visualizing data in Kibana after loading it from Hadoop via Elasticsearch, comparing Kibana 3 and 4.
Introduction to using Kibana and Elasticsearch with Hadoop for visualizing and analyzing big data from web logs, tweets, and blog metadata.
A reflection on the challenges of data science in academia, discussing the 'brain drain' of data skills and the need for systemic change.
A data engineer shares five practical lessons and performance tips for working with Apache Hive, focusing on common pitfalls and optimizations.
Fixing MongoDB Connector for Hadoop authentication errors by granting the clusterManager role to the user.
An explanation of Microsoft Azure HDInsight, a managed Apache Hadoop service for processing big data on Azure.