Exploring Data Operations with PySpark, Pandas, DuckDB, Polars, and DataFusion in a Python Notebook
A guide to performing data operations using PySpark, Pandas, DuckDB, Polars, and DataFusion within a pre-configured Docker environment.
A comparison of Polars and Pandas for data analysis in Python, focusing on Polars' API, performance benefits, and use cases.
A benchmark comparison of several Python libraries for reading Excel files, focusing on speed, type handling, and correctness.
A developer's rant about Excel's frustrating limitations, including the 31-character worksheet name limit and unexpected numeric storage quirks.
A warning about a subtle pandas groupby issue that can lead to incorrect data aggregation sums if missing values are not handled properly.
A guide to embedding source notebook metadata in Excel reports using Python's pandas and xlsxwriter to simplify tracking and refreshing analyses.
An overview of the Pandas library for data analysis, covering data reading, filtering, merging, and visualization.
A tutorial on building and scheduling a Python web scraper to run automatically using GitHub Actions, including emailing results.
A guide to using SQL for efficient data analysis, comparing performance with pandas and demonstrating advanced SQL techniques.
A guide to efficiently cleaning and standardizing text data in large datasets using Python's pandas library, with a practical example.
Explains the theory behind linear regression models, a fundamental machine learning algorithm for predicting continuous numerical values.
A case study on automating Excel file creation and email distribution using Python's Pandas and Outlook integration.
A review of Python tools and libraries for visualizing and interactively exploring Pandas DataFrames, comparing them to Excel's graphical interface.
A data scientist's 2020 review, focusing on machine learning projects for healthcare, including mining COVID-19 EHR data and brain signal analysis.
A guide to using pandas' groupby and aggregation functions for data analysis, covering basic to complex custom operations.
A guide to using pandas and openpyxl to read and clean poorly structured Excel files, focusing on the usecols and header parameters.
Explains the theory behind linear regression models, a fundamental machine learning technique for predicting continuous numerical values.
Explains the theory behind linear regression models, focusing on interpretability and use cases in fields like lending and medicine.
A guide to cleaning and processing messy CSV data using Python's Pandas library, including reading files and assigning custom headers.
A PyCon US 2018 talk on Python application monitoring basics, covering terminology, metrics, and integration using pandas.