Jake VanderPlas

Jake VanderPlas is an astronomer and open-source leader, serving as Director of Open Software at the University of Washington’s eScience Institute. He writes and builds widely used Python tools for data science, machine learning, and scientific computing.

https://jakevdp.github.io

2/2/2026

python data science machine learning scientific computing open source

Articles from this Blog

66 articles from this blog

9/13/2018 • EN

The Waiting Time Paradox, or, Why Is My Bus Always Late?

Explores the 'waiting time paradox' using probability, simulation, and real bus data to explain why the average wait can exceed half the average interval between buses.

simulation data analysis statistics

12/18/2017 • EN

Simulating Chutes & Ladders in Python

A technical analysis of the Chutes & Ladders board game using Python simulation and Markov chain modeling to calculate expected game length.

Python simulation Probability

12/11/2017 • EN

Optimization of Scientific Code with Cython: Ising Model

A tutorial on using Cython to optimize slow numerical Python code, demonstrated with an Ising Model simulation.

Python performance optimization Scientific Computing

12/5/2017 • EN

Installing Python Packages from a Jupyter Notebook

A guide to installing Python packages in Jupyter Notebooks, explaining common issues with pip and conda, and how to ensure packages are available.

Python package management Pip

11/9/2017 • EN

Exploring Line Lengths in Python Packages

Analyzes line length distributions in popular Python packages, comparing them to an earlier analysis of Twitter's character limit and exploring adherence to the PEP8 style guide.

Python code style PEP8

5/26/2017 • EN

Exposing Python 3.6's Private Dict Version

A technical deep dive into exposing and accessing Python 3.6's private dictionary version number using ctypes.

Python internals Ctypes

3/30/2017 • EN

A Practical Guide to the Lomb-Scargle Periodogram

A guide to the Lomb-Scargle periodogram, explaining its use, common misconceptions, and practical considerations for analyzing astronomical data.

Python data analysis Astronomy

3/22/2017 • EN

Group-by From Scratch

Explores implementing group-by operations from scratch in Python, comparing performance of Pandas, NumPy, and SciPy for data aggregation.

Python data analysis algorithm

3/8/2017 • EN

Triple Pendulum CHAOS!

A technical walkthrough of simulating a chaotic triple pendulum system in Python using SymPy and Kane's method.

Python simulation Chaos

3/3/2017 • EN

Reproducible Data Analysis in Jupyter

A video series on transitioning from interactive Jupyter data exploration to reproducible, packaged, and tested code for data analysis.

github git data analysis

8/25/2016 • EN

Conda: Myths and Misconceptions

A developer clarifies common myths about Conda, explaining it's a general-purpose package manager distinct from Anaconda and not just for Python.

Python package manager Anaconda

10/18/2015 • EN

Analyzing Pronto CycleShare Data with Python and Pandas

A tutorial on analyzing Seattle's Pronto CycleShare data using Python, Pandas, and the PyData stack for data science.

Python data visualization data analysis

8/14/2015 • EN

Out-of-Core Dataframes in Python: Dask and OpenStreetMap

A tutorial on using Dask for out-of-core data analysis with a large OpenStreetMap dataset, demonstrating scalable Python data manipulation.

Python Pandas OpenStreetMap

8/7/2015 • EN

Frequentism and Bayesianism V: Model Selection

Compares frequentist and Bayesian approaches to statistical model selection, highlighting philosophical differences and computational trade-offs.

Python Model Selection Bayesian Statistics

7/23/2015 • EN

Learning Seattle's Work Habits from Bicycle Counts

Using Python and unsupervised machine learning to analyze Seattle bicycle count data and uncover insights about commuting work habits.

Python Pandas Scikit Learn

7/6/2015 • EN

The Model Complexity Myth

Debunks the myth that models can't have more parameters than data points, explaining how and when under-determined models can be solved and useful.

Statistical Modeling Bayesian Linear Models

6/13/2015 • EN

Fast Lomb-Scargle Periodograms in Python

A comparison of Python implementations of the Lomb-Scargle periodogram, recommending the fast algorithm in the gatspy package for analyzing irregularly sampled data.

Python Lomb Scargle Periodogram

2/24/2015 • EN

Optimizing Python in the Real World: NumPy, Numba, and the NUFFT

A guide to optimizing a non-trivial algorithm (NUFFT) in Python using NumPy and Numba, comparing performance to a Fortran implementation.

Python optimization NumPy

11/12/2014 • EN

The Hipster Effect: An IPython Interactive Exploration

An interactive exploration using IPython to simulate and understand the mathematical model behind 'The Hipster Effect' paper on conformity and non-conformity.

simulation mathematical modeling Data Science

10/16/2014 • EN

How Bad Is Your Colormap?

A critique of the 'jet' colormap in data visualization, with a Python function to convert colormaps to grayscale for analysis.

data visualization Matplotlib Jet
