Gael Varoquaux

Alex presents the work and impact of Gaël Varoquaux, a leading AI researcher at Inria, co-founder of scikit-learn, and expert in machine learning, data science, and public health.

https://gael-varoquaux.info

12/19/2025

machine learning data science artificial intelligence scikit-learn python

Articles from this Blog

81 articles from this blog

2/1/2026 • EN

2025 highlights: AI research and code

A 2025 AI research review covering tabular machine learning, the societal impacts of AI scale, and open-source data-science tools.

Machine Learning open source Data Science

1/14/2026 • EN

Stepping up as probabl’s CSO to supercharge scikit-learn and its ecosystem

The author announces their new role as Probabl's CSO to accelerate development of the scikit-learn machine learning library and its ecosystem.

Machine Learning open source Data Science

7/9/2025 • EN

TabICL: Pretraining the best tabular learner

Introducing TabICL, a state-of-the-art table foundation model that uses in-context learning and an improved architecture for fast, scalable tabular data prediction.

Transformer Pretraining Tabular Learning

6/20/2025 • EN

AIs that break down questions reason better

Explores how advanced AIs use chain-of-thought reasoning to break complex problems into simpler steps, improving accuracy and performance.

artificial intelligence Language Models Deepseek

3/1/2025 • EN

Science must drive the narratives that shape society

A computer science academic reflects on academia's role in shaping societal narratives, especially around AI, through open technology and sober assessment.

computer science artificial intelligence Academia

2/13/2025 • EN

AI for health: the impossible necessity of unbiased data

Explores the critical challenge of bias in health AI data, why unbiased data is impossible, and the ethical implications for medical algorithms.

Machine Learning ai ethics Healthcare

1/1/2025 • EN

2024 highlights: of computer science and society

A researcher reflects on 2024 highlights in AI, covering societal impacts, software tools like scikit-learn, and technical research on tabular data and language models.

Machine Learning large language models ai

10/19/2024 • EN

Do AIs reason or recite?

Explores whether large language models like ChatGPT truly reason or merely recite memorized text from their training data, examining their logical capabilities.

Machine Learning artificial intelligence large language models

7/19/2024 • EN

CARTE: toward table foundation models

Introduces CARTE, a foundation model for tabular data, explaining its architecture, pretraining on knowledge graphs, and results.

Relational Databases tabular data Deep Learning

7/3/2024 • EN

Skrub 0.2.0: tabular learning made easy

Announcing skrub 0.2.0, a library update simplifying machine learning on complex dataframes with new features like tabular_learner.

Python Machine Learning tabular data

6/9/2024 • EN

Promoting open-source, from inria to :probabl.

The article discusses the spin-off of scikit-learn's open-source development from Inria to a new mission-driven enterprise, Probabl, focusing on sustainable funding and growth.

Machine Learning open source Data Science

11/27/2023 • EN

People underestimate how impactful Scikit-learn continues to be

Scikit-learn remains a dominant and impactful machine learning library, especially for classic ML and tabular data, despite the hype around deep learning.

Machine Learning tabular data Scikit Learn

1/31/2023 • EN

2022, a new scientific adventure: machine learning for health and social sciences

A retrospective on forming a research team in 2022 to apply machine learning to challenges in health and social sciences, including data management and validation.

Machine Learning Data Science Scikit Learn

7/10/2022 • EN

My Mayavi story: discovering open source communities

A former Mayavi core contributor shares their journey into open source, from using the 3D visualization library for PhD research to becoming a key maintainer.

Python open source Mayavi

9/14/2021 • EN

Hiring someone to develop scikit-learn community and industry partners

The scikit-learn foundation seeks a community and partnerships developer to grow the open-source ecosystem and foster industry sponsorships.

open source community management Data Science

1/5/2021 • EN

2020: my scientific year in review

A data scientist's 2020 review, focusing on machine learning projects for healthcare, including mining COVID-19 EHR data and brain signal analysis.

Machine Learning sql Data Science

5/28/2020 • EN

Technical discussions are hard; a few tips

Tips for improving communication and reducing conflict in open-source software development, addressing maintainer anxiety and contributor fatigue.

software development open source Maintainer Burnout

1/22/2020 • EN

Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020

Survey of experimental methods used by authors at NeurIPS 2019 and ICLR 2020, focusing on hyperparameter tuning, baselines, and reproducibility.

Machine Learning benchmarking Reproducibility

1/5/2020 • EN

2019: my scientific year in review

A researcher reviews their 2019 scientific work, focusing on computational statistics for brain imaging and data science.

Data Science Brain Imaging Kernel Methods

12/8/2019 • EN

Comparing distributions: Kernels estimate good representations, l1 distances give good tests

Explores kernel methods and L1 distances for statistical two-sample testing, comparing their effectiveness in determining if datasets come from the same distribution.

Kernel Methods Statistical Testing Two Sample Testing
