2025 highlights: AI research and code
A 2025 AI research review covering tabular machine learning, the societal impacts of AI scale, and open-source data-science tools.
Alex presents the work and impact of Gaël Varoquaux, a leading AI researcher at Inria, co-founder of scikit-learn, and expert in machine learning, data science, and public health.
81 articles from this blog
The author announces their new role as Probabl's CSO to accelerate development of the scikit-learn machine learning library and its ecosystem.
Introducing TabICL, a state-of-the-art table foundation model that uses in-context learning and an improved architecture for fast, scalable prediction on tabular data.
Explores how advanced AI models use chain-of-thought reasoning to break complex problems into simpler steps, improving accuracy and performance.
A computer science academic reflects on academia's role in shaping societal narratives, especially around AI, through open technology and sober assessment.
Explores the critical challenge of bias in health AI data, why unbiased data is impossible, and the ethical implications for medical algorithms.
A researcher reflects on 2024 highlights in AI, covering societal impacts, software tools like scikit-learn, and technical research on tabular data and language models.
Explores whether large language models like ChatGPT truly reason or merely recite memorized text from their training data, examining their logical capabilities.
Introduces CARTE, a foundation model for tabular data, explaining its architecture, pretraining on knowledge graphs, and results.
Announcing skrub 0.2.0, a library update simplifying machine learning on complex dataframes with new features like tabular_learner.
The article discusses the spin-off of scikit-learn's open-source development from Inria to a new mission-driven enterprise, Probabl, focusing on sustainable funding and growth.
Scikit-learn remains a dominant and impactful machine learning library, especially for classic ML and tabular data, despite the hype around deep learning.
A retrospective on forming a research team in 2022 to apply machine learning to challenges in health and social sciences, including data management and validation.
A former Mayavi core contributor shares their journey into open source, from using the 3D visualization library for PhD research to becoming a key maintainer.
The scikit-learn foundation seeks a community and partnerships developer to grow the open-source ecosystem and foster industry sponsorships.
A data scientist's 2020 review, focusing on machine learning projects for healthcare, including mining COVID-19 EHR data and brain signal analysis.
Tips for improving communication and reducing conflict in open-source software development, addressing maintainer anxiety and contributor fatigue.
Survey of experimental methods used by authors at NeurIPS 2019 and ICLR 2020, focusing on hyperparameter tuning, baselines, and reproducibility.
A researcher reviews their 2019 scientific work, focusing on computational statistics for brain imaging and data science.
Explores kernel methods and L1 distances for statistical two-sample testing, comparing how effectively they determine whether two datasets come from the same distribution.