Stepping up as probabl’s CSO to supercharge scikit-learn and its ecosystem
The author announces their new role as Probabl's CSO to accelerate development of the scikit-learn machine learning library and its ecosystem.
Explores performance optimizations for scikit-learn's GridSearchCV by using closed-form solutions and warm starts for specific linear models.
Testing a prompt technique inspired by 'The Office' to get more concise and detailed AI-generated explanations of technical concepts like Huber regression.
Explains the difference between AI and Machine Learning, with AI as the goal of intelligent systems and ML as a key approach to achieve it.
A researcher reflects on 2024 highlights in AI, covering societal impacts, software tools like scikit-learn, and technical research on tabular data and language models.
Announcing skrub 0.2.0, a library update simplifying machine learning on complex dataframes with new features like tabular_learner.
The article discusses the spin-off of scikit-learn's open-source development from Inria to a new mission-driven enterprise, Probabl, focusing on sustainable funding and growth.
Explains data leakage in ML, why it's harmful, and how to prevent it when using pandas and scikit-learn for tasks like missing value imputation.
Explores the pros and cons of discretizing continuous features in machine learning, with a practical guide using scikit-learn's KBinsDiscretizer.
Scikit-learn remains a dominant and impactful machine learning library, especially for classic ML and tabular data, despite the hype around deep learning.
A retrospective on forming a research team in 2022 to apply machine learning to challenges in health and social sciences, including data management and validation.
Announcing a new book on machine learning, covering fundamentals with scikit-learn and deep learning with PyTorch, including neural networks from scratch.
Author announces a new machine learning book covering scikit-learn, deep learning with PyTorch, neural networks, and reinforcement learning.
A comprehensive collection of 90 machine learning lecture videos covering Python, scikit-learn, algorithms, and model evaluation techniques.
A comprehensive list of 90 machine learning lecture videos covering topics from Python basics to advanced ML concepts like decision trees and Bayesian methods.
Analyzes performance improvements and hardware scalability of the PairwiseDistancesArgKmin algorithm in scikit-learn's k-nearest neighbors implementation.
Introducing PairwiseDistancesReduction, a new Cython-based abstraction in scikit-learn for high-performance CPU computations of reductions over pairwise distances.
Analyzes performance bottlenecks in scikit-learn's k-nearest neighbors search and introduces a new implementation for better CPU scalability.
Analyzes performance limitations in scikit-learn due to CPython internals, memory hierarchy issues, and lack of low-level data structures.
Explains ongoing developer efforts to dramatically improve scikit-learn's performance, focusing on hardware scalability and algorithmic optimizations.