First Look at Reasoning From Scratch: Chapter 1
An introduction to reasoning in Large Language Models, covering concepts like chain-of-thought and methods to improve LLM reasoning abilities.
Explores four main approaches to building and enhancing reasoning capabilities in Large Language Models (LLMs) for complex tasks.
A technical guide on fine-tuning IBM's Granite 3.1 AI model using Group Relative Policy Optimization (GRPO) to enhance its reasoning capabilities.
A tutorial on reproducing DeepSeek R1's RL 'aha moment' using Group Relative Policy Optimization (GRPO) to train a model on the Countdown numbers game.
Explains the training of DeepSeek-R1, focusing on the Group Relative Policy Optimization (GRPO) reinforcement learning method.
Explores reward hacking in reinforcement learning, where AI agents exploit reward function flaws, and its critical impact on RLHF and language model alignment.
A technical review of April 2024's major open LLM releases (Mixtral, Llama 3, Phi-3, OpenELM) and a comparison of DPO vs PPO for LLM alignment.
A review and comparison of the latest open LLMs (Mixtral, Llama 3, Phi-3, OpenELM) and a study on DPO vs. PPO for LLM alignment.
Discusses strategies for continual pretraining of LLMs and evaluating reward models for RLHF, based on recent research papers.
Analysis of recent AI research papers on continued pretraining for LLMs and reward modeling for RLHF, with insights into model updates and alignment.
A critical analysis of GPT-4's capabilities, questioning the 'miracle' narrative and exploring the technical foundations behind its success.
A podcast interview discussing reinforcement learning applications, data science career paths, and productivity insights for tech professionals.
Explores bandit algorithms like ε-greedy, UCB, and Thompson Sampling to improve recommender systems by balancing exploration and exploitation.
Introduces permutation-invariant neural networks for RL agents, enabling robustness to shuffled, noisy, or incomplete sensory inputs.
Explores how reinforcement learning methods like bandits and policy-based approaches can improve recommendation systems by optimizing for long-term rewards.
An interview with AI researcher Joelle Pineau discussing her work in reinforcement learning, its applications, and advice for newcomers to the field.
Explains the concept of causally correct partial models for reinforcement learning in POMDPs, focusing on counterfactual policy evaluation.
An introductory chapter on machine learning and deep learning, covering core concepts, categories, and terminology from a university course.
An introductory chapter on machine learning and deep learning, covering core concepts, categories, and the shift from traditional programming.
Introduces HOMER, a new reinforcement learning algorithm that solves key problems like global exploration and decoding latent dynamics with provable guarantees.