Nova - The AI Co-Designer That Learns Your Taste
Introduces Nova, an AI co-designer for board game creation that learns a designer's preferences and remembers past decisions through conversation.
A professor reflects on the intersection of machine learning and control theory, discussing the Learning for Dynamics and Control (L4DC) conference and the need for a merged perspective.
OpenAI researchers propose 'confessions' as a method to improve AI honesty by training models to self-report misbehavior in reinforcement learning.
A 2025 year-in-review of Large Language Models, covering major developments in reasoning, architecture, costs, and predictions for 2026.
A review of key paradigm shifts in Large Language Models (LLMs) in 2025, focusing on RLVR training and new conceptual models of AI intelligence.
Explores the tension between optimization and systems-level thinking in AI-driven scientific discovery and computational ethics.
A developer reflects on AI agent architectures, context management, and the industry's overemphasis on model development at the expense of application building.
A critique of Reformist RL's inefficiency and a proposal for more effective alternatives in reinforcement learning.
A simplified, non-technical definition of reinforcement learning as an iterative optimization process based on external feedback.
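To make that definition concrete, here is a minimal sketch of the loop it describes: act, receive external feedback, update the estimate, repeat. This is an illustrative two-armed bandit of our own construction, not code from the summarized piece:

```python
import random

# Minimal "RL as iterative optimization from external feedback" sketch.
# Hypothetical setup: two actions with hidden average rewards; the agent
# keeps a running value estimate per action and comes to prefer the better one.

true_rewards = {"a": 0.2, "b": 0.8}   # hidden from the agent
values = {"a": 0.0, "b": 0.0}         # the agent's running estimates
counts = {"a": 0, "b": 0}

for step in range(1000):
    # Explore 10% of the time, otherwise exploit the current best estimate.
    if random.random() < 0.1:
        action = random.choice(["a", "b"])
    else:
        action = max(values, key=values.get)

    # External feedback: a noisy reward signal from the environment.
    reward = true_rewards[action] + random.gauss(0, 0.1)

    # Iterative update: move the estimate toward the observed reward.
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)  # estimates converge toward the hidden averages
```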
A technical analysis of DeepSeek V3.2's architecture, sparse attention, and reinforcement learning updates, comparing it to other flagship AI models.
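For readers unfamiliar with the idea behind that sparse-attention discussion, here is a toy top-k sparse attention in which each query attends only to its highest-scoring keys. This is a simplified single-head illustration under our own assumptions, not DeepSeek's actual mechanism or kernel; a real implementation also avoids materializing the full score matrix, which this toy does not:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=4):
    """Each query attends to only its top_k keys instead of the full sequence."""
    # q, k, v: (seq_len, d_model). Compute full scaled dot-product scores
    # (fine for a toy; the efficiency of real sparse attention comes from
    # never forming this full matrix).
    scores = q @ k.T / (q.shape[-1] ** 0.5)
    # Find each query's k-th largest score and mask everything below it.
    kth = scores.topk(top_k, dim=-1).values[:, -1:]
    masked = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(masked, dim=-1) @ v

x = torch.randn(8, 16)
out = topk_sparse_attention(x, x, x)
print(out.shape)  # torch.Size([8, 16])
```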
A technical analysis of the DeepSeek model series, from V3 to the latest V3.2, covering architecture, performance, and release timeline.
A technical lecture on applying policy gradient methods to derive optimization algorithms, focusing on the unbiased gradient estimator and its applications.
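For reference, the unbiased gradient estimator in question is presumably the standard score-function (REINFORCE) identity, reproduced here from the general literature rather than taken from the lecture itself:

```latex
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}\!\left[ f(x) \right]
  = \mathbb{E}_{x \sim p_\theta}\!\left[ f(x) \, \nabla_\theta \log p_\theta(x) \right]
```

Sampling x from p_theta and averaging f(x) times the score yields an unbiased Monte Carlo estimate of the gradient, which is what lets policy gradient machinery double as a general-purpose optimizer even when f itself is not differentiable.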
Explores Andrej Karpathy's concept of Software 2.0, in which programs are written by specifying objectives and optimizing neural network weights via gradient descent, with a focus on task verifiability.
Explores the shift from RLHF to RLVR for training LLMs, focusing on using objective, verifiable rewards to improve reasoning and accuracy.
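As a concrete illustration of what "verifiable reward" means in the RLVR setting, the reward comes from a deterministic checker rather than a learned preference model. The sketch below is our own; the function name and the `####` answer-marker convention (GSM8K-style) are assumptions, not the article's code:

```python
def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known answer, else 0.0.

    Unlike an RLHF reward model, this check is objective: persuasive but
    wrong reasoning cannot earn reward.
    """
    # Assume the final answer is marked like '... #### 42'.
    answer = model_output.split("####")[-1].strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

print(verifiable_reward("2 + 2 is four #### 4", "4"))  # 1.0
print(verifiable_reward("2 + 2 is five #### 5", "4"))  # 0.0
```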
A curated list of key LLM research papers from Jan-June 2025, organized by topic including reasoning models, RL methods, and efficient training.
An AI engineer explains how buying a printer for reading and annotating technical papers helps improve focus and retention.
Analyzes the use of reinforcement learning to enhance reasoning capabilities in large language models (LLMs) like GPT-4.5 and o3.