RLHF Articles

4/19/2025 • EN

The State of Reinforcement Learning for LLM Reasoning

Analyzes the use of reinforcement learning to enhance reasoning capabilities in large language models (LLMs) like GPT-4.5 and o3.

LLM Reasoning, Model Training, PPO, Reinforcement Learning, RLHF

Sebastian Raschka

11/28/2024 • EN

Reward Hacking in Reinforcement Learning

Explores reward hacking in reinforcement learning, where AI agents exploit reward function flaws, and its critical impact on RLHF and language model alignment.

Alignment, Language Models, Reinforcement Learning, Reward Hacking, RLHF

Lilian Weng

2/5/2024 • EN

Thinking about High-Quality Human Data

Explores the importance of high-quality human-annotated data for training AI models, covering task design, rater selection, and the wisdom of the crowd.

Data Quality, Human Annotation, LLM Alignment, Machine Learning, RLHF

Lilian Weng

1/23/2024 • EN

RLHF in 2024 with DPO and Hugging Face

A technical guide on using Direct Preference Optimization (DPO) with Hugging Face's TRL library to align and improve open-source large language models in 2024.

DPO, Hugging Face, LLM, RLHF, TRL

Philipp Schmid
