11/28/2024
Reward Hacking in Reinforcement Learning
Explores reward hacking in reinforcement learning, where AI agents exploit flaws in the reward function to score highly without doing the intended task, and its impact on RLHF and language model alignment.