1/15/2026
Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar
OpenAI researchers propose 'confessions' as a method to improve AI honesty by training models to self-report misbehavior in reinforcement learning.