1/17/2025
Bite: How DeepSeek-R1 was trained
Explains the training of DeepSeek-R1, focusing on the Group Relative Policy Optimization (GRPO) reinforcement learning method.