Philipp Schmid • 3/28/2026

How Kimi, Cursor, and Chroma Train Agentic Models with RL

This article reviews three technical reports on using reinforcement learning to train agentic models: Moonshot AI's Kimi K2.5, Cursor's Composer 2, and Chroma's Context-1. Kimi K2.5 introduces Agent Swarm, which uses RL to teach the model to decompose tasks into parallel subtasks. Cursor's Composer 2 combines self-summarization with real-time RL on production traffic. Chroma's Context-1 teaches the model to edit its own context during document retrieval. All three start from strong base models, use outcome-based rewards, and run asynchronous rollouts. The article covers each team's methodology and infrastructure, along with innovations such as PARL for parallel agent orchestration.
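None of the three reports' code is reproduced here. The following is a minimal, hypothetical Python sketch of the two ingredients the reports share: an outcome-based reward, which scores only the final result of an episode, and asynchronous rollout collection, where finished trajectories are consumed as they complete rather than after the slowest one. The names `rollout`, `outcome_reward`, and `collect`, the toy policy, and the random sleep standing in for tool calls are all illustrative assumptions, not APIs or methods from Kimi, Cursor, or Chroma.

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task_id: int
    final_answer: int
    reward: float = 0.0


def outcome_reward(final_answer: int, expected: int) -> float:
    """Outcome-based reward: score only the end result, not intermediate steps."""
    return 1.0 if final_answer == expected else 0.0


async def rollout(task_id: int, expected: int, rng: random.Random) -> Trajectory:
    """Hypothetical agent episode; the sleep stands in for variable-length tool calls."""
    await asyncio.sleep(rng.uniform(0.0, 0.01))
    # Toy policy (assumption): answers correctly about half the time.
    answer = expected if rng.random() < 0.5 else expected + 1
    traj = Trajectory(task_id=task_id, final_answer=answer)
    traj.reward = outcome_reward(answer, expected)
    return traj


async def collect(tasks: dict[int, int], seed: int = 0) -> list[Trajectory]:
    """Launch all rollouts concurrently and consume them in completion order."""
    rng = random.Random(seed)
    pending = [asyncio.create_task(rollout(tid, exp, rng)) for tid, exp in tasks.items()]
    finished = []
    for fut in asyncio.as_completed(pending):
        # A trainer could start consuming finished trajectories here
        # instead of waiting for the whole batch to return.
        finished.append(await fut)
    return finished


if __name__ == "__main__":
    batch = asyncio.run(collect({i: i * 2 for i in range(8)}))
    print([(t.task_id, t.reward) for t in batch])
```

Because the reward looks only at the final answer, the same scoring function works regardless of how many tool calls or intermediate steps an episode takes.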


#AI Agents #Reinforcement Learning #Agentic Models