How Kimi, Cursor, and Chroma Train Agentic Models with RL
This article reviews three technical reports: Moonshot AI's Kimi K2.5, Cursor's Composer 2, and Chroma's Context-1. Each uses reinforcement learning (RL) to train agentic models. Kimi K2.5 introduces Agent Swarm, which uses RL to learn parallel task decomposition. Cursor's Composer 2 relies on self-summarization and real-time RL driven by production traffic. Chroma's Context-1 teaches a model to edit its own context during document retrieval. All three start from strong base models, use outcome-based rewards, and run rollouts asynchronously. The article details each team's methodology and infrastructure, along with innovations such as PARL for parallel agent orchestration.
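None of the three reports' code is reproduced here. As a rough illustration of how "outcome-based rewards" and "asynchronous rollouts" fit together, here is a minimal Python sketch built on toy assumptions. Rollout workers run episodes concurrently and push finished trajectories onto a queue. A learner consumes fixed-size batches as soon as they are available, so some trajectories were produced by an older policy version. The reward depends only on the final answer and gives no credit for intermediate steps. Everything here is hypothetical and not taken from Kimi, Cursor, or Chroma: `rollout_worker`, `learner`, `outcome_reward`, the coin-flip "agent", and the batch-mean baseline.

```python
import asyncio
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trajectory:
    task_id: int
    policy_version: int  # version of the (toy) policy that generated this episode
    num_steps: int
    reward: float        # outcome-based: depends only on the final answer


def outcome_reward(answer: str, target: str) -> float:
    """Hypothetical binary outcome reward: no credit for intermediate steps."""
    return 1.0 if answer == target else 0.0


async def rollout_worker(worker_id, tasks, queue, state, seed):
    """Toy agent loop: a few simulated tool calls, then a coin-flip final answer."""
    rng = random.Random(seed + worker_id)
    for task_id, target in tasks:
        version = state["policy_version"]  # snapshot; may be stale when the learner uses it
        steps = rng.randint(1, 4)
        for _ in range(steps):
            await asyncio.sleep(rng.random() * 0.005)  # simulated tool/environment latency
        answer = target if rng.random() < 0.5 else "wrong"
        await queue.put(Trajectory(task_id, version, steps, outcome_reward(answer, target)))


async def learner(queue, state, total, batch_size):
    """Consumes trajectories in completion order and 'updates' once per batch."""
    updates = []
    consumed = 0
    while consumed < total:
        n = min(batch_size, total - consumed)
        batch = [await queue.get() for _ in range(n)]
        consumed += n
        baseline = mean(t.reward for t in batch)  # simple batch-mean baseline (assumption)
        advantages = [t.reward - baseline for t in batch]
        # A real learner would take a policy-gradient step here; we only bump the version.
        state["policy_version"] += 1
        updates.append((batch, advantages))
    return updates


async def train(num_workers=4, tasks_per_worker=6, batch_size=8, seed=0):
    queue = asyncio.Queue()
    state = {"policy_version": 0}
    tasks = [[(w * 100 + i, f"ans{i}") for i in range(tasks_per_worker)]
             for w in range(num_workers)]
    workers = [asyncio.create_task(rollout_worker(w, tasks[w], queue, state, seed))
               for w in range(num_workers)]
    updates = await learner(queue, state, num_workers * tasks_per_worker, batch_size)
    await asyncio.gather(*workers)
    return updates, state["policy_version"]


if __name__ == "__main__":
    updates, final_version = asyncio.run(train())
    for k, (batch, _) in enumerate(updates):
        staleness = max(k - t.policy_version for t in batch)
        print(f"update {k}: mean reward {mean(t.reward for t in batch):.2f}, "
              f"max staleness {staleness}")
    print("final policy version:", final_version)
```

In a design like this, batch size and tolerated staleness are the main knobs: more asynchrony keeps rollout workers busy, but it also lets trajectories drift further from the policy currently being updated.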