Philipp Schmid 3/28/2026

How Kimi, Cursor, and Chroma Train Agentic Models with RL

This article reviews three technical reports, covering Moonshot AI's Kimi K2.5, Cursor's Composer 2, and Chroma's Context-1, each of which uses reinforcement learning (RL) to train an agentic model. Kimi K2.5 introduces Agent Swarm, which uses RL for parallel task decomposition. Cursor's Composer 2 uses self-summarization and real-time RL on production traffic. Chroma's Context-1 is trained to edit its own context during document retrieval. All three start from strong base models, use outcome-based rewards, and run asynchronous rollouts. The article details their methodologies, infrastructure, and innovations such as PARL for parallel agent orchestration.
