Composer 2 Technical Report

March 25, 2026
Authors: Cursor Research, Aaron Chan, Ahmed Shalaby, Alexander Wettig, Aman Sanger, Andrew Zhai, Anurag Ajay, Ashvin Nair, Charlie Snell, Chen Lu, Chen Shen, Emily Jia, Federico Cassano, Hanpeng Liu, Haoyu Chen, Henry Wildermuth, Jacob Jackson, Janet Li, Jediah Katz, Jiajun Yao, Joey Hejna, Josh Warner, Julius Vering, Kevin Frans, Lee Danilek, Less Wright, Lujing Cen, Luke Melas-Kyriazi, Michael Truell, Michiel de Jong, Naman Jain, Nate Schmidt, Nathan Wang, Niklas Muennighoff, Oleg Rybkin, Paul Loh, Phillip Kravtsov, Rishabh Yadav, Sahil Shah, Sam Kottler, Alexander M Rush, Shengtong Zhang, Shomil Jain, Sriram Sankar, Stefan Heule, Stuart H. Sul, Sualeh Asif, Victor Rong, Wanqi Zhu, William Lin, Yuchen Wu, Yuri Volkov, Yury Zemlyanskiy, Zack Holbrook, Zhiyuan Zhang
cs.AI

Abstract

Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step execution, and coherence on long-horizon realistic coding problems. We develop infrastructure to support training in the same Cursor harness that is used by the deployed model, with equivalent tools and structure, and use environments that match real problems closely. To measure the ability of the model on increasingly difficult tasks, we introduce a benchmark derived from real software engineering problems in large codebases including our own. Composer 2 is a frontier-level coding model and demonstrates a process for training strong domain-specialized models. On our CursorBench evaluations the model achieves a major improvement in accuracy compared to previous Composer models (61.3). On public benchmarks the model scores 61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual in our harness, comparable to state-of-the-art systems.