CooperBench: Why Coding Agents Cannot be Your Teammates Yet

January 19, 2026
Authors: Arpandeep Khatua, Hao Zhu, Peter Tran, Arya Prabhudesai, Frederic Sadrieh, Johann K. Lieberwirth, Xinkai Yu, Yicheng Fu, Michael J. Ryan, Jiaxin Pei, Diyi Yang
cs.AI

Abstract

Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This contrasts sharply with human teams, where adding teammates typically improves productivity. Our analysis reveals three key issues: (1) communication channels become jammed with vague, ill-timed, and inaccurate messages; (2) even with effective communication, agents deviate from their commitments; and (3) agents often hold incorrect expectations about others' plans and communication. Through large-scale simulation, we also observe rare but interesting emergent coordination behavior including role division, resource division, and negotiation. Our research presents a novel benchmark for collaborative coding and calls for a shift from pursuing individual agent capability to developing social intelligence.