CooperBench: Why Coding Agents Cannot be Your Teammates Yet

January 19, 2026
Authors: Arpandeep Khatua, Hao Zhu, Peter Tran, Arya Prabhudesai, Frederic Sadrieh, Johann K. Lieberwirth, Xinkai Yu, Yicheng Fu, Michael J. Ryan, Jiaxin Pei, Diyi Yang
cs.AI

Abstract

Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This contrasts sharply with human teams, where adding teammates typically improves productivity. Our analysis reveals three key issues: (1) communication channels become jammed with vague, ill-timed, and inaccurate messages; (2) even with effective communication, agents deviate from their commitments; and (3) agents often hold incorrect expectations about others' plans and communication. Through large-scale simulation, we also observe rare but interesting emergent coordination behavior including role division, resource division, and negotiation. Our research presents a novel benchmark for collaborative coding and calls for a shift from pursuing individual agent capability to developing social intelligence.
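The abstract reports that agents succeed about 30% less often when working together than when performing both tasks individually, but it does not say how the gap is computed. The sketch below is illustrative only. It makes three assumptions. First, each task has a pass/fail outcome in a solo setting, taken here to mean one agent implementing both features. Second, each task has a pass/fail outcome in a cooperative setting, taken here to mean two agents each implementing one feature. Third, the "30% lower" figure is a relative drop rather than a difference in percentage points. `TaskOutcome`, `success_rate`, and `coordination_gap` are hypothetical names, and the outcomes are invented, not CooperBench data.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    """Hypothetical per-task record (not the benchmark's actual schema)."""
    task_id: str
    solo_passed: bool  # assumed: one agent implemented both features
    coop_passed: bool  # assumed: two agents each implemented one feature


def success_rate(flags: list[bool]) -> float:
    """Fraction of tasks whose tests passed; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def coordination_gap(outcomes: list[TaskOutcome]) -> float:
    """Relative drop in success rate when cooperating instead of working solo.

    Assumption: gap = 1 - coop_rate / solo_rate. The abstract does not state
    whether its 30% figure is relative or in percentage points.
    """
    solo = success_rate([o.solo_passed for o in outcomes])
    coop = success_rate([o.coop_passed for o in outcomes])
    if solo == 0.0:
        return 0.0  # no solo successes, so a relative drop is undefined
    return 1.0 - coop / solo


# Invented outcomes for illustration only; these are not results from the paper.
outcomes = [
    TaskOutcome("t1", solo_passed=True, coop_passed=True),
    TaskOutcome("t2", solo_passed=True, coop_passed=False),
    TaskOutcome("t3", solo_passed=True, coop_passed=True),
    TaskOutcome("t4", solo_passed=False, coop_passed=False),
    TaskOutcome("t5", solo_passed=True, coop_passed=True),
]

print(f"relative gap: {coordination_gap(outcomes):.0%}")
```

With these made-up outcomes, solo success is 4/5 and cooperative success is 3/5, so the relative gap is 1 - 0.6/0.8 = 25%. Measured in percentage points, the same outcomes would give a 20-point gap. That difference is why the choice of definition matters when reading the headline number.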
January 29, 2026