

The Collaboration Gap

November 4, 2025
Authors: Tim R. Davidson, Adam Fourney, Saleema Amershi, Robert West, Eric Horvitz, Ece Kamar
cs.AI

Abstract

The trajectory of AI development suggests that we will increasingly rely on agent-based systems composed of independently developed agents with different information, privileges, and tools. The success of these systems will critically depend on effective collaboration among these heterogeneous agents, even under partial observability. Despite intense interest, few empirical studies have evaluated such agent-agent collaboration at scale. We propose a collaborative maze-solving benchmark that (i) isolates collaborative capabilities, (ii) modulates problem complexity, (iii) enables scalable automated grading, and (iv) imposes no output-format constraints, preserving ecological plausibility. Using this framework, we evaluate 32 leading open- and closed-source models in solo, homogeneous, and heterogeneous pairings. Our results reveal a "collaboration gap": models that perform well solo often degrade substantially when required to collaborate. Collaboration can break down dramatically; for instance, small distilled models that solve mazes well alone may fail almost completely in certain pairings. We find that starting with the stronger agent often improves outcomes, motivating a "relay inference" approach where the stronger agent leads before handing off to the weaker one, closing much of the gap. Our findings argue for (1) collaboration-aware evaluation, (2) training strategies developed to enhance collaborative capabilities, and (3) interaction design that reliably elicits agents' latent skills, guidance that applies to AI-AI and human-AI collaboration.
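The abstract describes relay inference only at a high level: the stronger agent leads before handing off to the weaker one. The following is a minimal sketch of that turn-scheduling idea under stated assumptions, not the authors' implementation. Each model is treated as a hypothetical function from the shared transcript to its next message, and `relay_dialogue`, `handoff_turn`, and `total_turns` are illustrative names introduced here. The abstract does not say exactly how the handoff is structured, so this sketch assumes a single shared transcript whose author switches from the strong model to the weak model at a fixed turn.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM call: maps the shared transcript so far
# (a list of (speaker, message) pairs) to the next message.
Model = Callable[[List[Tuple[str, str]]], str]


def relay_dialogue(
    strong: Model, weak: Model, handoff_turn: int, total_turns: int
) -> List[Tuple[str, str]]:
    """Produce `total_turns` messages: `strong` writes turns 0..handoff_turn-1,
    then `weak` writes the remaining turns. Both see the full transcript."""
    if not 0 <= handoff_turn <= total_turns:
        raise ValueError("handoff_turn must lie between 0 and total_turns")
    transcript: List[Tuple[str, str]] = []
    for turn in range(total_turns):
        # Before the handoff the stronger model leads; afterwards the weaker one continues.
        if turn < handoff_turn:
            speaker, model = "strong", strong
        else:
            speaker, model = "weak", weak
        # The model sees every earlier message before its own is appended.
        transcript.append((speaker, model(transcript)))
    return transcript


if __name__ == "__main__":
    # Toy models that just report how many messages they have seen so far.
    strong = lambda t: f"strong move {len(t)}"
    weak = lambda t: f"weak move {len(t)}"
    for speaker, message in relay_dialogue(strong, weak, handoff_turn=2, total_turns=4):
        print(speaker, message)
```

In this sketch the weaker model reads the full transcript, so it continues from whatever plan the stronger model laid out in the opening turns; in an actual benchmark run the toy lambdas would be replaced by real model calls.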
December 2, 2025