AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

May 27, 2026
Authors: Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig, Irene Ying, Tianyi Zhou, Jordan Boyd-Graber
cs.AI

Abstract

AI systems are fallible, and humans can make mistakes in deciding whether to trust AI over their own judgment. Thus, improving human-AI collaboration requires understanding when, why, and how humans decide to rely on AI. We study two distinct reliance decisions: the delegation choice (deciding when to let AI act autonomously without knowing its output) and the adoption choice (evaluating AI suggestions and deciding how to use them). Both of these decoupled reliance patterns shape collaboration, but prior work rarely studies them together in realistic settings with the same users. We address this gap by studying collaborative human-AI teams competing in a question-answering game in which humans can choose when and how to work with AI agents to win. Our 24 matches pair 23 expert humans with 16 AI agents, capturing 387 delegation and 1440 adoption decisions. While human-AI collaboration performs better than either AI or humans alone, humans make suboptimal collaboration decisions, both under-relying on correct AI suggestions (3.9% of opportunities missed) and over-relying when AI misleads them (1.7%). Both parties contribute wrong answers: reported model confidence is near chance when humans and AI disagree, while confirmation bias drives higher under-reliance (64.5%) when an AI suggestion agrees with humans' initial incorrect answer. To close this gap, we recommend calibrated confidence, evidence-grounded explanations, and mechanisms that help users refine trust.
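
As a rough illustration of the under-reliance and over-reliance quantities mentioned in the abstract, the sketch below tallies them from a list of adoption decisions. The definitions are an assumption: under-reliance is counted when a correct AI suggestion is not adopted, and over-reliance when an incorrect one is adopted, both as a share of all adoption decisions. The abstract does not state the paper's exact denominators, so this may differ from the authors' computation. The names `AdoptionDecision` and `reliance_rates` and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdoptionDecision:
    """One adoption decision (hypothetical record layout, not from the paper)."""
    ai_correct: bool  # whether the AI suggestion was correct
    adopted: bool     # whether the human went with the AI suggestion


def reliance_rates(decisions):
    """Return under- and over-reliance as fractions of all decisions.

    Assumed definitions (not confirmed by the source):
      under-reliance: AI was correct, human did not adopt
      over-reliance:  AI was incorrect, human adopted
    """
    total = len(decisions)
    if total == 0:
        raise ValueError("need at least one decision")
    under = sum(1 for d in decisions if d.ai_correct and not d.adopted)
    over = sum(1 for d in decisions if not d.ai_correct and d.adopted)
    return {"under_reliance": under / total, "over_reliance": over / total}


# Toy data, invented purely for illustration.
toy_decisions = [
    AdoptionDecision(ai_correct=True, adopted=True),    # appropriate reliance
    AdoptionDecision(ai_correct=True, adopted=False),   # under-reliance
    AdoptionDecision(ai_correct=False, adopted=True),   # over-reliance
    AdoptionDecision(ai_correct=False, adopted=False),  # appropriate rejection
]
rates = reliance_rates(toy_decisions)
```

Dividing by all decisions keeps the two rates directly comparable. Conditioning on whether the AI was correct is an equally plausible choice, but it yields different numbers.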