Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
March 27, 2026
Authors: Nicholas Edwards, Sebastian Schuster
cs.AI
Abstract
As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimized for autonomous execution. In this work, we systematically evaluate the clarification-seeking abilities of LLM agents on an underspecified variant of SWE-bench Verified. We propose an uncertainty-aware multi-agent scaffold that explicitly decouples underspecification detection from code execution. Our results demonstrate that this multi-agent system, using OpenHands + Claude Sonnet 4.5, achieves a 69.40% task resolve rate, significantly outperforming a standard single-agent setup (61.20%) and narrowing the performance gap with agents operating on fully specified instructions. Furthermore, we find that the multi-agent system exhibits well-calibrated uncertainty, conserving queries on simple tasks while proactively seeking information on more complex issues. These findings indicate that current models can be turned into proactive collaborators that independently recognize when to ask questions to elicit missing information in real-world, underspecified tasks.
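The decoupling described above can be illustrated with a minimal sketch: a detector stage first scores how underspecified an instruction looks, and only when its uncertainty crosses a threshold does the system ask a clarifying question before handing the (possibly enriched) task to an executor. All function names, the heuristic scorer, and the threshold below are illustrative assumptions, not the paper's actual implementation, which uses LLM-based agents.

```python
# Hypothetical sketch of an uncertainty-aware detect-then-execute scaffold.
# A real detector would be an LLM call returning calibrated uncertainty;
# here a toy marker-based heuristic stands in for it.

from dataclasses import dataclass, field


@dataclass
class Decision:
    ask: bool
    questions: list = field(default_factory=list)


def detect_underspecification(instruction: str, threshold: float = 0.5) -> Decision:
    # Stand-in heuristic: flag instructions containing vague phrasing.
    vague_markers = ["somehow", "etc", "fix it", "make it work"]
    score = sum(m in instruction.lower() for m in vague_markers) / len(vague_markers)
    if score >= threshold:
        return Decision(ask=True, questions=["Which behavior exactly should change?"])
    return Decision(ask=False)


def run_task(instruction: str, oracle) -> str:
    # Detection is decoupled from execution: questions are resolved first,
    # then the enriched instruction is passed on.
    decision = detect_underspecification(instruction)
    if decision.ask:
        answers = [oracle(q) for q in decision.questions]
        instruction += "\nClarifications: " + "; ".join(answers)
    # An executor (e.g. an OpenHands coding agent) would receive this
    # enriched instruction; here we simply return it.
    return instruction


enriched = run_task("somehow fix it", oracle=lambda q: "the parser crash on empty input")
```

The key design point the abstract emphasizes is calibration: a well-specified instruction should pass through without triggering a query, so the cost of clarification is only paid on genuinely ambiguous tasks.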