HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration
October 31, 2025
Authors: Shaojie Zhang, Pei Fu, Ruoceng Zhang, Jiahui Yang, Anan Du, Xiuwen Xi, Shaokang Wang, Ying Huang, Bin Qin, Zhenbo Luo, Jian Luan
cs.AI
Abstract
Autonomous Graphical User Interface (GUI) agents rely on accurate GUI
grounding, which maps language instructions to on-screen coordinates, to
execute user commands. However, current models, whether trained via supervised
fine-tuning (SFT) or reinforcement fine-tuning (RFT), lack self-awareness of
their capability boundaries, leading to overconfidence and unreliable
predictions. We first systematically evaluate probabilistic and verbalized
confidence in general-purpose and GUI-specific models, revealing a misalignment
between confidence and actual accuracy. This misalignment is particularly
critical in dynamic GUI automation tasks, where a single error can cause task
failure. To address this,
we propose HyperClick, a novel framework that enhances reliable GUI grounding
through uncertainty calibration. HyperClick introduces a dual reward mechanism
that combines a binary reward for correct actions with spatial confidence
modeling based on a truncated Gaussian, calibrated using the Brier score. This
approach jointly optimizes grounding accuracy and confidence reliability,
fostering introspective self-criticism. Extensive experiments on seven challenging
benchmarks show that HyperClick achieves state-of-the-art performance while
providing well-calibrated confidence. By enabling explicit confidence
calibration and introspective self-criticism, HyperClick reduces overconfidence
and supports more reliable GUI automation.
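
The abstract names the ingredients of the dual reward but gives no formulas, so the following is only a minimal sketch of how such a reward could be assembled, not the authors' implementation. It assumes the target is an axis-aligned bounding box, that the spatial confidence target is a Gaussian centered on the box and truncated to zero outside it, and that the calibration term is one minus the squared (Brier) error between the model's stated confidence and that target. The function names, `sigma_scale`, and `weight` are hypothetical.

```python
import math

def binary_reward(x, y, box):
    """Return 1.0 if the click (x, y) lies inside box = (x1, y1, x2, y2), else 0.0."""
    x1, y1, x2, y2 = box
    return 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0

def spatial_target(x, y, box, sigma_scale=0.5):
    """Assumed truncated Gaussian target: 1.0 at the box center, decaying
    toward the edges, and exactly 0.0 outside the box."""
    x1, y1, x2, y2 = box
    if not (x1 <= x <= x2 and y1 <= y <= y2):
        return 0.0  # truncation: no credit outside the target element
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Standard deviations scale with the half-width and half-height (assumption).
    sx = max(sigma_scale * (x2 - x1) / 2.0, 1e-6)
    sy = max(sigma_scale * (y2 - y1) / 2.0, 1e-6)
    return math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))

def brier_reward(confidence, target):
    """Calibration term: 1 minus the squared error between confidence and target."""
    return 1.0 - (confidence - target) ** 2

def dual_reward(x, y, confidence, box, weight=1.0):
    """Correctness reward plus a weighted calibration reward (hypothetical weighting)."""
    target = spatial_target(x, y, box)
    return binary_reward(x, y, box) + weight * brier_reward(confidence, target)

if __name__ == "__main__":
    box = (0.0, 0.0, 100.0, 50.0)
    print(dual_reward(50.0, 25.0, 1.0, box))   # confident click at the center
    print(dual_reward(200.0, 10.0, 0.9, box))  # overconfident miss
    print(dual_reward(200.0, 10.0, 0.0, box))  # miss with appropriately low confidence
```

Under these assumptions, an overconfident miss earns less than a miss reported with low confidence, which is the behavior the calibration term is meant to encourage.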