HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

October 31, 2025
Authors: Shaojie Zhang, Pei Fu, Ruoceng Zhang, Jiahui Yang, Anan Du, Xiuwen Xi, Shaokang Wang, Ying Huang, Bin Qin, Zhenbo Luo, Jian Luan
cs.AI

Abstract

Autonomous Graphical User Interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement fine-tuning (RFT), lack awareness of their own capability boundaries, leading to overconfidence and unreliable predictions. We first systematically evaluate probabilistic and verbalized confidence in general and GUI-specific models, revealing a misalignment between confidence and actual accuracy. This misalignment is particularly critical in dynamic GUI automation tasks, where a single error can cause task failure. To address this, we propose HyperClick, a novel framework that enhances reliable GUI grounding through uncertainty calibration. HyperClick introduces a dual reward mechanism that combines a binary reward for correct actions with truncated Gaussian-based spatial confidence modeling, calibrated using the Brier score. This approach jointly optimizes grounding accuracy and confidence reliability, fostering introspective self-criticism. Extensive experiments on seven challenging benchmarks show that HyperClick achieves state-of-the-art performance while providing well-calibrated confidence. By enabling explicit confidence calibration and introspective self-criticism, HyperClick reduces overconfidence and supports more reliable GUI automation.
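
The dual reward described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration rather than the authors' implementation: the function names, the `(x1, y1, x2, y2)` box format, the `sigma_scale` parameter, the choice to center the Gaussian on the target box and set it to zero outside the box, and the simple sum of the two terms are all hypothetical. Only the Brier-style squared error between a stated confidence and a target value follows the standard definition.

```python
import math


def binary_reward(click, bbox):
    """Return 1.0 if the click lands inside the target box, else 0.0."""
    x, y = click
    x1, y1, x2, y2 = bbox
    return 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0


def truncated_gaussian_target(click, bbox, sigma_scale=0.5):
    """Hypothetical spatial confidence target.

    A Gaussian centered on the box, truncated to zero outside the box.
    sigma_scale (an assumed parameter) sets the standard deviation as a
    fraction of the box half-width and half-height.
    """
    x, y = click
    x1, y1, x2, y2 = bbox
    if x2 <= x1 or y2 <= y1:
        raise ValueError("bbox must have positive width and height")
    if not (x1 <= x <= x2 and y1 <= y <= y2):
        return 0.0  # truncation: no confidence is warranted for a miss
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sx = sigma_scale * (x2 - x1) / 2.0
    sy = sigma_scale * (y2 - y1) / 2.0
    z = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
    return math.exp(-0.5 * z)


def brier_calibration_reward(confidence, target):
    """One minus the squared error between stated confidence and target."""
    return 1.0 - (confidence - target) ** 2


def dual_reward(click, confidence, bbox):
    """Assumed combination: correctness term plus calibration term."""
    target = truncated_gaussian_target(click, bbox)
    return binary_reward(click, bbox) + brier_calibration_reward(confidence, target)


if __name__ == "__main__":
    box = (0.0, 0.0, 100.0, 50.0)
    print(dual_reward((50.0, 25.0), 1.0, box))    # confident hit at the center
    print(dual_reward((200.0, 200.0), 0.9, box))  # overconfident miss
    print(dual_reward((200.0, 200.0), 0.0, box))  # miss with low stated confidence
```

Under these assumptions, a miss reported with high confidence scores lower than the same miss reported with low confidence, which is the kind of pressure against overconfidence the abstract describes.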