
HiconAgent: History Context-aware Policy Optimization for GUI Agents

December 1, 2025
Authors: Xurui Zhou, Gongwei Chen, Yuquan Xie, Zaijing Li, Kaiwen Zhou, Shuai Wang, Shuo Yang, Zhuotao Tian, Rui Shao
cs.AI

Abstract

Graphical User Interface (GUI) agents must use historical context effectively to perform sequential navigation tasks. While incorporating past actions and observations can improve decision making, naively using the full history incurs excessive computational overhead and distracts the agent with irrelevant information. To address this, we introduce HiconAgent, a GUI agent trained with History Context-aware Policy Optimization (HCPO) for efficient and effective use of historical information. HCPO optimizes history usage in both sampling and policy updates through two complementary components: (1) Dynamic Context Sampling (DCS) presents the agent with variable-length histories during sampling, enabling adaptive use of the most relevant context; (2) Anchor-guided History Compression (AHC) refines the policy update phase with a dual-branch strategy in which the compressed branch removes history observations while keeping history actions as information-flow anchors. The compressed and uncompressed branches are coupled through a history-enhanced alignment loss to enforce consistent history usage while maintaining efficiency. Experiments on mainstream GUI navigation benchmarks demonstrate strong performance: despite being smaller, HiconAgent-3B outperforms GUI-R1-7B by +8.46% grounding accuracy and +11.32% step success rate on GUI-Odyssey, while achieving comparable results on AndroidControl and AITW with up to 2.47x computational speedup and a 60% FLOPs reduction.
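The two history operations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Step` record, the uniform choice of history length, and the decision to keep the most recent steps are all assumptions made for illustration, and the history-enhanced alignment loss that couples the two branches is not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One past interaction step (hypothetical structure)."""
    action: str
    observation: Optional[str]  # e.g. a screenshot reference; None once removed


def sample_history(history: List[Step], max_len: int, rng: random.Random) -> List[Step]:
    """DCS-style sketch: keep the k most recent steps.

    Assumption: k is drawn uniformly from 0..min(max_len, len(history)).
    """
    k = rng.randint(0, min(max_len, len(history)))
    return history[len(history) - k:]


def compress_history(history: List[Step]) -> List[Step]:
    """AHC-style sketch: drop past observations, keep past actions as anchors."""
    return [Step(action=s.action, observation=None) for s in history]
```

In this sketch the compressed branch would consume `compress_history(h)` and the uncompressed branch would consume `h` itself, for the same sampled history `h`.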