
HiconAgent: History Context-aware Policy Optimization for GUI Agents

December 1, 2025
Authors: Xurui Zhou, Gongwei Chen, Yuquan Xie, Zaijing Li, Kaiwen Zhou, Shuai Wang, Shuo Yang, Zhuotao Tian, Rui Shao
cs.AI

Abstract

Graphical User Interface (GUI) agents require effective use of historical context to perform sequential navigation tasks. While incorporating past actions and observations can improve decision making, naively using the full history leads to excessive computational overhead and distraction from irrelevant information. To address this, we introduce HiconAgent, a GUI agent trained with History Context-aware Policy Optimization (HCPO) for efficient and effective utilization of historical information. HCPO optimizes history usage in both sampling and policy updates through two complementary components: (1) Dynamic Context Sampling (DCS) presents the agent with variable-length histories during sampling, enabling adaptive use of the most relevant context; (2) Anchor-guided History Compression (AHC) refines the policy update phase with a dual-branch strategy in which the compressed branch removes history observations while keeping history actions as information-flow anchors. The compressed and uncompressed branches are coupled through a history-enhanced alignment loss to enforce consistent history usage while maintaining efficiency. Experiments on mainstream GUI navigation benchmarks demonstrate strong performance. Despite being smaller, HiconAgent-3B outperforms GUI-R1-7B by 8.46% in grounding accuracy and 11.32% in step success rate on GUI-Odyssey, while achieving comparable results on AndroidControl and AITW with up to a 2.47x computational speedup and a 60% reduction in FLOPs.
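
The two history-handling ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Step` type, the uniform choice of history length, and the string placeholders for observations and actions are all assumptions made for illustration, and the alignment loss is not modeled because the abstract does not specify its form.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One past interaction step (hypothetical container, not from the paper)."""
    observation: str  # stand-in for a screenshot; a real agent would hold image tokens
    action: str       # stand-in for the action the agent took at this step


def sample_history(history: List[Step], max_len: int, rng: random.Random) -> List[Step]:
    """DCS-style sketch: keep a randomly sized window of the most recent steps.

    Assumption: the window length is drawn uniformly from 0..max_len.
    """
    k = rng.randint(0, min(max_len, len(history)))  # randint is inclusive on both ends
    return history[len(history) - k:]               # k == 0 yields an empty history


def build_context(history: List[Step], keep_observations: bool) -> List[str]:
    """AHC-style sketch: the compressed branch drops observations, keeps actions."""
    parts: List[str] = []
    for step in history:
        if keep_observations:
            parts.append(f"<obs:{step.observation}>")
        parts.append(f"<act:{step.action}>")
    return parts


if __name__ == "__main__":
    past = [Step("home", "tap settings"), Step("settings", "scroll"), Step("wifi", "toggle")]
    window = sample_history(past, max_len=2, rng=random.Random(0))
    print(build_context(window, keep_observations=True))   # uncompressed branch input
    print(build_context(window, keep_observations=False))  # compressed branch input
```

In this toy form, the compressed context contains one entry per step instead of two, which is the source of the efficiency gain the abstract describes; in a real multimodal agent the saving would be much larger because each observation is a full screenshot.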