PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Abstract

While GUI agents perform well when following explicit task-completion instructions, real-world deployment requires aligning with users' more complex implicit intents. In this work, we highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign), a new agent task that requires agents to leverage long-term user records as persistent context, both to resolve the preferences omitted from vague instructions and to anticipate latent routines from the user's state for proactive assistance. To facilitate this study, we introduce AndroidIntent, a benchmark designed to evaluate agents' ability to resolve vague instructions and provide proactive suggestions by reasoning over long-term user records. For evaluation, we annotated 775 user-specific preferences and 215 routines from 20k long-term records across different users. Furthermore, we introduce the Hierarchical Intent Memory Agent (HIM-Agent), which maintains a continuously updated personal memory and hierarchically organizes user preferences and routines for personalization. Finally, we evaluate a range of GUI agents on AndroidIntent, including GPT-5, Qwen3-VL, and UI-TARS. The results show that HIM-Agent significantly improves execution and proactive performance, by 15.7% and 7.3% respectively.