
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

March 9, 2026
Authors: Yuxiang Chai, Shunye Tang, Han Xiao, Rui Liu, Hongsheng Li
cs.AI

Abstract

Current Graphical User Interface (GUI) agents operate primarily under a reactive paradigm: a user must provide an explicit instruction before the agent executes a task. A truly intelligent AI assistant, however, should be proactive: it should anticipate user intentions directly from continuous visual inputs, such as mobile or desktop screenshots, and offer timely recommendations without explicit user prompting. Transitioning to this proactive paradigm presents significant challenges. Real-world screen activity is rarely linear; it consists of long-horizon trajectories fraught with noisy browsing, meaningless actions, and multithreaded task-switching. To address this gap, we introduce PIRA-Bench (Proactive Intent Recommendation Agent Benchmark), a novel benchmark for evaluating multimodal large language models (MLLMs) on continuous, weakly-supervised visual inputs. Unlike reactive datasets, PIRA-Bench features complex trajectories with multiple interleaved intents, noisy segments, and diverse user profile contexts, challenging agents to detect actionable events while adapting to user preferences. Furthermore, we propose the PIRF baseline, a memory-aware, state-tracking framework that enables general MLLMs to manage multiple task threads and handle misleading visual inputs. PIRA-Bench serves as an initial step toward robust and proactive GUI-based personal assistants.
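
To make the setting concrete, the following is a minimal illustrative sketch of what tracking several interleaved intent threads across noisy frames might look like. It is not the PIRF framework, whose design the abstract does not specify. Every name here (`IntentThread`, `ThreadMemory`, `intent_hint`, `min_evidence`) is a hypothetical stand-in, and the per-frame `intent_hint` string is assumed to come from some upstream MLLM call that is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IntentThread:
    """One candidate user intent accumulated across frames (hypothetical structure)."""
    intent: str
    evidence: int = 0          # number of supporting frames seen so far
    recommended: bool = False  # whether a recommendation was already issued


@dataclass
class ThreadMemory:
    """Toy memory that tracks several interleaved intent threads at once."""
    min_evidence: int = 2      # assumed threshold before recommending
    threads: Dict[str, IntentThread] = field(default_factory=dict)

    def observe(self, intent_hint: Optional[str]) -> Optional[str]:
        """Update memory with one frame; return an intent to recommend, or None."""
        if intent_hint is None:
            # Noisy frame: no actionable signal, so existing threads are left untouched.
            return None
        thread = self.threads.setdefault(intent_hint, IntentThread(intent_hint))
        thread.evidence += 1
        if thread.evidence >= self.min_evidence and not thread.recommended:
            thread.recommended = True
            return thread.intent
        return None


def recommend_over_trajectory(frames: List[Optional[str]]) -> List[str]:
    """Run the toy tracker over a whole trajectory and collect recommendations in order."""
    memory = ThreadMemory()
    recommendations = []
    for hint in frames:
        result = memory.observe(hint)
        if result is not None:
            recommendations.append(result)
    return recommendations


# Two intents interleaved with noisy frames (None), as in the abstract's description.
trajectory = ["book_flight", None, "reply_email", "book_flight", None, "reply_email", "book_flight"]
print(recommend_over_trajectory(trajectory))
```

The point of the sketch is only that noisy frames should not reset or pollute the state of open threads, and that each thread is recommended at most once. A real system would replace the string hints with model outputs and would also need to account for user profile context.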