Morae: Proactively Pausing UI Agents for User Choices

August 29, 2025
Authors: Yi-Hao Peng, Dingzeyu Li, Jeffrey P. Bigham, Amy Pavel
cs.AI

Abstract

User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompts users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while still being able to express their preferences.