Morae: Proactively Pausing UI Agents for User Choices

Abstract

User interface (UI) agents promise to make inaccessible or complex UIs easier for blind and low-vision (BLV) users to access. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, which reduces user agency. For example, in our field study, a BLV participant asked the agent to buy the cheapest available sparkling water, and the agent automatically chose one of several equally priced options without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompts users for clarification when there is a choice to be made. In a study with BLV participants on real-world web tasks, Morae helped users complete more tasks and select options that better matched their preferences than baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while still being able to express their preferences.
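To make the idea of a "decision point" concrete, here is a minimal sketch built on the sparkling-water example. Morae itself detects decision points with large multimodal models over the query, UI code, and screenshots; the rule below is a hypothetical, hand-written stand-in, not Morae's method. It assumes the agent has already extracted product records from the page, and the names `Product`, `DecisionPoint`, and `cheapest_or_pause` are illustrative only. When several products tie on the user's stated criterion (lowest price), the query no longer determines a single choice, so the sketch returns a pause request listing the attributes that differ, instead of picking one arbitrarily.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Hypothetical product record an agent might extract from a shopping page.
    name: str
    price: float
    flavor: str
    rating: float


@dataclass
class DecisionPoint:
    # Hypothetical pause request: a question for the user plus the candidate options.
    question: str
    options: list[Product]


def cheapest_or_pause(products: list[Product]) -> Product | DecisionPoint:
    """Return the single cheapest product, or a DecisionPoint if several tie on price."""
    if not products:
        raise ValueError("no products to choose from")
    lowest = min(p.price for p in products)
    tied = [p for p in products if p.price == lowest]
    if len(tied) == 1:
        # The user's criterion picks out exactly one option: proceed automatically.
        return tied[0]
    # The criterion is ambiguous: report which attributes differ among the tied
    # options so the user learns about flavors and ratings the query did not mention.
    differing = [
        attr for attr in ("flavor", "rating")
        if len({getattr(p, attr) for p in tied}) > 1
    ]
    question = (
        f"{len(tied)} options cost ${lowest:.2f}; they differ in "
        f"{', '.join(differing) or 'no listed attribute'}. Which one do you want?"
    )
    return DecisionPoint(question=question, options=tied)


# Usage with made-up catalog data.
catalog = [
    Product("Lime Sparkling Water", 4.99, "lime", 4.2),
    Product("Grapefruit Sparkling Water", 4.99, "grapefruit", 4.7),
    Product("Plain Sparkling Water", 5.49, "plain", 4.5),
]
result = cheapest_or_pause(catalog)
print(result.question)
# prints: 2 options cost $4.99; they differ in flavor, rating. Which one do you want?
```

Returning a `DecisionPoint` value rather than silently choosing keeps the final choice with the user, which is the mixed-initiative behavior the abstract describes; a model-based detector would generalize this beyond simple price ties.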
