Creating General User Models from Computer Use

May 16, 2025
Authors: Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, Michael S. Bernstein
cs.AI

Abstract

Human-computer interaction has long imagined technology that understands us, from our preferences and habits to the timing and purpose of our everyday actions. Yet current user models remain fragmented, narrowly tailored to specific apps, and incapable of the flexible reasoning required to fulfill these visions. This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture the user's knowledge and preferences. GUMs can infer that a user is preparing for a wedding they're attending from messages with a friend, or recognize that a user is struggling with a collaborator's feedback on a draft by observing multiple stalled edits and a switch to reading related work. GUMs introduce an architecture that infers new propositions about a user from multimodal observations, retrieves related propositions for context, and continuously revises existing propositions. To illustrate the breadth of applications that GUMs enable, we demonstrate how they augment chat-based assistants with context, manage OS notifications to selectively surface important information, and enable interactive agents that adapt to preferences across apps. We also instantiate proactive assistants (GUMBOs) that discover and execute useful suggestions on a user's behalf using their GUM. In our evaluations, we find that GUMs make calibrated and accurate inferences about users, and that assistants built on GUMs proactively identify and perform actions that users wouldn't think to request explicitly. Altogether, GUMs introduce methods that leverage multimodal models to understand unstructured context, enabling long-standing visions of HCI and entirely new interactive systems that anticipate user needs.
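
The abstract describes a three-step loop at the heart of a GUM: infer new confidence-weighted propositions from an observation, retrieve related propositions for context, and revise existing propositions in light of new evidence. The Python sketch below illustrates that loop under stated assumptions: every name (Proposition, GeneralUserModel, infer_propositions) is hypothetical, and the lexical-overlap retrieval and confidence-averaging revision are simple stand-ins for the multimodal-model inference and retrieval components the paper actually uses.

```python
from dataclasses import dataclass

# Minimal, hypothetical sketch of a GUM's infer -> retrieve -> revise loop.
# Not the paper's implementation: retrieval and revision below are toy
# stand-ins chosen only to make the control flow concrete.

@dataclass
class Proposition:
    text: str          # natural-language statement about the user
    confidence: float  # calibrated weight in [0, 1]

class GeneralUserModel:
    def __init__(self):
        self.propositions: list[Proposition] = []

    def retrieve(self, query: str, k: int = 5) -> list[Proposition]:
        """Rank stored propositions by crude word overlap with the query
        (a stand-in for the paper's retrieval step)."""
        words = set(query.lower().split())
        ranked = sorted(
            self.propositions,
            key=lambda p: len(words & set(p.text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def revise(self, new: Proposition) -> None:
        """If a matching proposition exists, nudge its confidence toward
        the new evidence; otherwise store the new proposition."""
        for existing in self.propositions:
            if existing.text == new.text:
                existing.confidence = 0.5 * (existing.confidence + new.confidence)
                return
        self.propositions.append(new)

    def observe(self, observation: str) -> None:
        """Ingest one unstructured observation (e.g., text recovered from
        a screenshot), infer candidate propositions, and fold them in."""
        context = self.retrieve(observation)  # related prior propositions
        for prop in infer_propositions(observation, context):
            self.revise(prop)

def infer_propositions(observation: str, context: list[Proposition]) -> list[Proposition]:
    """Placeholder for a multimodal-model call that turns an observation
    plus retrieved context into confidence-weighted propositions."""
    return [Proposition(text=f"user context: {observation}", confidence=0.5)]

if __name__ == "__main__":
    gum = GeneralUserModel()
    gum.observe("editing a wedding invitation draft in a word processor")
    print(gum.retrieve("wedding"))
```

In this sketch, downstream applications (chat assistants, notification managers, proactive GUMBO-style agents) would query the model through retrieve(), while observe() runs continuously over the stream of screen observations.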
