Creating General User Models from Computer Use
May 16, 2025
Authors: Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, Michael S. Bernstein
cs.AI
Abstract
Human-computer interaction has long imagined technology that understands us: from our preferences and habits to the timing and purpose of our everyday actions. Yet current user models remain fragmented, narrowly tailored to specific apps, and incapable of the flexible reasoning required to fulfill these visions. This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture that user's knowledge and preferences. GUMs can infer that a user is preparing for a wedding they're attending from messages with a friend, or recognize that a user is struggling with a collaborator's feedback on a draft by observing multiple stalled edits and a switch to reading related work. GUMs introduce an architecture that infers new propositions about a user from multimodal observations, retrieves related propositions for context, and continuously revises existing propositions. To illustrate the breadth of applications that GUMs enable, we demonstrate how they augment chat-based assistants with context, manage OS notifications to selectively surface important information, and enable interactive agents that adapt to preferences across apps. We also instantiate proactive assistants (GUMBOs) that discover and execute useful suggestions on a user's behalf using their GUM. In our evaluations, we find that GUMs make calibrated and accurate inferences about users, and that assistants built on GUMs proactively identify and perform actions that users wouldn't think to request explicitly. Altogether, GUMs introduce methods that leverage multimodal models to understand unstructured context, enabling long-standing visions of HCI and entirely new interactive systems that anticipate user needs.
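The abstract describes the GUM's core cycle: infer new confidence-weighted propositions from multimodal observations, retrieve related propositions for context, and continuously revise what is already believed. The Python sketch below illustrates one possible shape of that cycle; the `Proposition` fields, the injected `infer` callable, and the word-overlap retrieval are illustrative assumptions, not the paper's implementation (a real system would use a multimodal model and embedding-based retrieval).

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Proposition:
    """A confidence-weighted natural-language statement about the user."""
    text: str                      # e.g. "User is preparing for a friend's wedding"
    confidence: float              # calibrated weight in [0, 1]
    evidence: list[str] = field(default_factory=list)  # supporting observations

class GeneralUserModel:
    """Minimal propose/retrieve/revise loop over unstructured observations.

    `infer` stands in for a multimodal model call that maps an observation
    plus retrieved context to candidate propositions; its exact interface
    is an assumption, not the paper's API.
    """

    def __init__(self, infer: Callable[[str, list[Proposition]], list[Proposition]]):
        self.infer = infer
        self.propositions: list[Proposition] = []

    def retrieve(self, query: str, k: int = 5) -> list[Proposition]:
        """Return the k propositions most related to `query`.
        Simple word overlap here; embedding similarity in practice."""
        words = set(query.lower().split())
        return sorted(
            self.propositions,
            key=lambda p: len(words & set(p.text.lower().split())),
            reverse=True,
        )[:k]

    def observe(self, observation: str) -> None:
        """Ingest one observation (e.g., a screenshot description) and revise."""
        context = self.retrieve(observation)
        for new in self.infer(observation, context):
            match = next((p for p in self.propositions if p.text == new.text), None)
            if match:  # revision: corroborating evidence raises confidence
                match.confidence = max(match.confidence, new.confidence)
                match.evidence += new.evidence
            else:      # otherwise record a fresh proposition
                self.propositions.append(new)
```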
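The paper also instantiates GUMBOs, proactive assistants that mine the GUM for useful suggestions and act on the user's behalf. Building on the sketch above, the following shows one plausible proactive cycle; the `Suggestion` type, the `suggest` and `execute` callables, the confidence threshold, and the utility ranking are hypothetical stand-ins for the system the abstract describes.

```python
from dataclasses import dataclass

# Reuses GeneralUserModel / Proposition from the sketch above.

@dataclass
class Suggestion:
    action: str      # natural-language description of a proposed action
    utility: float   # estimated benefit to the user

def gumbo_step(gum, suggest, execute, min_confidence: float = 0.8) -> None:
    """One proactive cycle: read the GUM's high-confidence propositions,
    generate candidate suggestions, and carry out the most useful one."""
    beliefs = [p for p in gum.propositions if p.confidence >= min_confidence]
    candidates: list[Suggestion] = suggest(beliefs)
    if candidates:
        best = max(candidates, key=lambda s: s.utility)
        execute(best)  # act on the user's behalf, or surface the result
```

Gating on calibrated confidence is what lets a proactive assistant act without explicit requests while limiting unwanted interventions: low-confidence beliefs stay in the model but never trigger actions.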