Creating General User Models from Computer Use
May 16, 2025
Authors: Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, Michael S. Bernstein
cs.AI
Abstract
Human-computer interaction has long imagined technology that understands
us: from our preferences and habits to the timing and purpose of our everyday
actions. Yet current user models remain fragmented, narrowly tailored to
specific apps, and incapable of the flexible reasoning required to fulfill
these visions. This paper presents an architecture for a general user model
(GUM) that learns about you by observing any interaction you have with your
computer. The GUM takes as input any unstructured observation of a user (e.g.,
device screenshots) and constructs confidence-weighted propositions that
capture that user's knowledge and preferences. GUMs can infer that a user is
preparing for a wedding they're attending from messages with a friend. Or
recognize that a user is struggling with a collaborator's feedback on a draft
by observing multiple stalled edits and a switch to reading related work. GUMs
introduce an architecture that infers new propositions about a user from
multimodal observations, retrieves related propositions for context, and
continuously revises existing propositions. To illustrate the breadth of
applications that GUMs enable, we demonstrate how they augment chat-based
assistants with context, manage OS notifications to selectively surface
important information, and enable interactive agents that adapt to preferences
across apps. We also instantiate proactive assistants (GUMBOs) that discover
and execute useful suggestions on a user's behalf using their GUM. In our
evaluations, we find that GUMs make calibrated and accurate inferences about
users, and that assistants built on GUMs proactively identify and perform
actions that users wouldn't think to request explicitly. Altogether, GUMs
introduce methods that leverage multimodal models to understand unstructured
context, enabling long-standing visions of HCI and entirely new interactive
systems that anticipate user needs.
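
To make the propose/retrieve/revise loop described above concrete, here is a minimal sketch in Python. All names here (Proposition, GeneralUserModel, the word-overlap retriever, and the fixed confidence increment) are hypothetical illustrations, not the paper's implementation; the actual GUM uses multimodal models to infer and revise propositions from unstructured observations such as screenshots.

```python
# Hypothetical sketch of a GUM-style propose/retrieve/revise loop.
# Model-based inference and embedding retrieval are stubbed with
# simple placeholders so the example runs standalone.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Proposition:
    """A confidence-weighted statement about the user."""
    text: str          # e.g. "User is preparing for a wedding"
    confidence: float  # calibrated weight in [0, 1]
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class GeneralUserModel:
    """Stores propositions; infers new ones and revises existing ones."""

    def __init__(self) -> None:
        self.propositions: list[Proposition] = []

    def retrieve(self, query: str, k: int = 3) -> list[Proposition]:
        # Stand-in retriever: rank stored propositions by naive word
        # overlap with the query instead of learned embeddings.
        words = set(query.lower().split())
        scored = sorted(
            self.propositions,
            key=lambda p: len(words & set(p.text.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def infer(self, observation: str) -> Proposition:
        # Stand-in for multimodal inference: the real system would read
        # an unstructured observation (e.g., a screenshot) together with
        # retrieved context and emit new or revised propositions.
        context = self.retrieve(observation)
        new = Proposition(text=f"Observed: {observation}", confidence=0.5)
        self.propositions.append(new)
        # Revision step: observations consistent with existing
        # propositions raise their confidence (clamped to 1.0).
        # The fixed +0.1 only marks where revision happens.
        for prop in context:
            prop.confidence = min(1.0, prop.confidence + 0.1)
        return new


if __name__ == "__main__":
    gum = GeneralUserModel()
    gum.infer("message thread about wedding plans with a friend")
    gum.infer("calendar event: flight to the wedding venue")
    for p in gum.retrieve("wedding", k=2):
        print(f"{p.confidence:.2f}  {p.text}")
```

In the described system, the confidence update would come from a calibrated model judgment rather than a fixed increment, and downstream assistants such as GUMBO would query the retrieved propositions to decide which suggestions are worth surfacing or executing.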