Creating General User Models from Computer Use

Abstract

Human-computer interaction has long imagined technology that understands us, from our preferences and habits to the timing and purpose of our everyday actions. Yet current user models remain fragmented, narrowly tailored to specific apps, and incapable of the flexible reasoning required to fulfill these visions. This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture that user's knowledge and preferences. For example, a GUM can infer from messages with a friend that a user is preparing for a wedding they're attending, or recognize that a user is struggling with a collaborator's feedback on a draft by observing multiple stalled edits and a switch to reading related work. GUMs introduce an architecture that infers new propositions about a user from multimodal observations, retrieves related propositions for context, and continuously revises existing propositions. To illustrate the breadth of applications that GUMs enable, we demonstrate how they augment chat-based assistants with context, manage OS notifications to selectively surface important information, and enable interactive agents that adapt to preferences across apps. We also instantiate proactive assistants (GUMBOs) that discover and execute useful suggestions on a user's behalf using their GUM. In our evaluations, we find that GUMs make calibrated and accurate inferences about users, and that assistants built on GUMs proactively identify and perform actions that users wouldn't think to request explicitly. Altogether, GUMs introduce methods that leverage multimodal models to understand unstructured context, enabling long-standing visions of HCI and entirely new interactive systems that anticipate user needs.
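The idea of confidence-weighted propositions that are stored, retrieved for context, and revised over time can be illustrated with a toy sketch. This is not the paper's implementation: the names `Proposition` and `ToyUserModel`, the word-overlap retrieval, and the clamping of confidence to [0, 1] are all assumptions made for illustration, and the step where a multimodal model infers propositions from screenshots is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """A statement about the user with an attached confidence (hypothetical type)."""
    text: str
    confidence: float  # assumed here to lie in [0, 1]


def _tokens(s: str) -> set[str]:
    # Crude tokenizer: lowercase and split on whitespace.
    return set(s.lower().split())


def _clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


@dataclass
class ToyUserModel:
    """In-memory stand-in for a proposition store (illustrative only)."""
    propositions: list[Proposition] = field(default_factory=list)

    def add(self, text: str, confidence: float) -> Proposition:
        prop = Proposition(text, _clamp(confidence))
        self.propositions.append(prop)
        return prop

    def retrieve(self, query: str, k: int = 3) -> list[Proposition]:
        # Rank by word overlap with the query, breaking ties by confidence.
        # A real system would likely use a proper retrieval method instead.
        q = _tokens(query)
        scored = [(len(q & _tokens(p.text)), p) for p in self.propositions]
        hits = [(s, p) for s, p in scored if s > 0]
        hits.sort(key=lambda sp: (sp[0], sp[1].confidence), reverse=True)
        return [p for _, p in hits[:k]]

    def revise(self, text: str, new_confidence: float) -> Proposition:
        # Update the confidence of an existing proposition, or add it if absent.
        for p in self.propositions:
            if p.text == text:
                p.confidence = _clamp(new_confidence)
                return p
        return self.add(text, new_confidence)


if __name__ == "__main__":
    model = ToyUserModel()
    model.add("user is preparing for a wedding", 0.8)
    model.add("user is struggling with feedback on a draft", 0.6)
    for p in model.retrieve("wedding gift ideas"):
        print(p.text, p.confidence)
```

In this sketch, a downstream assistant would call `retrieve` to pull relevant propositions into its context and `revise` whenever new observations strengthen or weaken an earlier inference.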
