Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Abstract

Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, which fails to capture the stateful, sequential nature of user interaction in digital environments and makes realistic user simulation infeasible. We introduce the Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and a state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
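To make the idea of a state-dependent action space concrete, the following is a minimal sketch of an app modeled as a finite state machine. It is an illustration only, not Pare's actual API: the class `AppFSM`, the helper `make_messaging_app`, and the state and action names (`inbox`, `thread`, `draft`, `open_thread`, and so on) are all hypothetical and chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AppFSM:
    """Toy finite state machine for a single app (hypothetical, not Pare's API)."""

    state: str
    # transitions[state][action] gives the state reached by taking `action` in `state`.
    transitions: dict[str, dict[str, str]] = field(default_factory=dict)

    def available_actions(self) -> list[str]:
        """Return only the actions that are valid in the current state."""
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action: str) -> str:
        """Apply an action, rejecting any that the current state does not expose."""
        allowed = self.transitions.get(self.state, {})
        if action not in allowed:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        self.state = allowed[action]
        return self.state


def make_messaging_app() -> AppFSM:
    """Build an assumed three-screen messaging app: inbox, thread, draft."""
    return AppFSM(
        state="inbox",
        transitions={
            "inbox": {"open_thread": "thread", "compose": "draft"},
            "thread": {"reply": "draft", "back": "inbox"},
            "draft": {"send": "inbox", "discard": "inbox"},
        },
    )


if __name__ == "__main__":
    app = make_messaging_app()
    print(app.state, app.available_actions())
    app.step("open_thread")
    print(app.state, app.available_actions())
```

In this sketch a simulated user on the inbox screen cannot call `send` directly; it has to navigate to a draft first. A flat tool-calling API, by contrast, would expose every action at every step, which is the limitation the abstract attributes to existing approaches.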
