Proactive Agent Research Environment: Evaluating Proactive Assistants by Simulating Active Users

Abstract


Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs. This fails to capture the stateful and sequential nature of user interaction in digital environments and makes realistic user simulation infeasible. We introduce the Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action spaces for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
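The contrast between a flat tool-calling API and an app modeled as a finite state machine can be made concrete with a small sketch. It is written in Python and makes three illustrative assumptions. First, each app is a set of screens (states). Second, the actions available to a simulated user depend on the current screen. Third, taking an action moves the app to a new screen. The class `AppFSM`, the hypothetical `Messages` app, and its states and actions are all invented for this example. They are not Pare's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AppFSM:
    """Hypothetical app modeled as a finite state machine (illustration only, not Pare's API)."""

    name: str
    state: str
    # transitions[state][action] -> next state; the keys of the inner dict
    # form the action space available in that state.
    transitions: Dict[str, Dict[str, str]]
    history: List[str] = field(default_factory=list)

    def available_actions(self) -> List[str]:
        # State-dependent action space: only actions defined for the
        # current screen can be taken.
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action: str) -> str:
        # Stateful navigation: an action is valid only in the states that
        # define it, and it moves the app to the next state.
        allowed = self.transitions.get(self.state, {})
        if action not in allowed:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        next_state = allowed[action]
        self.history.append(f"{self.state} --{action}--> {next_state}")
        self.state = next_state
        return next_state


# A hypothetical messaging app with three screens.
messages = AppFSM(
    name="Messages",
    state="inbox",
    transitions={
        "inbox": {"open_thread": "thread", "compose": "compose"},
        "thread": {"reply": "thread", "back": "inbox"},
        "compose": {"send": "inbox", "cancel": "inbox"},
    },
)

print(messages.available_actions())  # ['compose', 'open_thread']
messages.step("open_thread")
print(messages.available_actions())  # ['back', 'reply']
```

A flat tool-calling API would expose `send` as a call that can be made at any time. In this sketch, `send` is reachable only after navigating to the `compose` screen. A simulated user therefore has to follow a realistic sequence of screens, and each step leaves an observable trace in `history`.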