Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Abstract

Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce the Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and a state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
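To make the finite-state-machine idea concrete, the following is a minimal Python sketch of an app whose available actions depend on the current screen. It is an illustration only and does not reflect Pare's actual API: the class `AppStateMachine`, its methods, and the toy messaging app with its state and action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppStateMachine:
    """Toy app modeled as a finite state machine (hypothetical, not Pare's API)."""

    # transitions[state][action] gives the state reached by taking that action.
    transitions: dict[str, dict[str, str]]
    state: str

    def available_actions(self) -> list[str]:
        # The action space is state-dependent: only actions defined
        # for the current state are offered to the simulated user.
        return sorted(self.transitions[self.state])

    def step(self, action: str) -> str:
        # Reject actions that are not reachable from the current state.
        if action not in self.transitions[self.state]:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        self.state = self.transitions[self.state][action]
        return self.state


# Hypothetical messaging app with three screens.
messaging = AppStateMachine(
    transitions={
        "inbox": {"open_thread": "thread", "compose": "draft"},
        "thread": {"reply": "draft", "back": "inbox"},
        "draft": {"send": "inbox", "discard": "inbox"},
    },
    state="inbox",
)
```

In a flat tool-calling model, an action such as `send` could be called at any time. In this sketch it only becomes available after the simulated user has navigated to the `draft` state, which is the kind of sequential constraint the abstract describes.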
