Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Abstract

Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, which fails to capture the stateful and sequential nature of user interaction in digital environments and makes realistic user simulation infeasible. We introduce the Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and a state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
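
To make the finite-state-machine idea concrete, the following is a minimal sketch of an app whose available actions depend on the current screen. It is an illustration only, not Pare's actual API: the class `AppFSM`, its methods, and the toy messaging app with its screen and action names are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class AppFSM:
    """Toy app modelled as a finite state machine (hypothetical, not Pare's API)."""

    # transitions[state][action] gives the state reached by taking that action.
    transitions: dict[str, dict[str, str]]
    state: str

    def available_actions(self) -> list[str]:
        # State-dependent action space: only actions defined for the
        # current screen are offered to the simulated user.
        return sorted(self.transitions[self.state])

    def step(self, action: str) -> str:
        # Stateful navigation: an action is valid only from the right screen.
        if action not in self.transitions[self.state]:
            raise ValueError(
                f"{action!r} is not available in state {self.state!r}"
            )
        self.state = self.transitions[self.state][action]
        return self.state


# Hypothetical messaging app with three screens (assumed for illustration).
messaging = AppFSM(
    transitions={
        "inbox": {"open_thread": "thread", "compose": "draft"},
        "thread": {"reply": "draft", "back": "inbox"},
        "draft": {"send": "inbox", "discard": "inbox"},
    },
    state="inbox",
)
```

In a flat tool-calling model, an action such as `send` could be invoked at any time. In this sketch, `messaging.step("send")` raises an error from the inbox, and the simulated user must first navigate to the draft screen, which is the kind of sequential behaviour the abstract describes.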
