Efficient Agent Training for Computer Use

Abstract

Scaling up high-quality trajectory data has long been a critical bottleneck for developing human-like computer use agents. We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations. Starting with just 312 human-annotated computer use trajectories, we further improved data quality by synthesizing diverse action decisions with Claude 3.7 Sonnet. Trained on these enriched trajectories, our PC Agent-E model achieved a remarkable 141% relative improvement, surpassing the strong Claude 3.7 Sonnet with extended thinking on WindowsAgentArena-V2, an improved benchmark we also released. Furthermore, PC Agent-E demonstrates strong generalizability to different operating systems on OSWorld. Our findings suggest that strong computer use capabilities can be stimulated from a small amount of high-quality trajectory data.
