STEVE: A Step Verification Pipeline for Computer-use Agent Training

Abstract

Developing AI agents that autonomously operate graphical user interfaces is a long-standing challenge. Recent advances in data scaling laws motivate training computer-use agents on a scaled instruction set, yet training agents with behavior cloning still requires a large volume of high-quality trajectories. To meet this scalability need, we designed STEVE, a step verification pipeline for computer-use agent training. First, we establish a large instruction set for computer-use agents and collect trajectory data with several suboptimal agents. GPT-4o then verifies the correctness of each step in the trajectories based on the screens before and after the action is executed, assigning each step a binary label. Finally, we adopt Kahneman-Tversky Optimization (KTO) to optimize the agent from the binary stepwise labels. Extensive experiments show that our agent outperforms supervised finetuning by leveraging both positive and negative actions within a trajectory. STEVE also enables us to train a 7B vision-language model as a computer-use agent, achieving leading performance in the challenging live desktop environment WinAgentArena, efficiently and at reduced cost. Code and data: https://github.com/FanbinLu/STEVE.
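The pipeline described in the abstract (collect steps, verify each one, optimize from binary labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Step` and `LabeledStep` types, `label_trajectory`, `toy_verifier`, and all parameter names and default values are hypothetical, and the verifier is a plain callable standing in for the GPT-4o judgment. The loss follows the per-example form of Kahneman-Tversky Optimization, assuming sequence log-probabilities under the policy and a frozen reference model are already available and that the KL reference point is supplied as a number.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One agent step (hypothetical schema, not taken from the paper)."""
    instruction: str
    screen_before: str  # e.g. a screenshot path or ID before the action
    action: str
    screen_after: str   # screenshot path or ID after the action


@dataclass
class LabeledStep:
    """A KTO-style example: prompt, completion, and a binary label."""
    prompt: str
    completion: str
    desirable: bool


# Stand-in for the GPT-4o step verifier: any callable returning True/False.
Verifier = Callable[[Step], bool]


def label_trajectory(steps: List[Step], verifier: Verifier) -> List[LabeledStep]:
    """Attach a binary correctness label to every step in a trajectory."""
    return [
        LabeledStep(
            prompt=f"{s.instruction}\n[screen: {s.screen_before}]",
            completion=s.action,
            desirable=verifier(s),
        )
        for s in steps
    ]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(policy_logp: float, ref_logp: float, desirable: bool,
             kl_ref: float = 0.0, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss; kl_ref is the (assumed given) KL reference point."""
    reward = policy_logp - ref_logp  # implicit reward: log-probability ratio
    if desirable:
        return lambda_d * (1.0 - sigmoid(beta * (reward - kl_ref)))
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - reward)))


def toy_verifier(step: Step) -> bool:
    """Placeholder rule: a step counts as correct if the screen changed."""
    return step.screen_before != step.screen_after


trajectory = [
    Step("Open Notepad", "s0.png", "click(Start)", "s1.png"),
    Step("Open Notepad", "s1.png", "click(blank area)", "s1.png"),
]
labeled = label_trajectory(trajectory, toy_verifier)
print([ex.desirable for ex in labeled])  # [True, False]
```

Because every step carries its own label, a trajectory that ultimately fails can still contribute its correct steps as desirable examples and its incorrect steps as undesirable ones, which is the property the abstract credits for the gain over supervised finetuning.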
