STEVE: A Step Verification Pipeline for Computer-use Agent Training

Abstract

Developing AI agents that autonomously manipulate graphical user interfaces is a long-standing challenge. Recent advances in data scaling laws motivate training computer-use agents on a scaled instruction set, yet training agents by behavior cloning still requires an immense number of high-quality trajectories. To meet this scalability need, we designed STEVE, a step verification pipeline for computer-use agent training. First, we establish a large instruction set for computer-use agents and collect trajectory data with suboptimal agents. Next, GPT-4o verifies the correctness of each step in the trajectories based on the screens before and after the action is executed, assigning each step a binary label. Finally, we adopt Kahneman-Tversky Optimization (KTO) to optimize the agent from the binary stepwise labels. Extensive experiments show that our agent outperforms supervised finetuning by leveraging both positive and negative actions within a trajectory. STEVE also enables us to train a 7B vision-language model as a computer-use agent that achieves leading performance in the challenging live desktop environment WinAgentArena, with high efficiency and at reduced cost. Code and data: https://github.com/FanbinLu/STEVE.
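
To make the final optimization step more concrete, the following is a minimal Python sketch of a per-step KTO-style loss computed from a binary verifier label. It follows the general form of the published KTO objective and is not taken from the STEVE codebase. The names `kto_step_loss`, `beta`, `lambda_d`, `lambda_u`, and the fixed reference point `z_ref` are illustrative assumptions; in KTO the reference point is estimated from a batch-level KL term rather than passed in as a constant.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_step_loss(
    policy_logp: float,     # log-prob of the action under the agent being trained
    ref_logp: float,        # log-prob of the same action under a frozen reference model
    is_correct: bool,       # binary step label from the verifier
    z_ref: float = 0.0,     # assumed fixed reference point (hypothetical simplification)
    beta: float = 0.1,      # assumed temperature on the implied reward
    lambda_d: float = 1.0,  # assumed weight for desirable (correct) steps
    lambda_u: float = 1.0,  # assumed weight for undesirable (incorrect) steps
) -> float:
    """KTO-style loss for one verified step (sketch, not the authors' code)."""
    # Implied reward: log-ratio of policy to reference for this action.
    reward = policy_logp - ref_logp
    if is_correct:
        # Correct steps are pushed toward a reward above the reference point.
        return lambda_d * (1.0 - sigmoid(beta * (reward - z_ref)))
    # Incorrect steps are pushed toward a reward below the reference point.
    return lambda_u * (1.0 - sigmoid(beta * (z_ref - reward)))
```

Under these assumptions, a step labeled correct lowers the loss as the agent assigns it more probability than the reference model does, while a step labeled incorrect raises it. This is how both positive and negative actions within one trajectory can contribute a training signal.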
