AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents

Abstract

We introduce AgentSynth, a scalable and cost-efficient pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents. Leveraging information asymmetry, AgentSynth constructs subtasks that are simple during generation but significantly more challenging when composed into long-horizon tasks, enabling the creation of over 6,000 diverse and realistic tasks. Our pipeline begins with an LLM-based task proposer guided by a persona, followed by an execution agent that completes the task and logs the trajectory. This process is repeated iteratively to form a sequence of subtasks, which are then summarized by a separate agent into a composite task of controllable difficulty. A key strength of AgentSynth is its ability to precisely modulate task complexity by varying the number of subtasks. Empirical evaluations show that state-of-the-art LLM agents suffer a steep performance drop, from 18% success at difficulty level 1 to just 4% at level 6, highlighting the benchmark's difficulty and discriminative power. Moreover, our pipeline achieves a low average cost of $0.60 per trajectory, orders of magnitude cheaper than human annotations. Our code and data are publicly available at https://github.com/sunblaze-ucb/AgentSynth
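
The propose, execute, and summarize loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `Subtask`, `synthesize_task`, and the three stub functions are hypothetical names, and the stubs stand in for the LLM-based task proposer, the execution agent, and the summarizing agent. The sketch also assumes that a task's difficulty level is simply the number of chained subtasks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Subtask:
    """One proposed subtask and the actions logged while executing it."""
    description: str
    trajectory: List[str] = field(default_factory=list)


def synthesize_task(
    persona: str,
    propose: Callable[[str, List[Subtask]], str],
    execute: Callable[[str], List[str]],
    summarize: Callable[[List[Subtask]], str],
    difficulty: int,
) -> Tuple[str, List[Subtask]]:
    """Chain `difficulty` subtasks, then summarize them into one composite task.

    Assumption: difficulty level == number of chained subtasks.
    """
    if difficulty < 1:
        raise ValueError("difficulty must be at least 1")
    subtasks: List[Subtask] = []
    for _ in range(difficulty):
        # The proposer sees the persona and the subtasks completed so far.
        description = propose(persona, subtasks)
        # The executor completes the subtask and returns its logged actions.
        trajectory = execute(description)
        subtasks.append(Subtask(description, trajectory))
    return summarize(subtasks), subtasks


# Hypothetical stand-ins for the LLM-backed components.
def stub_propose(persona: str, history: List[Subtask]) -> str:
    return f"step {len(history) + 1} for {persona}"


def stub_execute(description: str) -> List[str]:
    return [f"action for: {description}"]


def stub_summarize(subtasks: List[Subtask]) -> str:
    return " then ".join(s.description for s in subtasks)


task, subtasks = synthesize_task(
    "travel blogger", stub_propose, stub_execute, stub_summarize, difficulty=3
)
print(task)
```

Passing the components as callables keeps the loop independent of any particular model or environment. Raising `difficulty` lengthens the chain without changing how each individual subtask is generated, which is the asymmetry the abstract relies on.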
