Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Abstract

Autonomous agents powered by large language models (LLMs) have the potential to enhance human capabilities by assisting with digital tasks, from sending emails to performing data analysis. The abilities of existing LLMs at such tasks are often hindered by the lack of high-quality agent data from the environments they interact with. We propose Learn-by-interact, a data-centric framework that adapts LLM agents to any given environment without human annotations. Learn-by-interact synthesizes trajectories of agent-environment interactions based on documentation, and constructs instructions by summarizing or abstracting the interaction histories, a process called backward construction. We assess the quality of our synthetic data by using it both in training-based scenarios and in training-free in-context learning (ICL), where we craft retrieval approaches optimized for agents. Extensive experiments on SWE-bench, WebArena, OSWorld, and Spider2-V, spanning realistic coding, web, and desktop environments, show the effectiveness of Learn-by-interact in various downstream agentic tasks: baseline results improve by up to 12.2% for ICL with Claude-3.5 and by up to 19.5% for training with Codestral-22B. We further demonstrate the critical role of backward construction, which provides up to 14.0% improvement for training. Our ablation studies demonstrate the efficiency that our synthesized data provides in ICL and the superiority of our retrieval pipeline over alternative approaches such as conventional retrieval-augmented generation (RAG). We expect Learn-by-interact to serve as a foundation for agent data synthesis as LLMs are increasingly deployed in real-world environments.
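
As a rough illustration of the backward construction idea named in the abstract, the sketch below derives instructions from recorded interaction histories instead of starting from a fixed instruction. It is not the authors' implementation. The `Step` and `Example` types, the `backward_construct` function, the choice to enumerate every contiguous sub-trajectory, and the `toy_summarize` stand-in (which would be an LLM call in practice) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One agent-environment exchange (hypothetical structure)."""
    action: str
    observation: str


@dataclass(frozen=True)
class Example:
    """A synthesized (instruction, trajectory) pair."""
    instruction: str
    trajectory: Tuple[Step, ...]


def backward_construct(
    trajectory: Sequence[Step],
    summarize: Callable[[Sequence[Step]], str],
    min_len: int = 1,
) -> List[Example]:
    """Write an instruction for what the agent actually did.

    Assumption: each contiguous sub-trajectory with at least `min_len`
    steps is treated as a candidate history, and `summarize` turns it
    into an instruction that the history fulfills.
    """
    examples: List[Example] = []
    n = len(trajectory)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            window = tuple(trajectory[start:end])
            examples.append(Example(summarize(window), window))
    return examples


def toy_summarize(steps: Sequence[Step]) -> str:
    """Stand-in for an LLM summarizer: joins the actions into one task."""
    return "Task: " + " then ".join(step.action for step in steps)


if __name__ == "__main__":
    history = [
        Step("open settings", "settings page shown"),
        Step("click profile", "profile form shown"),
        Step("save changes", "confirmation shown"),
    ]
    for example in backward_construct(history, toy_summarize, min_len=2):
        print(example.instruction)
```

Because each instruction is written after the fact from the recorded steps, the instruction and the trajectory are aligned by construction. A real pipeline would presumably also filter low-quality pairs, which this sketch omits.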
