

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

January 18, 2025
作者: Hongjin Su, Ruoxi Sun, Jinsung Yoon, Pengcheng Yin, Tao Yu, Sercan Ö. Arık
cs.AI

Abstract

Autonomous agents powered by large language models (LLMs) have the potential to enhance human capabilities, assisting with digital tasks from sending emails to performing data analysis. The abilities of existing LLMs at such tasks are often hindered by the lack of high-quality agent data from the corresponding environments they interact with. We propose Learn-by-interact, a data-centric framework to adapt LLM agents to any given environments without human annotations. Learn-by-interact synthesizes trajectories of agent-environment interactions based on documentations, and constructs instructions by summarizing or abstracting the interaction histories, a process called backward construction. We assess the quality of our synthetic data by using them in both training-based scenarios and training-free in-context learning (ICL), where we craft innovative retrieval approaches optimized for agents. Extensive experiments on SWE-bench, WebArena, OSWorld and Spider2-V spanning across realistic coding, web, and desktop environments show the effectiveness of Learn-by-interact in various downstream agentic tasks -- baseline results are improved by up to 12.2\% for ICL with Claude-3.5 and 19.5\% for training with Codestral-22B. We further demonstrate the critical role of backward construction, which provides up to 14.0\% improvement for training. Our ablation studies demonstrate the efficiency provided by our synthesized data in ICL and the superiority of our retrieval pipeline over alternative approaches like conventional retrieval-augmented generation (RAG). We expect that Learn-by-interact will serve as a foundation for agent data synthesis as LLMs are increasingly deployed at real-world environments.
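The backward construction idea described above can be sketched in a few lines: record an agent-environment trajectory first, then derive instructions afterward by summarizing what the agent actually did, so one rollout yields many supervised (instruction, trajectory) pairs. This is an illustrative sketch only, not the paper's implementation; the function names, the prefix-slicing strategy, and the `summarize` placeholder (which stands in for an LLM summarization call) are all assumptions.

```python
# Illustrative sketch of "backward construction": turn a recorded
# agent-environment trajectory into (instruction, sub_trajectory)
# training pairs by summarizing the interaction history after the fact.
# All names here are hypothetical, not the paper's actual API.

def summarize(steps):
    """Placeholder for an LLM call that abstracts a list of
    (action, observation) steps into a natural-language instruction."""
    actions = ", ".join(action for action, _ in steps)
    return f"Perform the following: {actions}"

def backward_construct(trajectory):
    """Yield one synthetic (instruction, sub_trajectory) pair for each
    contiguous prefix of the trajectory, so a single rollout produces
    multiple supervised examples."""
    pairs = []
    for end in range(1, len(trajectory) + 1):
        sub = trajectory[:end]
        pairs.append((summarize(sub), sub))
    return pairs

# Toy 3-step web trajectory
trajectory = [
    ("click 'New issue'", "issue form opened"),
    ("type title 'crash on save'", "title filled"),
    ("click 'Submit'", "issue created"),
]
pairs = backward_construct(trajectory)
```

The resulting pairs can then be used either as fine-tuning data or as retrievable in-context examples, matching the two evaluation settings the abstract describes.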

