

Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs

November 9, 2023
作者: Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin
cs.AI

Abstract

We introduce Lumos, a novel framework for training language agents that employs a unified data format and a modular architecture based on open-source large language models (LLMs). Lumos consists of three distinct modules: planning, grounding, and execution. The planning module breaks down a task into a series of high-level, tool-agnostic subgoals, which are then made specific by the grounding module through a set of low-level actions. These actions are subsequently executed by the execution module, utilizing a range of off-the-shelf tools and APIs. To train these modules effectively, high-quality annotations of subgoals and actions were collected and are made available for fine-tuning open-source LLMs on various tasks such as complex question answering, web tasks, and math problems. Leveraging this unified data and modular design, Lumos not only achieves performance comparable or superior to current state-of-the-art agents but also exhibits several key advantages: (1) Lumos surpasses GPT-4/3.5-based agents in complex question answering and web tasks, while equaling the performance of significantly larger LLM agents on math tasks; (2) Lumos outperforms open-source agents created through conventional training methods and those trained with chain-of-thought; and (3) Lumos generalizes effectively to unseen interactive tasks, outperforming larger LLM-based agents and even exceeding the performance of specialized agents.
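To make the three-module architecture concrete, below is a minimal Python sketch of the planning → grounding → execution pipeline the abstract describes. All names here (Planner, Grounder, Executor, plan, ground, execute, the "ToolName: argument" output format, and the tool registry) are hypothetical assumptions for illustration, not interfaces from the paper or its released code.

```python
# Minimal sketch of a Lumos-style modular agent pipeline.
# Module interfaces and LLM output formats below are assumptions.

from typing import Callable, Dict, List, Tuple


class Planner:
    """Decomposes a task into high-level, tool-agnostic subgoals."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # fine-tuned open-source LLM prompted for subgoals

    def plan(self, task: str) -> List[str]:
        # Assumption: the model emits one subgoal per line.
        return self.llm(f"Decompose into subgoals: {task}").splitlines()


class Grounder:
    """Makes each subgoal specific as low-level (tool, argument) actions."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm

    def ground(self, subgoal: str) -> List[Tuple[str, str]]:
        # Assumption: the model emits lines like "ToolName: argument".
        actions = []
        for line in self.llm(f"Actions for: {subgoal}").splitlines():
            tool, _, arg = line.partition(":")
            actions.append((tool.strip(), arg.strip()))
        return actions


class Executor:
    """Executes grounded actions via off-the-shelf tools and APIs."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self.tools = tools  # e.g. {"Search": web_search, "Calc": calculator}

    def execute(self, actions: List[Tuple[str, str]]) -> List[str]:
        # Skip actions whose tool is not registered.
        return [self.tools[t](arg) for t, arg in actions if t in self.tools]


def run_agent(task: str, planner: Planner, grounder: Grounder,
              executor: Executor) -> List[str]:
    """Plan subgoals, ground each one to actions, and execute them in order."""
    results: List[str] = []
    for subgoal in planner.plan(task):
        results.extend(executor.execute(grounder.ground(subgoal)))
    return results
```

Because the planner's subgoals are tool-agnostic, this design lets the same planning module be reused across task types (QA, web, math), with only the grounder and the executor's tool registry varying per domain.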