Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
November 9, 2023
作者: Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin
cs.AI
Abstract
We introduce Lumos, a novel framework for training language agents that
employs a unified data format and a modular architecture based on open-source
large language models (LLMs). Lumos consists of three distinct modules:
planning, grounding, and execution. The planning module breaks down a task into
a series of high-level, tool-agnostic subgoals, which are then made specific by
the grounding module through a set of low-level actions. These actions are
subsequently executed by the execution module, utilizing a range of
off-the-shelf tools and APIs. In order to train these modules effectively,
high-quality annotations of subgoals and actions were collected and are made
available for fine-tuning open-source LLMs for various tasks such as complex
question answering, web tasks, and math problems. Leveraging this unified data
and modular design, Lumos not only achieves performance comparable or superior
to current state-of-the-art agents, but also exhibits several key advantages:
(1) Lumos surpasses GPT-4/3.5-based agents in complex question answering and
web tasks, while matching the performance of significantly larger LLM agents
on math tasks; (2) Lumos outperforms open-source agents created through
conventional training methods and those trained with chain-of-thought; and
(3) Lumos generalizes effectively to unseen interactive tasks, outperforming
larger LLM-based agents and even exceeding the performance of specialized
agents.
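
To make the modular design concrete, below is a minimal Python sketch of the planning, grounding, and execution loop the abstract describes. All names here (LumosAgentSketch, plan_llm, ground_llm, the calculator tool) are illustrative assumptions rather than the paper's actual interface: the two callables stand in for the fine-tuned open-source planning and grounding modules, and the tools dictionary stands in for the off-the-shelf tools and APIs used by the execution module.

```python
# Hypothetical sketch of the three-module pipeline described in the abstract.
# Names and signatures are illustrative assumptions, not the paper's API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LumosAgentSketch:
    """Planning -> grounding -> execution, as described in the abstract."""
    plan_llm: Callable[[str], List[str]]          # stands in for the fine-tuned planning module
    ground_llm: Callable[[str, str], List[dict]]  # stands in for the fine-tuned grounding module
    tools: Dict[str, Callable] = field(default_factory=dict)  # off-the-shelf tools / APIs

    def run(self, task: str):
        # 1) Planning: decompose the task into high-level, tool-agnostic subgoals.
        subgoals = self.plan_llm(task)
        results = []
        for subgoal in subgoals:
            # 2) Grounding: turn each subgoal into low-level, tool-specific actions.
            actions = self.ground_llm(task, subgoal)
            for action in actions:
                # 3) Execution: dispatch each action to the named tool or API.
                tool = self.tools[action["tool"]]
                results.append(tool(**action.get("args", {})))
        return results


if __name__ == "__main__":
    # Toy demo with stub "LLMs" and a calculator tool, purely for illustration.
    agent = LumosAgentSketch(
        plan_llm=lambda task: ["Subgoal 1: compute 2 + 3"],
        ground_llm=lambda task, sg: [{"tool": "calculator", "args": {"expr": "2 + 3"}}],
        tools={"calculator": lambda expr: eval(expr)},
    )
    print(agent.run("What is 2 + 3?"))  # -> [5]
```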