ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
December 15, 2023
Authors: Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, Sanjiv Kumar
cs.AI
Abstract
Answering complex natural language questions often necessitates multi-step
reasoning and integrating external information. Several systems have combined
knowledge retrieval with a large language model (LLM) to answer such questions.
These systems, however, suffer from various failure cases, and we cannot
directly train them end-to-end to fix such failures, as interaction with
external knowledge is non-differentiable. To address these deficiencies, we
define a ReAct-style LLM agent with the ability to reason and act upon external
knowledge. We further refine the agent through a ReST-like method that
iteratively trains on previous trajectories, employing growing-batch
reinforcement learning with AI feedback for continuous self-improvement and
self-distillation. Starting from a prompted large model and after just two
iterations of the algorithm, we can produce a fine-tuned small model that
achieves comparable performance on challenging compositional question-answering
benchmarks while having two orders of magnitude fewer parameters.
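The abstract describes two nested loops: an inner ReAct-style loop that interleaves reasoning steps with calls to an external knowledge tool, and an outer ReST-like loop that samples trajectories with the current policy, ranks them with AI feedback, and fine-tunes on the best ones. The sketch below is a minimal, hypothetical illustration of that structure, not the paper's implementation: `llm`, `search`, `reward_model`, and `fine_tune` are stand-in stubs, and the trajectory format is assumed.

```python
# Hypothetical sketch of the two loops described in the abstract.
# `llm`, `search`, `reward_model`, and `fine_tune` are stand-in stubs,
# not the paper's code; the "Thought/Action/Observation" format is assumed.

def llm(prompt: str) -> str:
    """Stub: one call to the current policy model."""
    return "Final Answer: <answer>"

def search(query: str) -> str:
    """Stub: the external, non-differentiable retrieval tool."""
    return "<retrieved snippet>"

def reward_model(trajectory: list[str]) -> float:
    """Stub: AI feedback, e.g. an LLM judge scoring the full trajectory."""
    return 0.0

def fine_tune(trajectories: list[list[str]]) -> None:
    """Stub: supervised fine-tuning on selected trajectories. The target can
    be the same model (self-improvement) or a smaller one (self-distillation)."""

def react_episode(question: str, max_steps: int = 5) -> list[str]:
    """Inner ReAct-style loop: interleave reasoning with tool actions.
    Tool calls are opaque to gradients, which is why the outer loop trains
    on whole trajectories instead of backpropagating end-to-end."""
    trajectory = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm("\n".join(trajectory))
        trajectory.append(step)
        if step.startswith("Final Answer:"):
            break
        if "Action: search[" in step:
            query = step.split("Action: search[", 1)[1].rstrip("]")
            trajectory.append(f"Observation: {search(query)}")
    return trajectory

def rest_iteration(questions: list[str], keep_fraction: float = 0.5) -> None:
    """Outer ReST-like iteration (growing-batch RL): sample trajectories with
    the current policy, rank them by AI feedback, fine-tune on the best."""
    trajectories = [react_episode(q) for q in questions]
    ranked = sorted(trajectories, key=reward_model, reverse=True)
    fine_tune(ranked[: int(len(ranked) * keep_fraction)])
```

Under these assumptions, the generate-rank-fine-tune cycle sidesteps the non-differentiable tool interaction: the retrieval results are simply part of the recorded trajectories, and only the selected trajectories feed the supervised fine-tuning step. The abstract reports that two such iterations suffice to distill the prompted large model into a much smaller one.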