LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
March 7, 2024
Authors: Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, Yu Su
cs.AI
Abstract
Tools are essential for large language models (LLMs) to acquire up-to-date
information and take consequential actions in external environments. Existing
work on tool-augmented LLMs primarily focuses on the broad coverage of tools
and the flexibility of adding new tools. However, a critical aspect that has
surprisingly been understudied is simply how accurately an LLM uses tools for
which it has been trained. We find that existing LLMs, including GPT-4 and
open-source LLMs specifically fine-tuned for tool use, only reach a correctness
rate in the range of 30% to 60%, far from reliable use in practice. We propose
a biologically inspired method for tool-augmented LLMs, simulated trial and
error (STE), that orchestrates three key mechanisms for successful tool use
behaviors in the biological system: trial and error, imagination, and memory.
Specifically, STE leverages an LLM's 'imagination' to simulate plausible
scenarios for using a tool, after which the LLM interacts with the tool to
learn from its execution feedback. Both short-term and long-term memory are
employed to improve the depth and breadth of the exploration, respectively.
Comprehensive experiments on ToolBench show that STE substantially improves
tool learning for LLMs under both in-context learning and fine-tuning settings,
bringing a boost of 46.7% to Mistral-Instruct-7B and enabling it to outperform
GPT-4. We also show effective continual learning of tools via a simple
experience replay strategy.
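
To make the described mechanism concrete, the following is a minimal Python sketch of an STE-style exploration loop: imagine a usage scenario, try the tool, and learn from execution feedback, with short-term memory within an episode and long-term memory across episodes. The llm() and execute_tool() stubs, the episode/trial structure, and all names and parameters are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a simulated-trial-and-error (STE) exploration loop.
# All functions and parameters here are hypothetical placeholders.

def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real model endpoint."""
    return "(model output for: " + prompt[:40] + "...)"

def execute_tool(api_call: str) -> str:
    """Hypothetical tool/API executor; returns execution feedback."""
    return "(execution result of: " + api_call + ")"

def simulated_trial_and_error(tool_spec: str, episodes: int = 3, trials: int = 2):
    long_term_memory = []                      # distilled experiences across episodes (breadth)
    for _ in range(episodes):
        # "Imagination": propose a plausible usage scenario, conditioned on
        # past episodes so new scenarios cover different tool behaviors.
        scenario = llm(
            f"Tool spec:\n{tool_spec}\n"
            f"Previously explored scenarios:\n{long_term_memory}\n"
            "Imagine a new, plausible user query this tool could answer."
        )
        short_term_memory = []                 # trial history within this episode (depth)
        for _ in range(trials):
            # Trial and error: produce a tool call grounded in earlier attempts.
            api_call = llm(
                f"Scenario: {scenario}\nEarlier trials: {short_term_memory}\n"
                "Write the tool call to try next."
            )
            feedback = execute_tool(api_call)  # execution feedback, including errors
            short_term_memory.append({"call": api_call, "feedback": feedback})
        # Distill the episode into long-term memory for later exploration.
        long_term_memory.append({"scenario": scenario, "trials": short_term_memory})
    return long_term_memory

if __name__ == "__main__":
    memory = simulated_trial_and_error("weather_api(city: str) -> current weather")
    print(f"collected {len(memory)} exploration episodes")

In the setting the abstract describes, the collected episodes would then serve as in-context exemplars or fine-tuning data for the tool-using LLM.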