

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

March 7, 2024
作者: Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, Yu Su
cs.AI

Abstract

Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has been trained. We find that existing LLMs, including GPT-4 and open-source LLMs specifically fine-tuned for tool use, only reach a correctness rate in the range of 30% to 60%, far from reliable use in practice. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory. Specifically, STE leverages an LLM's 'imagination' to simulate plausible scenarios for using a tool, after which the LLM interacts with the tool to learn from its execution feedback. Both short-term and long-term memory are employed to improve the depth and breadth of the exploration, respectively. Comprehensive experiments on ToolBench show that STE substantially improves tool learning for LLMs under both in-context learning and fine-tuning settings, bringing a boost of 46.7% to Mistral-Instruct-7B and enabling it to outperform GPT-4. We also show effective continual learning of tools via a simple experience replay strategy.
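The abstract outlines STE's three mechanisms: imagination proposes plausible tool-use scenarios, trial and error refines an attempt from execution feedback (short-term memory, for depth), and long-term memory accumulates diverse episodes (for breadth). The loop below is a minimal, hedged sketch of that interplay; `llm_imagine`, `llm_refine`, and `call_tool` are hypothetical stubs standing in for LLM calls and real tool execution, not the authors' implementation.

```python
# Hedged sketch of the STE exploration loop described in the abstract.
# llm_imagine, llm_refine, and call_tool are hypothetical stand-ins for
# LLM prompting and tool execution; they are NOT the paper's actual code.

def llm_imagine(tool_name, long_term_memory):
    """Imagine a plausible query for the tool, conditioned on past episodes
    (long-term memory) so that new scenarios stay diverse."""
    return f"query-{len(long_term_memory)} for {tool_name}"

def call_tool(tool_name, query):
    """Execute the tool and return feedback (stubbed: success is arbitrary)."""
    return {"ok": query.endswith("'"), "result": f"{tool_name}({query})"}

def llm_refine(query, feedback):
    """Refine the failed attempt using execution feedback (stubbed)."""
    return query + "'"

def simulated_trial_and_error(tool_name, episodes=3, max_trials=2):
    long_term_memory = []                    # breadth: diverse past episodes
    for _ in range(episodes):
        query = llm_imagine(tool_name, long_term_memory)
        short_term_memory = []               # depth: trials within one episode
        for _ in range(max_trials):
            feedback = call_tool(tool_name, query)
            short_term_memory.append((query, feedback))
            if feedback["ok"]:               # stop once execution succeeds
                break
            query = llm_refine(query, feedback)
        long_term_memory.append(short_term_memory)
    return long_term_memory                  # later distilled into ICL/fine-tuning data

memory = simulated_trial_and_error("weather_api")
print(len(memory))  # one entry per imagined episode
```

In the paper's setting, the collected episodes are then used as in-context examples or fine-tuning data, and the experience-replay strategy for continual learning would correspond to retaining and replaying entries from `long_term_memory` when new tools are added.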