AgentTuning: Enabling Generalized Agent Abilities for LLMs
October 19, 2023
Authors: Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang
cs.AI
Abstract
Open large language models (LLMs) with strong performance on various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents that tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is a lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy that combines AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning improves LLMs' agent capabilities without compromising their general abilities. AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open-source AgentInstruct and the AgentLM-7B, 13B, and 70B models at https://github.com/THUDM/AgentTuning , serving as open and powerful alternatives to commercial LLMs for agent tasks.
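The hybrid instruction-tuning strategy described above amounts to sampling training examples from two pools: the agent trajectories (AgentInstruct) and general-domain instructions. A minimal sketch of such a mixing step is shown below; the mixing ratio `eta`, the function name, and the record format are illustrative assumptions, not details taken from the paper.

```python
import random


def mix_instruction_data(agent_data, general_data, eta, n_samples, seed=0):
    """Sample a hybrid instruction-tuning dataset.

    With probability `eta`, draw an example from the agent-trajectory
    pool (AgentInstruct-style); otherwise draw from the general-domain
    instruction pool. The general-domain portion is what preserves the
    model's general abilities during fine-tuning.
    """
    rng = random.Random(seed)  # fixed seed for reproducible mixtures
    mixed = []
    for _ in range(n_samples):
        pool = agent_data if rng.random() < eta else general_data
        mixed.append(rng.choice(pool))
    return mixed


# Toy pools standing in for AgentInstruct and a general instruction set.
agent = [{"source": "agent", "id": i} for i in range(100)]
general = [{"source": "general", "id": i} for i in range(100)]

batch = mix_instruction_data(agent, general, eta=0.2, n_samples=1000)
frac_agent = sum(x["source"] == "agent" for x in batch) / len(batch)
```

With `eta=0.2`, roughly a fifth of the sampled examples come from the agent pool; in practice the ratio would be tuned by validating both agent benchmarks and general-ability benchmarks, which is the trade-off the abstract highlights.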