

AgentTuning: Enabling Generalized Agent Abilities for LLMs

October 19, 2023
作者: Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang
cs.AI

Abstract

Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is a lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning enhances the agent capabilities of LLMs without compromising their general abilities. AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open-source AgentInstruct and the AgentLM-7B, 13B, and 70B models at https://github.com/THUDM/AgentTuning , serving as open and powerful alternatives to commercial LLMs for agent tasks.
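The hybrid instruction-tuning strategy described above mixes agent-trajectory examples with general-domain instructions during fine-tuning. A minimal sketch of one way such mixing could be done, assuming a sampling probability `eta` for the agent data (the abstract does not specify the mixing mechanism; `eta`, the function name, and the toy data are illustrative assumptions):

```python
import random

def mix_datasets(agent_data, general_data, eta, n_samples, seed=0):
    """Draw a mixed training set: each example is taken from the
    agent-trajectory data with probability eta, otherwise from the
    general-domain instruction data. Purely illustrative of the
    mixing idea, not the paper's actual training pipeline."""
    rng = random.Random(seed)
    mixed = []
    for _ in range(n_samples):
        source = agent_data if rng.random() < eta else general_data
        mixed.append(rng.choice(source))
    return mixed

# Toy stand-ins for real instruction-tuning examples.
agent = [("agent", i) for i in range(100)]
general = [("general", i) for i in range(100)]
batch = mix_datasets(agent, general, eta=0.2, n_samples=1000)
```

With `eta=0.2`, roughly a fifth of the sampled batch comes from the agent trajectories; the balance preserves general abilities while still teaching agent behavior.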