Self-Challenging Language Model Agents
June 2, 2025
Authors: Yifei Zhou, Sergey Levine, Jason Weston, Xian Li, Sainbayar Sukhbaatar
cs.AI
Abstract
Large language models are quickly becoming the foundation for intelligent
agents that are capable of using tools. However, training such agents is
challenging because it requires human creation and annotation of a diverse set
of tasks, tools, and evaluation criteria. In this paper, we propose the
Self-Challenging framework for training an agent on high-quality tasks that are
generated by itself. The agent first plays the role of challenger and generates
a task after interacting with the given tools. The tasks take the form of a
novel general class of problems termed Code-as-Task, which are defined by an
instruction, a verification function and solution and failure cases which serve
as tests, allowing to filter only for high-quality tasks. The agent then takes
an executor role and trains on those tasks with reinforcement learning using
the evaluation feedback as a reward. Evaluation on two existing multi-turn
tool-use agent benchmarks, M3ToolEval and TauBench, shows that the Self-Challenging
framework achieves a more than two-fold improvement for Llama-3.1-8B-Instruct,
despite using only self-generated training data.
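As an illustration of the Code-as-Task idea described in the abstract, the sketch below shows one plausible way to represent such a task and apply the quality filter: the verification function must accept the provided solution and reject every known failure case. The `CodeAsTask` dataclass and the `is_high_quality` helper are hypothetical names and logic inferred from the abstract, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical representation of a Code-as-Task instance (illustrative only;
# field names and filtering logic are assumptions based on the abstract).
@dataclass
class CodeAsTask:
    instruction: str                     # natural-language task description
    verify: Callable[[str], bool]        # verification function over an executor's answer
    solution: str                        # reference answer produced by the challenger
    failure_cases: List[str] = field(default_factory=list)  # answers that must be rejected


def is_high_quality(task: CodeAsTask) -> bool:
    """Keep a self-generated task only if its verification function
    accepts the solution and rejects every known failure case."""
    try:
        if not task.verify(task.solution):
            return False
        return all(not task.verify(bad) for bad in task.failure_cases)
    except Exception:
        # A verifier that crashes on its own tests is discarded.
        return False


# Toy usage example.
toy = CodeAsTask(
    instruction="Return the sum of 2 and 3 as a string.",
    verify=lambda answer: answer.strip() == "5",
    solution="5",
    failure_cases=["4", "six"],
)
print(is_high_quality(toy))  # True: the verifier accepts the solution and rejects both failures
```

Tasks that survive this filter would then serve as training problems for the executor role, with the verification outcome providing the reward signal for reinforcement learning.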