Self-Challenging Language Model Agents

Abstract

Large language models are quickly becoming the foundation for intelligent agents that can use tools. However, training such agents is challenging because it requires humans to create and annotate a diverse set of tasks, tools, and evaluation criteria. In this paper, we propose the Self-Challenging framework, which trains an agent on high-quality tasks that it generates itself. The agent first plays the role of challenger and generates a task after interacting with the given tools. The tasks take the form of a novel, general class of problems termed Code-as-Task, each defined by an instruction, a verification function, and solution and failure cases that serve as tests, which makes it possible to keep only high-quality tasks. The agent then takes the role of executor and trains on those tasks with reinforcement learning, using the evaluation feedback as a reward. Evaluation on two existing multi-turn tool-use agent benchmarks, M3ToolEval and TauBench, shows that the Self-Challenging framework achieves more than a two-fold improvement for Llama-3.1-8B-Instruct, despite using only self-generated training data.
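The Code-as-Task idea described above (an instruction, a verification function, and solution and failure cases used as tests) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `CodeAsTask`, `is_high_quality`, and `reward` are hypothetical, answers are simplified to plain strings, and the binary reward is only one possible way to turn the evaluation feedback into a training signal.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CodeAsTask:
    """Hypothetical container for one self-generated task (illustrative only)."""
    instruction: str                # natural-language task description
    verify: Callable[[str], bool]   # verification function over a candidate answer
    solution: str                   # example answer that is expected to pass
    failure_cases: List[str]        # example answers that are expected to fail


def is_high_quality(task: CodeAsTask) -> bool:
    """Keep a task only if its verifier agrees with its own tests."""
    try:
        # The example solution must be accepted by the verifier.
        if not task.verify(task.solution):
            return False
        # Every failure case must be rejected by the verifier.
        return not any(task.verify(bad) for bad in task.failure_cases)
    except Exception:
        # A verifier that crashes on its own tests is treated as low quality.
        return False


def reward(task: CodeAsTask, answer: str) -> float:
    """Assumed binary reward for the executor: 1.0 if the verifier accepts."""
    try:
        return 1.0 if task.verify(answer) else 0.0
    except Exception:
        return 0.0


# A well-formed task: the verifier separates the solution from the failures.
good_task = CodeAsTask(
    instruction="Report the total price of 3 items that cost 4 each.",
    verify=lambda answer: answer.strip() == "12",
    solution="12",
    failure_cases=["7", "34"],
)

# A degenerate task: the verifier accepts anything, so a failure case passes.
lenient_task = CodeAsTask(
    instruction="Report the total price of 3 items that cost 4 each.",
    verify=lambda answer: True,
    solution="12",
    failure_cases=["7"],
)

# Filtering step: only tasks whose tests behave as declared are kept.
kept = [t for t in (good_task, lenient_task) if is_high_quality(t)]
```

In this sketch the lenient task is filtered out because its verifier accepts a declared failure case, which is the kind of check the solution and failure cases make possible.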
