
Self-Challenging Language Model Agents

June 2, 2025
Authors: Yifei Zhou, Sergey Levine, Jason Weston, Xian Li, Sainbayar Sukhbaatar
cs.AI

Abstract

Large language models are quickly becoming the foundation for intelligent agents that are capable of using tools. However, training such agents is challenging because it requires human creation and annotation of a diverse set of tasks, tools, and evaluation criteria. In this paper, we propose the Self-Challenging framework for training an agent on high-quality tasks that it generates itself. The agent first plays the role of challenger and generates a task after interacting with the given tools. The tasks take the form of a novel general class of problems termed Code-as-Task, each defined by an instruction, a verification function, and solution and failure cases that serve as tests, making it possible to retain only high-quality tasks. The agent then takes an executor role and trains on those tasks with reinforcement learning, using the evaluation feedback as a reward. Evaluation on two existing multi-turn tool-use agent benchmarks, M3ToolEval and TauBench, shows that the Self-Challenging framework yields more than a two-fold improvement with Llama-3.1-8B-Instruct, despite using only self-generated training data.
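To make the filtering criterion concrete, the sketch below shows in Python how a Code-as-Task instance could be screened: the verification function must accept every known solution and reject every failure case. The `CodeAsTask` structure, the `is_high_quality` helper, and the toy task are illustrative assumptions based on the abstract, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CodeAsTask:
    """One self-generated task: an instruction, a programmatic verifier,
    and example solutions/failures that act as tests on the verifier.
    (Hypothetical structure inferred from the abstract.)"""
    instruction: str
    verify: Callable[[str], bool]   # returns True iff an answer solves the task
    solutions: List[str]            # answers the verifier must accept
    failure_cases: List[str]        # answers the verifier must reject


def is_high_quality(task: CodeAsTask) -> bool:
    """Keep a task only if its verifier passes its own tests:
    all known solutions are accepted and all failure cases are rejected."""
    accepts_solutions = all(task.verify(s) for s in task.solutions)
    rejects_failures = not any(task.verify(f) for f in task.failure_cases)
    return accepts_solutions and rejects_failures


# Usage: a toy task a challenger might emit (for illustration only).
task = CodeAsTask(
    instruction="Compute 17 * 24 and reply with the number only.",
    verify=lambda answer: answer.strip() == "408",
    solutions=["408"],
    failure_cases=["407", "I think it is 408."],
)
assert is_high_quality(task)
```

Under this reading, a task whose verifier wrongly rejects its own solution, or wrongly accepts a failure case, is discarded before the executor ever trains on it, so the filter acts as a self-test on the generated evaluation criterion.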