Self-Challenging Language Model Agents

Abstract

Large language models are quickly becoming the foundation for intelligent agents that are capable of using tools. However, training such agents is challenging because it requires human creation and annotation of a diverse set of tasks, tools, and evaluation criteria. In this paper, we propose the Self-Challenging framework for training an agent on high-quality tasks that it generates itself. The agent first plays the role of challenger and generates a task after interacting with the given tools. The tasks take the form of a novel general class of problems termed Code-as-Task, each of which is defined by an instruction, a verification function, and solution and failure cases that serve as tests, which makes it possible to keep only high-quality tasks. The agent then takes the role of executor and trains on those tasks with reinforcement learning, using the evaluation feedback as a reward. Evaluation on two existing multi-turn tool-use agent benchmarks, M3ToolEval and TauBench, shows that the Self-Challenging framework achieves more than a two-fold improvement in Llama-3.1-8B-Instruct, despite using only self-generated training data.
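
The Code-as-Task structure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `CodeAsTask`, `is_high_quality`, and `reward`, the use of plain strings as answers, and the binary reward are all assumptions made for illustration. The sketch only shows the idea that a task is kept when its own solution passes the verification function and every failure case is rejected by it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CodeAsTask:
    """Hypothetical container for one challenger-generated task."""
    instruction: str
    verify: Callable[[str], bool]   # verification function
    solution: str                   # example answer that should pass
    failure_cases: List[str] = field(default_factory=list)  # should all fail


def is_high_quality(task: CodeAsTask) -> bool:
    """Keep a task only if its own test cases agree with its verifier."""
    try:
        if not task.verify(task.solution):
            return False
        return all(not task.verify(bad) for bad in task.failure_cases)
    except Exception:
        # Assumption: a verifier that crashes on its own tests is discarded.
        return False


def reward(task: CodeAsTask, executor_answer: str) -> float:
    """Assumed binary reward derived from the verification function."""
    try:
        return 1.0 if task.verify(executor_answer) else 0.0
    except Exception:
        return 0.0


good = CodeAsTask(
    instruction="Return the sum of 2 and 3 as a string.",
    verify=lambda answer: answer.strip() == "5",
    solution="5",
    failure_cases=["6", "five"],
)
bad = CodeAsTask(
    instruction="Return any number.",
    verify=lambda answer: True,     # accepts everything, so failure cases pass too
    solution="1",
    failure_cases=["not a number"],
)

# Filtering step: only tasks whose tests are consistent survive.
kept = [t for t in (good, bad) if is_high_quality(t)]
```

In this toy setup, `bad` is filtered out because its verifier also accepts its failure case, while `good` is kept and can then score executor answers through `reward`.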
