Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
January 10, 2024
作者: Dennis Ulmer, Elman Mansimov, Kaixiang Lin, Justin Sun, Xibin Gao, Yi Zhang
cs.AI
Abstract
Large language models (LLMs) are powerful dialogue agents, but specializing them towards fulfilling a specific function can be challenging. Instruction tuning, i.e. tuning models on instructions and sample responses generated by humans (Ouyang et al., 2022), has proven to be an effective method to do so, yet it requires a number of data samples that a) might not be available or b) are costly to generate. Furthermore, this cost increases when the goal is to make the LLM follow a specific workflow within a dialogue instead of single instructions. Inspired by the self-play technique in reinforcement learning and the use of LLMs to simulate human agents, we propose a more effective method for data collection in which LLMs engage in a conversation in various roles. This approach generates training data via "self-talk" of LLMs that can be refined and utilized for supervised fine-tuning. We introduce an automated way to measure the (partial) success of a dialogue. This metric is used to filter the generated conversational data that is fed back into the LLM for training. Based on our automated and human evaluations of conversation quality, we demonstrate that such self-talk data improves results. In addition, we examine the various characteristics that showcase the quality of generated dialogues and how they can be connected to their potential utility as training data.
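
The abstract describes a pipeline of self-talk generation, automated success scoring, and filtering before supervised fine-tuning. The sketch below is a minimal illustration of that loop under stated assumptions: the `chat` callable stands in for any LLM completion API, and the prompts, turn count, and subgoal-matching success metric are hypothetical simplifications introduced here for clarity, not the paper's exact setup.

```python
# Minimal sketch of a self-talk data-collection loop (illustrative, not the
# paper's implementation). `chat` is a hypothetical stand-in for an LLM API.
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}


def self_talk(
    chat: Callable[[List[Message]], str],
    agent_prompt: str,
    client_prompt: str,
    num_turns: int = 6,
) -> List[Message]:
    """Let the same LLM play both the task-oriented agent and the client."""
    dialogue: List[Message] = []
    for turn in range(num_turns):
        speaker, prompt = (
            ("agent", agent_prompt) if turn % 2 == 0 else ("client", client_prompt)
        )
        # Re-frame the running dialogue from the current speaker's perspective.
        history = [{"role": "system", "content": prompt}] + [
            {
                "role": "assistant" if m["role"] == speaker else "user",
                "content": m["content"],
            }
            for m in dialogue
        ]
        dialogue.append({"role": speaker, "content": chat(history)})
    return dialogue


def dialogue_success(dialogue: List[Message], subgoals: List[str]) -> float:
    """Toy automated metric: fraction of required workflow steps (subgoals)
    mentioned in the agent's utterances."""
    agent_text = " ".join(
        m["content"].lower() for m in dialogue if m["role"] == "agent"
    )
    if not subgoals:
        return 0.0
    return sum(1 for g in subgoals if g.lower() in agent_text) / len(subgoals)


def collect_finetuning_data(
    chat: Callable[[List[Message]], str],
    agent_prompt: str,
    client_prompt: str,
    subgoals: List[str],
    num_dialogues: int = 100,
    threshold: float = 0.8,
) -> List[List[Message]]:
    """Generate self-talk dialogues and keep only the (partially) successful
    ones as supervised fine-tuning data."""
    kept: List[List[Message]] = []
    for _ in range(num_dialogues):
        dialogue = self_talk(chat, agent_prompt, client_prompt)
        if dialogue_success(dialogue, subgoals) >= threshold:
            kept.append(dialogue)
    return kept
```

In this reading, the filtering threshold plays the role of the quality gate mentioned in the abstract: only dialogues whose automated (partial) success score is high enough are fed back into the model for training.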