Faithful Persona-based Conversational Dataset Generation with Large Language Models
December 15, 2023
Authors: Pegah Jandaghi, XiangHai Sheng, Xinyi Bai, Jay Pujara, Hakim Sidahmed
cs.AI
Abstract
High-quality conversational datasets are essential for developing AI models
that can communicate with users. One way to foster deeper interactions between
a chatbot and its user is through personas, aspects of the user's character
that provide insights into their personality, motivations, and behaviors.
Training Natural Language Processing (NLP) models on a diverse and
comprehensive persona-based dataset can lead to conversational models that
create a deeper connection with the user, and maintain their engagement. In
this paper, we leverage the power of Large Language Models (LLMs) to create a
large, high-quality conversational dataset from a seed dataset. We propose a
Generator-Critic architecture framework to expand the initial dataset, while
improving the quality of its conversations. The Generator is an LLM prompted to
output conversations. The Critic consists of a mixture of expert LLMs that
control the quality of the generated conversations. These experts select the
best generated conversations, which we then use to improve the Generator. We
release Synthetic-Persona-Chat, consisting of 20k conversations seeded from
Persona-Chat. We evaluate the quality of Synthetic-Persona-Chat and our
generation framework on different dimensions through extensive experiments, and
observe that the losing rate of Synthetic-Persona-Chat against Persona-Chat
during the Turing test decreases from 17.2% to 8.8% over three iterations.
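
To make the Generator-Critic loop concrete, below is a minimal Python sketch of one plausible reading of the pipeline. All names and mechanisms here are illustrative assumptions, not the paper's implementation: the function names, the averaging used to aggregate the mixture of expert judges, and the feedback step (folding Critic-selected conversations back into the Generator's few-shot seed pool) go beyond what the abstract specifies, which is only that the experts select the best generated conversations and that these are used to improve the Generator.

```python
# Hypothetical sketch of a Generator-Critic dataset-expansion loop.
# The Generator and expert Critics stand in for prompted LLM calls.
import random
from typing import Callable, List

NUM_ITERATIONS = 3        # the abstract reports improvement over three iterations
CANDIDATES_PER_ROUND = 100  # assumed batch size, not from the paper
KEEP_TOP_K = 20             # assumed selection size, not from the paper


def generate_conversations(seed_examples: List[str], n: int) -> List[str]:
    """Placeholder Generator: an LLM prompted with persona pairs and
    few-shot seed conversations to produce candidate dialogues."""
    return [
        f"conversation_{i} seeded by: {random.choice(seed_examples)}"
        for i in range(n)
    ]


def critic_score(conversation: str,
                 experts: List[Callable[[str], float]]) -> float:
    """Placeholder Critic: a mixture of expert judges; aggregation by
    simple averaging is an assumption made for this sketch."""
    return sum(expert(conversation) for expert in experts) / len(experts)


def run_pipeline(seed_examples: List[str],
                 experts: List[Callable[[str], float]]) -> List[str]:
    pool = list(seed_examples)
    for _ in range(NUM_ITERATIONS):
        candidates = generate_conversations(pool, CANDIDATES_PER_ROUND)
        # The Critic ranks candidates and keeps only the best ones...
        ranked = sorted(candidates,
                        key=lambda c: critic_score(c, experts),
                        reverse=True)
        # ...which then "improve the Generator" -- modeled here as growing
        # the few-shot seed pool used in the next round.
        pool.extend(ranked[:KEEP_TOP_K])
    return pool


if __name__ == "__main__":
    # Dummy experts that return random scores, standing in for LLM judges.
    dummy_experts = [lambda c: random.random() for _ in range(3)]
    dataset = run_pipeline(
        ["Hi! I love hiking. | Nice, I prefer painting."], dummy_experts)
    print(f"expanded dataset size: {len(dataset)}")
```

In this reading, each iteration grows the seed pool only with Critic-approved conversations, so later Generator rounds are conditioned on progressively better examples, which is consistent with the quality improvement the abstract reports over three iterations.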