Faithful Persona-based Conversational Dataset Generation with Large Language Models
December 15, 2023
Authors: Pegah Jandaghi, XiangHai Sheng, Xinyi Bai, Jay Pujara, Hakim Sidahmed
cs.AI
Abstract
High-quality conversational datasets are essential for developing AI models
that can communicate with users. One way to foster deeper interactions between
a chatbot and its user is through personas, aspects of the user's character
that provide insights into their personality, motivations, and behaviors.
Training Natural Language Processing (NLP) models on a diverse and
comprehensive persona-based dataset can lead to conversational models that
create a deeper connection with the user, and maintain their engagement. In
this paper, we leverage the power of Large Language Models (LLMs) to create a
large, high-quality conversational dataset from a seed dataset. We propose a
Generator-Critic architecture framework to expand the initial dataset, while
improving the quality of its conversations. The Generator is an LLM prompted to
output conversations. The Critic consists of a mixture of expert LLMs that
control the quality of the generated conversations. These experts select the
best generated conversations, which we then use to improve the Generator. We
release Synthetic-Persona-Chat, consisting of 20k conversations seeded from
Persona-Chat. We evaluate the quality of Synthetic-Persona-Chat and our
generation framework on different dimensions through extensive experiments, and
observe that the losing rate of Synthetic-Persona-Chat against Persona-Chat
during the Turing test decreases from 17.2% to 8.8% over three iterations.
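The abstract describes an iterative Generator-Critic loop: an LLM generates candidate conversations, a mixture of expert critics scores them, and the best conversations are fed back to improve the Generator. Below is a minimal sketch of that loop, assuming a simple few-shot feedback mechanism and averaged expert scores; the function names, selection policy, and parameters are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, List

def generator_critic_loop(
    seed_dialogues: List[str],
    generate: Callable[[List[str], int], List[str]],  # hypothetical: LLM prompted with example dialogues
    experts: List[Callable[[str], float]],            # hypothetical: mixture of expert critic scorers
    iterations: int = 3,
    batch_size: int = 100,
    keep_ratio: float = 0.5,
) -> List[str]:
    """Expand a seed dataset while filtering candidates with expert critics (sketch)."""
    dataset = list(seed_dialogues)
    examples = list(seed_dialogues)
    for _ in range(iterations):
        # Generator: prompt an LLM with current examples to produce new candidate conversations.
        candidates = generate(examples, batch_size)
        # Critic: score each candidate with every expert and average the scores.
        scored = [(sum(e(c) for e in experts) / len(experts), c) for c in candidates]
        scored.sort(key=lambda sc: sc[0], reverse=True)
        # Keep only the best-rated conversations; reusing them as few-shot examples
        # is one way the selected dialogues could "improve the Generator".
        best = [c for _, c in scored[: int(len(scored) * keep_ratio)]]
        dataset.extend(best)
        examples = best or examples
    return dataset

if __name__ == "__main__":
    # Toy usage with stub components, for illustration only.
    seeds = ["A: I love hiking.\nB: Me too, mostly on weekends."]
    gen = lambda ex, n: [f"{ex[i % len(ex)]} (variant {i})" for i in range(n)]
    experts = [lambda c: len(c) / 100.0]  # stub critic: prefers longer conversations
    print(len(generator_critic_loop(seeds, gen, experts, iterations=3, batch_size=10)))
```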