Faithful Persona-based Conversational Dataset Generation with Large Language Models

Abstract

High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas: aspects of the user's character that provide insight into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and comprehensive persona-based dataset can yield conversational models that form a deeper connection with the user and sustain their engagement. In this paper, we use Large Language Models (LLMs) to create a large, high-quality conversational dataset from a seed dataset. We propose a Generator-Critic framework that expands the initial dataset while improving the quality of its conversations. The Generator is an LLM prompted to output conversations. The Critic is a mixture of expert LLMs that control the quality of the generated conversations. These experts select the best generated conversations, which we then use to improve the Generator. We release Synthetic-Persona-Chat, a dataset of 20k conversations seeded from Persona-Chat. We evaluate the quality of Synthetic-Persona-Chat and of our generation framework along several dimensions through extensive experiments, and observe that the losing rate of Synthetic-Persona-Chat against Persona-Chat in a Turing test decreases from 17.2% to 8.8% over three iterations.
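The Generator-Critic loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not specify the prompts, the expert LLMs, or how the selected conversations improve the Generator, so everything below is an assumption made for illustration: the Generator and the experts are plain Python callables, the experts' scores are averaged, and "improving the Generator" is modeled as adding the selected conversations to the pool of examples it draws from. The names `generator_critic_loop`, `critic_score`, `toy_generate`, and the two toy experts are hypothetical.

```python
import random
from typing import Callable, List

Conversation = List[str]
# Hypothetical interfaces: a real Generator or expert would wrap an LLM call.
Generator = Callable[[List[Conversation], random.Random], Conversation]
Expert = Callable[[Conversation], float]


def critic_score(conv: Conversation, experts: List[Expert]) -> float:
    """Combine expert scores by averaging (an assumed mixing rule)."""
    return sum(expert(conv) for expert in experts) / len(experts)


def generator_critic_loop(
    seed: List[Conversation],
    generate: Generator,
    experts: List[Expert],
    iterations: int = 3,
    candidates: int = 8,
    keep: int = 2,
    rng_seed: int = 0,
) -> List[Conversation]:
    """Expand a seed dataset by repeated generate-then-select rounds."""
    rng = random.Random(rng_seed)
    pool = list(seed)  # copy so the caller's seed list is not mutated
    for _ in range(iterations):
        # Generator step: propose candidate conversations from the current pool.
        batch = [generate(pool, rng) for _ in range(candidates)]
        # Critic step: rank candidates by the combined expert score.
        batch.sort(key=lambda conv: critic_score(conv, experts), reverse=True)
        # Feedback step (assumption): the best candidates join the example pool.
        pool.extend(batch[:keep])
    return pool


# Toy stand-ins so the sketch runs without any LLM.
def toy_generate(pool: List[Conversation], rng: random.Random) -> Conversation:
    base = rng.choice(pool)
    return base + [f"turn {len(base) + 1}"]


def toy_length_expert(conv: Conversation) -> float:
    return float(len(conv))


def toy_diversity_expert(conv: Conversation) -> float:
    return float(len(set(conv)))


if __name__ == "__main__":
    seed_data = [["Hi, I love hiking.", "Me too! Where do you usually go?"]]
    result = generator_critic_loop(
        seed_data, toy_generate, [toy_length_expert, toy_diversity_expert]
    )
    print(len(result), "conversations in the expanded pool")
```

With the default settings, each of the three iterations keeps two candidates, so the pool grows by six conversations beyond the seed. In a real system the feedback step might instead update few-shot prompts or fine-tune the Generator; the abstract alone does not say which.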
