PIPPA: A Partially Synthetic Conversational Dataset
August 11, 2023
Authors: Tear Gosling, Alpin Dale, Yinhe Zheng
cs.AI
Abstract
With the emergence of increasingly powerful large language models, there is a
burgeoning interest in leveraging these models for casual conversation and
role-play applications. However, existing conversational and role-playing
datasets often fail to capture the diverse and nuanced interactions typically
exhibited by real-world role-play participants. To address this limitation and
contribute to this rapidly growing field, we introduce a partially synthetic
dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA
is the result of a community-driven crowdsourcing effort involving a group of
role-play enthusiasts. The dataset comprises over 1 million utterances that are
distributed across 26,000 conversation sessions and provides a rich resource
for researchers and AI developers to explore and refine conversational AI
systems in the context of role-play scenarios.