PIPPA: A Partially Synthetic Conversational Dataset
August 11, 2023
Authors: Tear Gosling, Alpin Dale, Yinhe Zheng
cs.AI
Abstract
With the emergence of increasingly powerful large language models, there is a
burgeoning interest in leveraging these models for casual conversation and
role-play applications. However, existing conversational and role-playing
datasets often fail to capture the diverse and nuanced interactions typically
exhibited by real-world role-play participants. To address this limitation and
contribute to this rapidly growing field, we introduce a partially synthetic
dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA
is the result of a community-driven crowdsourcing effort involving a group of
role-play enthusiasts. The dataset comprises over 1 million utterances
distributed across 26,000 conversation sessions and provides a rich resource
for researchers and AI developers to explore and refine conversational AI
systems in the context of role-play scenarios.
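For a sense of scale, over 1 million utterances spread across 26,000 sessions works out to roughly 38 utterances per session on average. The minimal sketch below shows one way to model the utterance/session grouping the abstract describes and to compute that average. It is only an illustration: the class and field names (`Utterance`, `Session`, `is_human`, `character_name`) and the helper `mean_utterances_per_session` are hypothetical and are not taken from the released dataset's schema.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the abstract only says that utterances are grouped into
# conversation sessions; these class and field names are assumptions.
@dataclass
class Utterance:
    text: str
    is_human: bool  # True for the human participant, False for the AI character

@dataclass
class Session:
    character_name: str
    utterances: list[Utterance] = field(default_factory=list)

def mean_utterances_per_session(sessions: list[Session]) -> float:
    """Average number of utterances per session (0.0 for an empty corpus)."""
    if not sessions:
        return 0.0
    return sum(len(s.utterances) for s in sessions) / len(sessions)

# Reported scale: over 1,000,000 utterances across about 26,000 sessions.
print(round(1_000_000 / 26_000, 1))  # 38.5

# Toy corpus: one two-turn session and one single-turn session.
toy = [
    Session("Alice", [Utterance("Hi!", True), Utterance("Hello, traveler.", False)]),
    Session("Bob", [Utterance("Where are we?", True)]),
]
print(mean_utterances_per_session(toy))  # 1.5
```

Keeping each session as an ordered list of turns, each flagged by speaker, preserves the alternation between the human and the AI character. That ordering is what a role-play model would be trained on.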