PIPPA: A Partially Synthetic Conversational Dataset

Abstract

With the emergence of increasingly powerful large language models, there is growing interest in using these models for casual conversation and role-play applications. However, existing conversational and role-playing datasets often fail to capture the diverse and nuanced interactions typical of real-world role-play participants. To address this limitation and contribute to this rapidly growing field, we introduce PIPPA (Personal Interaction Pairs between People and AI), a partially synthetic dataset. PIPPA is the result of a community-driven crowdsourcing effort involving a group of role-play enthusiasts. The dataset comprises over 1 million utterances distributed across 26,000 conversation sessions, and it provides a rich resource for researchers and AI developers to explore and refine conversational AI systems in role-play scenarios.
