PIPPA: A Partially Synthetic Conversational Dataset

Abstract

With the emergence of increasingly powerful large language models, there is growing interest in using these models for casual conversation and role-play applications. However, existing conversational and role-playing datasets often fail to capture the diverse and nuanced interactions typical of real-world role-play participants. To address this limitation and contribute to this rapidly growing field, we introduce a partially synthetic dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA is the result of a community-driven crowdsourcing effort involving a group of role-play enthusiasts. The dataset comprises over 1 million utterances distributed across 26,000 conversation sessions, and it provides a rich resource for researchers and AI developers to explore and refine conversational AI systems in role-play scenarios.

