DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
July 19, 2023
Authors: Jianguo Zhang, Kun Qian, Zhiwei Liu, Shelby Heinecke, Rui Meng, Ye Liu, Zhou Yu, Huan Wang, Silvio Savarese, Caiming Xiong
cs.AI
Abstract
Despite advancements in conversational AI, language models still face
challenges in handling diverse conversational tasks, and existing dialogue
dataset collections often lack diversity and comprehensiveness. To tackle these
issues, we introduce DialogStudio: the largest and most diverse collection of
dialogue datasets, unified under a consistent format while preserving their
original information. Our collection encompasses data from open-domain
dialogues, task-oriented dialogues, natural language understanding,
conversational recommendation, dialogue summarization, and knowledge-grounded
dialogues, making it an incredibly rich and diverse resource for dialogue
research and model training. To further enhance the utility of DialogStudio, we
identify the licenses for each dataset and design domain-aware prompts for
selected dialogues to facilitate instruction-aware fine-tuning. Furthermore, we
develop conversational AI models using the dataset collection, and our
experiments in both zero-shot and few-shot learning scenarios demonstrate the
superiority of DialogStudio. To improve transparency and support dataset and
task-based research, as well as language model pre-training, all datasets,
licenses, code, and models associated with DialogStudio are made publicly
accessible at https://github.com/salesforce/DialogStudio