

Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge

July 4, 2024
Authors: Young-Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Byungsoo Ko, Jonghwan Hyeon, Ho-Jin Choi
cs.AI

Abstract

Humans share a wide variety of images related to their personal experiences within conversations via instant messaging tools. However, existing works focus on (1) image-sharing behavior in singular sessions, leading to limited long-term social interaction, and (2) a lack of personalized image-sharing behavior. In this work, we introduce Stark, a large-scale long-term multi-modal conversation dataset that covers a wide range of social personas in a multi-modality format, time intervals, and images. To construct Stark automatically, we propose a novel multi-modal contextualization framework, Mcu, that generates long-term multi-modal dialogue distilled from ChatGPT and our proposed Plan-and-Execute image aligner. Using our Stark, we train a multi-modal conversation model, Ultron 7B, which demonstrates impressive visual imagination ability. Furthermore, we demonstrate the effectiveness of our dataset in human evaluation. We make our source code and dataset publicly available.
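The abstract does not describe how the Plan-and-Execute image aligner works internally, so the paper's actual implementation is not reproduced here. The toy Python sketch below only illustrates the general two-stage idea the name suggests: first plan which dialogue turns should carry an image and what it should depict, then execute by producing that image. Every class, function, and heuristic in it is hypothetical and merely stands in for the real components (e.g., a text-to-image model or an image retriever).

```python
# Hypothetical plan-and-execute sketch; names and heuristics are NOT from the Stark paper.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str
    text: str
    image_caption: Optional[str] = None  # filled in by the aligner


def plan(turns: List[Turn]) -> List[Optional[str]]:
    """Toy planner: propose an image description for turns that mention sharing a photo."""
    proposals = []
    for turn in turns:
        if "photo" in turn.text.lower() or "picture" in turn.text.lower():
            proposals.append(f"an image matching: {turn.text}")
        else:
            proposals.append(None)
    return proposals


def execute(description: str) -> str:
    """Toy executor: stand-in for a text-to-image model or image retriever."""
    return f"<image produced for '{description}'>"


def align(turns: List[Turn]) -> List[Turn]:
    """Attach an image to each turn the planner flagged."""
    for turn, description in zip(turns, plan(turns)):
        if description is not None:
            turn.image_caption = execute(description)
    return turns


if __name__ == "__main__":
    dialogue = [
        Turn("A", "How was your hiking trip last weekend?"),
        Turn("B", "Amazing! Here's a photo from the summit."),
    ]
    for t in align(dialogue):
        print(t.speaker, "|", t.text, "|", t.image_caption)
```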
