
Audio Dialogues: Dialogues dataset for audio and music understanding

April 11, 2024
作者: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro
cs.AI

Abstract

Existing datasets for audio understanding primarily focus on single-turn interactions (e.g., audio captioning, audio question answering) that describe audio in natural language, thus limiting the understanding of audio through interactive dialogue. To address this gap, we introduce Audio Dialogues: a multi-turn dialogue dataset containing 163.8k samples covering general audio sounds and music. In addition to dialogues, Audio Dialogues also contains question-answer pairs for understanding and comparing multiple input audios together. Audio Dialogues leverages a prompting-based approach and caption annotations from existing datasets to generate multi-turn dialogues with a Large Language Model (LLM). We evaluate existing audio-augmented large language models on our proposed dataset to demonstrate the complexity and applicability of Audio Dialogues. Our code for generating the dataset will be made publicly available. Detailed prompts and generated dialogues can be found on the demo website https://audiodialogues.github.io/.

