ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds

September 13, 2024
Authors: Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
cs.AI

Abstract

Open-vocabulary audio-language models, like CLAP, offer a promising approach to zero-shot audio classification (ZSAC) by enabling classification over any arbitrary set of categories specified with natural language prompts. In this paper, we propose a simple but effective method to improve ZSAC with CLAP. Specifically, we shift from the conventional method of using prompts with abstract category labels (e.g., "Sound of an organ") to prompts that describe sounds using their inherent descriptive features in diverse contexts (e.g., "The organ's deep and resonant tones filled the cathedral."). To achieve this, we first propose ReCLAP, a CLAP model trained with rewritten audio captions for improved understanding of sounds in the wild. These rewritten captions describe each sound event in the original caption using its unique discriminative characteristics. ReCLAP outperforms all baselines on both multi-modal audio-text retrieval and ZSAC. Next, to improve zero-shot audio classification with ReCLAP, we propose prompt augmentation. In contrast to the traditional method of employing hand-written template prompts, we generate custom prompts for each unique label in the dataset. These custom prompts first describe the sound event in the label and then place it in diverse scenes. Our proposed method improves ReCLAP's performance on ZSAC by 1%-18% and outperforms all baselines by 1%-55%.
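The zero-shot classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are hand-made toy vectors standing in for the outputs of an audio encoder and a text encoder, the function names are hypothetical, and averaging cosine similarity over each label's prompts is an assumed aggregation rule chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(audio_emb, prompt_embs_by_label):
    """Pick the label whose prompts are, on average, closest to the audio.

    prompt_embs_by_label maps each label to a list of text embeddings,
    one per descriptive prompt written for that label.
    Averaging similarities is an assumption, not the paper's stated rule.
    """
    scores = {}
    for label, prompt_embs in prompt_embs_by_label.items():
        sims = [cosine(audio_emb, p) for p in prompt_embs]
        scores[label] = sum(sims) / len(sims)
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for encoder outputs (hypothetical values, 3 dimensions).
# In practice each vector would come from embedding a prompt such as
# "The organ's deep and resonant tones filled the cathedral."
prompt_embs_by_label = {
    "organ": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog bark": [[0.0, 0.9, 0.2], [0.1, 0.8, 0.3]],
}
audio_emb = [0.85, 0.15, 0.05]  # toy embedding of an audio clip

predicted, scores = zero_shot_classify(audio_emb, prompt_embs_by_label)
print(predicted)
```

Because the label set enters only through the prompt dictionary, swapping in a different set of categories requires no retraining, which is the property that makes the setup zero-shot.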
