ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
September 13, 2024
Authors: Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
cs.AI
Abstract
Open-vocabulary audio-language models, like CLAP, offer a promising approach
for zero-shot audio classification (ZSAC) by enabling classification with any
arbitrary set of categories specified with natural language prompts. In this
paper, we propose a simple but effective method to improve ZSAC with CLAP.
Specifically, we shift from the conventional method of using prompts with
abstract category labels (e.g., "Sound of an organ") to prompts that describe
sounds by their inherent descriptive features in diverse contexts (e.g., "The
organ's deep and resonant tones filled the cathedral."). To achieve this, we
first propose ReCLAP, a CLAP model trained with rewritten audio captions for
improved understanding of sounds in the wild. These rewritten captions describe
each sound event in the original caption using its unique discriminative
characteristics. ReCLAP outperforms all baselines on both multi-modal
audio-text retrieval and ZSAC. Next, to improve zero-shot audio classification
with ReCLAP, we propose prompt augmentation. In contrast to the traditional
method of employing hand-written template prompts, we generate custom prompts
for each unique label in the dataset. These custom prompts first describe the
sound event in the label and then place it in diverse scenes. Our proposed
method improves ReCLAP's performance on ZSAC by 1%-18% and outperforms all
baselines by 1%-55%.
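To make the classification step concrete, here is a minimal sketch of zero-shot audio classification with several prompts per label. It assumes a CLAP-style dual encoder that maps audio and text into a shared embedding space. The function names, the toy 3-dimensional vectors, and the choice to average the normalized prompt embeddings for each label are illustrative assumptions, not the ReCLAP implementation. The abstract does not specify how multiple custom prompts are combined, and in practice the vectors would come from the model's audio and text encoders.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def class_embedding(prompt_embs: List[Vector]) -> List[float]:
    """Average the unit-normalized embeddings of all prompts for one label.

    Assumption: mean pooling of prompt embeddings (prompt ensembling);
    the abstract does not state how the custom prompts are aggregated.
    """
    unit = [l2_normalize(e) for e in prompt_embs]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


def zero_shot_classify(
    audio_emb: Vector, label_prompt_embs: Dict[str, List[Vector]]
) -> Tuple[str, Dict[str, float]]:
    """Return the highest-scoring label and the cosine score for every label."""
    a = l2_normalize(audio_emb)
    scores = {
        label: dot(a, class_embedding(embs))
        for label, embs in label_prompt_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy embeddings standing in for text-encoder outputs of
# descriptive prompts, e.g. "The organ's deep and resonant tones filled the cathedral."
prompts = {
    "organ": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog bark": [[0.0, 0.9, 0.3], [0.1, 0.8, 0.4]],
}
label, scores = zero_shot_classify([0.85, 0.15, 0.05], prompts)
print(label)  # prints: organ
```

Averaging normalized prompt embeddings gives each label one prototype vector, so adding more descriptive prompts per label does not change the cost of scoring an audio clip.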