

AudioBERT: Audio Knowledge Augmented Language Model

September 12, 2024
Authors: Hyunjong Ok, Suho Yoo, Jaeho Lee
cs.AI

Abstract

Recent studies have identified that language models, pretrained on text-only datasets, often lack elementary visual knowledge, e.g., colors of everyday objects. Motivated by this observation, we ask whether a similar shortcoming exists in terms of the auditory knowledge. To answer this question, we construct a new dataset called AuditoryBench, which consists of two novel tasks for evaluating auditory knowledge. Based on our analysis using the benchmark, we find that language models also suffer from a severe lack of auditory knowledge. To address this limitation, we propose AudioBERT, a novel method to augment the auditory knowledge of BERT through a retrieval-based approach. First, we detect auditory knowledge spans in prompts to query our retrieval model efficiently. Then, we inject audio knowledge into BERT and switch on low-rank adaptation for effective adaptation when audio knowledge is required. Our experiments demonstrate that AudioBERT is quite effective, achieving superior performance on the AuditoryBench. The dataset and code are available at https://github.com/HJ-Ok/AudioBERT.
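The abstract outlines a three-step pipeline: detect the auditory-knowledge span in a prompt, query a retrieval model for matching audio knowledge, then inject that knowledge into BERT with low-rank adaptation switched on only when audio knowledge is needed. The toy sketch below illustrates that control flow only; all names (`AUDIO_KB`, `detect_auditory_span`, the additive fusion) are illustrative assumptions, not the authors' actual models or API.

```python
# Toy sketch of an AudioBERT-style retrieval-augmented pipeline.
# Real spans come from a trained detector and real embeddings from an
# audio retrieval model; here both are replaced by a lookup table.

AUDIO_KB = {
    "dog barking": [0.9, 0.1, 0.0],   # stand-in audio embeddings (assumed)
    "police siren": [0.1, 0.8, 0.1],
}

def detect_auditory_span(prompt):
    """Step 1: locate the auditory-knowledge span in the prompt
    (toy keyword match in place of a learned span detector)."""
    for span in AUDIO_KB:
        if span in prompt:
            return span
    return None

def retrieve_audio_embedding(span):
    """Step 2: query the retrieval model with the detected span."""
    return AUDIO_KB.get(span)

def forward(prompt, text_embedding):
    """Step 3: inject audio knowledge and enable low-rank adaptation
    only when an auditory span was found."""
    span = detect_auditory_span(prompt)
    use_lora = span is not None  # adapters switched on only for audio queries
    if use_lora:
        audio = retrieve_audio_embedding(span)
        # Toy fusion: element-wise addition of text and audio embeddings.
        fused = [t + a for t, a in zip(text_embedding, audio)]
    else:
        fused = text_embedding
    return fused, use_lora

fused, lora_on = forward("the sound of a dog barking is [MASK]", [0.0, 0.0, 0.0])
print(lora_on)  # True: an auditory span was detected, so adapters are active
```

For prompts with no auditory span, the pipeline passes the text embedding through unchanged with adapters off, which matches the abstract's description of switching on low-rank adaptation only "when audio knowledge is required."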
