AudioBERT: Audio Knowledge Augmented Language Model

Abstract

Recent studies have found that language models pretrained on text-only datasets often lack elementary visual knowledge, e.g., the colors of everyday objects. Motivated by this observation, we ask whether a similar shortcoming exists for auditory knowledge. To answer this question, we construct a new dataset called AuditoryBench, which consists of two novel tasks for evaluating auditory knowledge. Based on our analysis using the benchmark, we find that language models also suffer from a severe lack of auditory knowledge. To address this limitation, we propose AudioBERT, a novel method that augments the auditory knowledge of BERT through a retrieval-based approach. First, we detect auditory knowledge spans in prompts to query our retrieval model efficiently. Then, we inject audio knowledge into BERT and switch on low-rank adaptation for effective adaptation when audio knowledge is required. Our experiments demonstrate that AudioBERT is quite effective, achieving superior performance on AuditoryBench. The dataset and code are available at https://github.com/HJ-Ok/AudioBERT.
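
The idea of switching on low-rank adaptation only when audio knowledge is required can be illustrated with a small sketch. This is not the authors' implementation: the class `SwitchableLoRALinear`, the helper `has_auditory_span`, and the span-label convention are all hypothetical names chosen for illustration, and the example uses plain Python lists rather than a real BERT layer. It only shows the general pattern of a frozen weight plus a gated low-rank update.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SwitchableLoRALinear:
    """Frozen weight W plus an optional low-rank update B @ A (hypothetical sketch)."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix, scale: float = 1.0):
        self.weight = weight  # shape (out, in), kept frozen
        self.lora_a = lora_a  # shape (r, in), low-rank down-projection
        self.lora_b = lora_b  # shape (out, r), low-rank up-projection
        self.scale = scale

    def forward(self, x: List[float], use_lora: bool) -> List[float]:
        base = matvec(self.weight, x)
        if not use_lora:
            # Switch is off: behave exactly like the original frozen layer.
            return base
        # Switch is on: add the low-rank correction scale * B(Ax).
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


def has_auditory_span(span_labels: List[int]) -> bool:
    """Assumed convention: label 1 marks a token inside a detected auditory span."""
    return any(label == 1 for label in span_labels)


# Toy usage: identity base weight with a rank-1 adapter.
layer = SwitchableLoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[0.5], [0.5]],
)
x = [2.0, 4.0]
plain = layer.forward(x, use_lora=has_auditory_span([0, 0, 0]))
adapted = layer.forward(x, use_lora=has_auditory_span([0, 1, 1]))
```

With the switch off, the layer reproduces the frozen model's output unchanged, so prompts without a detected auditory span would be unaffected in this sketch; the adapter path is taken only when a span is flagged.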
