
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

October 19, 2025
作者: Chih-Kai Yang, Yen-Ting Piao, Tzu-Wen Hsu, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee
cs.AI

Abstract

Knowledge editing offers an efficient way to update model knowledge without full retraining, but prior work has concentrated almost exclusively on textual or visual modalities. We introduce SAKE, the first benchmark specifically designed for editing auditory attribute knowledge in Large Audio-Language Models (LALMs). Unlike factual updates, SAKE targets several abstract auditory attributes, capturing knowledge types that go beyond conventional textual and visual domains. We benchmark seven editing methods on two LALMs along four dimensions: reliability, generality, audio/text locality, and portability. Results highlight challenges such as preserving intra-attribute knowledge unrelated to the edit, generalizing edits to multimodal reasoning, and maintaining edits under sequential updates. SAKE provides a principled framework to study how knowledge editing extends to the auditory modality, opening new directions for maintaining and adapting LALMs in more diverse real-world scenarios.
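The four evaluation dimensions above can be pictured as accuracy scores over different probe sets collected after an edit. The sketch below is an illustrative assumption, not the paper's exact metric definitions: each dimension is scored as the fraction of probes whose post-edit model prediction matches the expected answer, with the function and key names (`score_edit`, `audio_locality`, etc.) invented here for clarity.

```python
# Hedged sketch of aggregating SAKE-style evaluation dimensions.
# Assumption: each dimension is the fraction of (prediction, expected)
# probe pairs that match after the edit is applied to the LALM.

def accuracy(pairs):
    """Fraction of (prediction, expected) pairs that match."""
    if not pairs:
        return 0.0
    return sum(pred == exp for pred, exp in pairs) / len(pairs)

def score_edit(results):
    """results maps each dimension name to a list of
    (prediction, expected) pairs gathered post-edit."""
    return {
        "reliability":    accuracy(results["reliability"]),     # the edited query itself
        "generality":     accuracy(results["generality"]),      # rephrased variants of the edit
        "audio_locality": accuracy(results["audio_locality"]),  # unrelated audio must be unchanged
        "text_locality":  accuracy(results["text_locality"]),   # unrelated text must be unchanged
        "portability":    accuracy(results["portability"]),     # downstream multimodal reasoning
    }

# Toy usage with made-up probe outcomes:
demo = {
    "reliability":    [("loud", "loud")],
    "generality":     [("loud", "loud"), ("quiet", "loud")],
    "audio_locality": [("dog bark", "dog bark")],
    "text_locality":  [("Paris", "Paris")],
    "portability":    [("yes", "yes")],
}
print(score_edit(demo)["generality"])  # → 0.5
```

A dimension-wise breakdown like this makes the trade-offs the abstract mentions concrete: a method can achieve high reliability while scoring poorly on locality or portability.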