SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
October 19, 2025
Authors: Chih-Kai Yang, Yen-Ting Piao, Tzu-Wen Hsu, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee
cs.AI
Abstract
Knowledge editing offers an efficient way to update model knowledge without
full retraining, but prior work has concentrated almost exclusively on textual
or visual modalities. We introduce SAKE, the first benchmark specifically
designed for editing auditory attribute knowledge in Large Audio-Language
Models (LALMs). Unlike factual updates, SAKE targets several abstract auditory
attributes, capturing knowledge types that go beyond conventional textual and
visual domains. We benchmark seven editing methods on two LALMs along four
dimensions: reliability, generality, audio/text locality, and portability.
Results highlight challenges such as preserving intra-attribute knowledge
unrelated to the edit, generalizing edits to multimodal reasoning, and
maintaining edits under sequential updates. SAKE provides a principled
framework to study how knowledge editing extends to the auditory modality,
opening new directions for maintaining and adapting LALMs in more diverse
real-world scenarios.
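
For a concrete picture of the four evaluation dimensions, the sketch below shows one plausible way an edit could be scored for reliability, generality, locality, and portability. It is illustrative only: all names and the data layout are assumptions, not the paper's actual harness, and plain text prompts stand in for the audio-plus-text inputs an LALM would actually receive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class EditCase:
    """One auditory-attribute edit plus probes for each evaluation dimension.

    Strings stand in for (audio, question) pairs; in a real LALM harness
    each probe would carry an audio clip alongside the text.
    """
    edit_prompt: str                            # probe whose answer the edit should change
    target: str                                 # desired post-edit answer
    paraphrases: List[str]                      # rephrasings probing generality
    locality_probes: List[str]                  # unrelated audio/text probes that must not change
    portability_probes: List[Tuple[str, str]]   # (reasoning probe, expected answer) pairs

def ratio(hits: int, total: int) -> float:
    return hits / total if total else 0.0

def evaluate_edit(edited_model: Callable[[str], str],
                  case: EditCase,
                  pre_edit_answers: Dict[str, str]) -> Dict[str, float]:
    """Score one edit along reliability, generality, locality, and portability."""
    # Reliability: the edited probe itself must now yield the new answer.
    reliability = float(edited_model(case.edit_prompt) == case.target)
    # Generality: paraphrased probes should also yield the new answer.
    generality = ratio(sum(edited_model(p) == case.target for p in case.paraphrases),
                       len(case.paraphrases))
    # Locality: unrelated probes should match the unedited model's answers.
    locality = ratio(sum(edited_model(p) == pre_edit_answers[p] for p in case.locality_probes),
                     len(case.locality_probes))
    # Portability: the edit should carry over to reasoning built on top of it.
    portability = ratio(sum(edited_model(p) == ans for p, ans in case.portability_probes),
                        len(case.portability_probes))
    return {"reliability": reliability, "generality": generality,
            "locality": locality, "portability": portability}
```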