

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

February 24, 2025
作者: Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M McKhann, Adeen Flinker, Daniel Friedman, Nima Mesgarani
cs.AI

Abstract

Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo and code available: https://aad-llm.github.io.
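The abstract describes a two-stage pipeline: first decode which speaker the listener attends to from neural (iEEG) activity, then condition the auditory LLM's response on that inferred attentional state. The sketch below illustrates this flow under strong simplifying assumptions: it uses a toy correlation-based stimulus-reconstruction decoder (a common baseline in auditory attention decoding, not necessarily the paper's method), and the function names, prompt format, and data are all illustrative, not taken from the AAD-LLM implementation.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract:
# (1) decode the attended speaker from neural activity, then
# (2) condition response generation on the inferred attentional state.
# The correlation-based decoder and all names here are illustrative
# assumptions, not the paper's actual implementation.

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def decode_attended_speaker(neural_envelope, speaker_envelopes):
    """Stage 1: pick the speaker whose speech envelope best matches the
    envelope reconstructed from neural recordings (a standard
    stimulus-reconstruction approach to auditory attention decoding)."""
    scores = {spk: pearson(neural_envelope, env)
              for spk, env in speaker_envelopes.items()}
    return max(scores, key=scores.get)

def build_conditioned_prompt(question, transcripts, attended):
    """Stage 2: condition the language model on the inferred attention,
    e.g. by foregrounding the attended speaker's speech in the prompt."""
    return (f"Attended speaker: {attended}\n"
            f"Attended speech: {transcripts[attended]}\n"
            f"Question: {question}")

# Toy multitalker scene: speaker A's envelope tracks the neural signal,
# speaker B's does not.
neural = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
envelopes = {"A": [0.2, 0.8, 0.1, 0.9, 0.2, 0.8],
             "B": [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]}
transcripts = {"A": "the meeting moved to Friday",
               "B": "buy two tickets for tonight"}

attended = decode_attended_speaker(neural, envelopes)
prompt = build_conditioned_prompt("When is the meeting?", transcripts, attended)
print(prompt)
```

In this toy scene the decoder selects speaker A, so the downstream question is answered from A's speech rather than from the full mixture; this is the "perception-aligned response" behavior the abstract motivates.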
