AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Abstract

Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo and code are available at https://aad-llm.github.io.
