Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
May 15, 2024
Authors: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama
cs.AI
Abstract
In this article, we explore the potential of using latent diffusion models, a
family of powerful generative models, for the task of reconstructing
naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler
music with limited timbres, such as MIDI-generated tunes or monophonic pieces,
the focus here is on intricate music featuring a diverse array of instruments,
voices, and effects, rich in harmonics and timbre. This study represents an
initial foray into achieving high-quality general music reconstruction using
non-invasive EEG data, employing an end-to-end training approach directly on
raw data without the need for manual pre-processing and channel selection. We
train our models on the public NMED-T dataset and perform quantitative
evaluation with proposed neural embedding-based metrics. We additionally perform
song classification based on the generated tracks. Our work contributes to the
ongoing research in neural decoding and brain-computer interfaces, offering
insights into the feasibility of using EEG data for complex auditory
information reconstruction.
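The embedding-based evaluation and song classification mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: it assumes some pretrained audio encoder (unspecified here) has already produced fixed-size embeddings for generated and reference tracks, and uses 10 song prototypes since NMED-T contains 10 full-length songs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def embedding_score(gen_embs, ref_embs):
    """Mean pairwise cosine similarity between matched generated/reference embeddings."""
    return float(np.mean([cosine_similarity(g, r) for g, r in zip(gen_embs, ref_embs)]))

def classify_songs(gen_embs, prototypes):
    """Assign each generated track to the song whose prototype embedding is closest."""
    return [int(np.argmax([cosine_similarity(g, p) for p in prototypes]))
            for g in gen_embs]

# Toy example with synthetic "embeddings" standing in for encoder outputs.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(10, 128))                     # one prototype per song
generated = prototypes + 0.1 * rng.normal(size=(10, 128))   # noisy decoded versions
preds = classify_songs(generated, prototypes)
accuracy = float(np.mean([p == i for i, p in enumerate(preds)]))
```

In this toy setting the noise is small relative to the 128-dimensional prototypes, so nearest-prototype classification recovers every song; with real EEG-decoded audio, the same metric would quantify how much song identity survives reconstruction.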